future of lappsgrid at_types

keighrim commented 1 year ago

Extended discussion from https://github.com/clamsproject/app-dbpedia-spotlight-wrapper/issues/2; some related discussion can also be found in #86.

The problem with the lapps vocabulary is that it wasn't really designed for versioning from the beginning, and even if we do start properly versioning it (which I don't think is doable in any viable future that suits the Mellon grant timeline, given the current status of the lappsgrid infrastructure and the funding situation), we don't know how to properly integrate lapps at_type versions into MMIF versions (partly because we no longer use LD-contexts).

My assessment of the possible directions from here is

1. "fork" lapps vocab and merge it into clams vocab
2. leave lapps vocab as underspecified, and use unrestricted (or undefined) arbitrary field names (originally for an addition of NEL, but we need others for other NLP concepts that can fall under some of the lapps at_types)
3. fix and maintain lapps vocab under vocab.lappsgrid.org (with or without versioning) with clams funding (or other available resources)

I think

- option 3 is a non-starter, as we're running short on resources,
- option 2 is the easiest but not sustainable in the long run.

So that leaves us with option 1.

But I'd like to hear more about other possible alternatives.

marcverhagen commented 1 year ago

> The problem with the lapps vocabulary is that it wasn't really designed for versioning from the beginning

I don't see where this statement comes from because it was designed to be versioned and it was pretty similar to how we versioned CLAMS.

Having said that, the LAPPS versioning is clearly very different from our current versioning. I agree option 3 is not a good one without independent LAPPS funding. Option 2 works for now and is also something that we have used for CLAMS. In the long run some version of option 1 is probably the way to go.

Not that many types need to be copied (at most about a dozen), so it will not explode our vocabulary. For each of the copied types we can use the similarTo property to refer to the old LAPPS type. We do not have to do all of them at once, and we can always decide, for now, to do only the ones we need, which could be just the NamedEntity type.

Paragraph, Sentence, NounChunk, VerbChunk, NamedEntity and Token are all natural subtypes of the CLAMS Span type. Markable does not need to be copied since it is really the same as Span. The only reason we have it in the LAPPS vocabulary is that it is a common term amongst linguists.
The four LAPPS relation types all fit under the CLAMS Relation.
Which leaves Coreference, PhraseStructure and DependencyStructure. I was never too happy with them being direct subtypes of Annotation, but I have no better solution.
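
To make the proposed re-parenting concrete, here is a minimal Python sketch; it is an illustration only, not actual CLAMS or LAPPS vocabulary code. The dictionary layout, the `LAPPS_BASE` URL prefix, and the helper `copy_lapps_types` are all assumptions introduced for this example. It covers only the Span subtypes listed above and leaves out Markable, since that is treated as equivalent to Span.

```python
# Hypothetical sketch: none of these names come from the CLAMS or LAPPS code bases.
LAPPS_BASE = "http://vocab.lappsgrid.org/"  # assumed URL prefix for LAPPS types

# LAPPS types named above that would become subtypes of the CLAMS Span type.
SPAN_SUBTYPES = [
    "Paragraph", "Sentence", "NounChunk", "VerbChunk", "NamedEntity", "Token",
]


def copy_lapps_types(names, parent):
    """Build one entry per copied type, with a similarTo back-reference."""
    return {
        name: {"parent": parent, "similarTo": LAPPS_BASE + name}
        for name in names
    }


# Copy every listed type at once ...
all_types = copy_lapps_types(SPAN_SUBTYPES, parent="Span")
# ... or, for now, only the one that is actually needed.
needed_now = copy_lapps_types(["NamedEntity"], parent="Span")

print(needed_now["NamedEntity"]["similarTo"])
```

Either call produces the same kind of entry, so copying one type now and the rest later would not require a different mechanism.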

keighrim commented 1 year ago

> I don't see where this statement comes from because it was designed to be versioned and it was pretty similar to how we versioned CLAMS.

Hmmm, not that I remember... My statement above comes from

- the fact that there was never a vocab 1.0.0, and support for versioning was added in Apr 2017, my fourth year into the project.
  - that's probably the 6th or 7th year of the project, so that period was the "versionless" era for the lapps vocab, and once we introduced versioning, the first number the vocab got was 1.1.0, not 1.0.0.
- the fact that the SDKs haven't supported, and still don't support, versioned at_types (Java dev, Java master, Python).

To my understanding, the lapps vocab was not started with versioning in mind. The versioning we added (circa 2017) was ad hoc, and even since then it wasn't really picked up by other components in the framework. I think that was one big lesson we learned from our mistake, and it led us to start CLAMS with much more serious consideration of MMIF spec versioning from the very beginning.

I agree that for now we can stay in the option 2 area, maybe until we see a couple more changes we want to add to the LAPPS vocab types. However, once we decide to move in the option 1 direction, I think we'd be better off copying the whole thing** at once, and never having to consider a reusable migration process (as a piece of code or as manual labor). Even with all the lapps types, I don't think the size of the CLAMS vocab will explode.

** (regarding the back-references to the lapps URLs based on the similarTo information, I think we can do something like a.k.a. links at the top of vocab type web pages)

The suggested repositioning of the lapps types into the clams tree is all agreeable. For the graph structures (PhraseS, DepS), I don't have a better solution either. But for the coreference, I'm thinking we can expand the coreferences into some kind of alignment between video objects (bounding boxes) and textual mentions (e.g. faces to names). It's still a very vague idea, and I'll try to polish it into a better-formed proposal.

marcverhagen commented 1 year ago

@keighrim I see, my memory was somewhat different. But you were involved in LAPPS since 2014? That is also longer than I remember.

In any case, how good or bad the LAPPS versioning was is somewhat immaterial. The current state has versioning similar to what we used to have in CLAMS and it does make sense to assimilate LAPPS types so we have consistent versioning across the types we are using. I have no strong feelings on whether we add elements one by one or all at the same time.

marcverhagen commented 1 year ago

> But for the coreference, I'm thinking we can expand the coreferences into some kind of alignment between video objects (bounding boxes) and textual mentions (e.g. faces to names). It's still a very vague idea, and I'll try to polish it into a better-formed proposal.

Interesting, coreference and alignment do indeed have some things in common. At some point I was wondering about using some kind of a grounding mechanism where video objects and text mentions would map to the same entity in a database. That went nowhere but I am curious to hear where you are going here.

keighrim commented 1 week ago

LAPPS vocab recently started showing even more signs of degradation when the website became inaccessible via modern web browsers with sane security features, due (probably) to the TLS certificate expiration.

clamsproject / mmif

future of lappsgrid at_types #202