DigiClass / ProsopoCogito

A branch of the Recogito annotation tool for developing SNAP-compatible person-search

Add free URI field to both place and person Recogito search #5

Open gabrielbodard opened 6 years ago

rsimon commented 6 years ago

For Person, no specific extension is needed. I checked & confirmed that the standard uri field in the annotation body can just take any URI, and Recogito will store it happily.

For Places, the current implementation strictly requires place URIs that are known in the built-in gazetteer index, at the time the annotation is created. This kind of integrity enforcement would need to be relaxed if we want to allow this.

Warning, however: such "dangling links" would have a few consequences further down the line, so a somewhat more thorough analysis of pros/cons will be needed at some point.
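To make the difference between the strict and the relaxed check concrete, here is a minimal sketch in Python. It is not Recogito's actual (Scala) code: `KNOWN_GAZETTEER`, `PlaceBody` and `make_place_body` are hypothetical names, and the example.org URIs are placeholders.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the built-in gazetteer index (placeholder URIs).
KNOWN_GAZETTEER = {
    "http://example.org/gazetteer/place/1",
    "http://example.org/gazetteer/place/2",
}


@dataclass(frozen=True)
class PlaceBody:
    uri: str
    known: bool  # True if the URI was found in the gazetteer index


def make_place_body(uri: str, strict: bool = True) -> PlaceBody:
    """Build a place body for an annotation.

    strict=True mimics the current behaviour described above: a URI that
    is not in the index is rejected. strict=False accepts it and records
    it as a "dangling link" via known=False.
    """
    known = uri in KNOWN_GAZETTEER
    if strict and not known:
        raise ValueError(f"URI not in gazetteer index: {uri}")
    return PlaceBody(uri=uri, known=known)
```

With `strict=False`, downstream features such as a map view would have to skip or specially handle bodies where `known` is false, which is one of the consequences mentioned above.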

rsimon commented 6 years ago

Looked into this further & think it makes sense to treat "known" URIs differently from the others in the model. It would be (much) easier to query them separately meaning that

gabrielbodard commented 6 years ago

That's fair enough, and I'm all in favour of granularity of fields allowing separate querying down the line. But I think it is important that the two URI fields can also be queried together. I see several possible cases for this:

In other words, I'm wondering if the more useful distinction isn't between known and unknown (although that has value for parsing/visualisation), but between interface-selected and manually entered.
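One way to picture both distinctions at once (known vs. unknown, and interface-selected vs. manually entered) while still allowing a combined query is sketched below. This is only an illustration in Python under assumed names: `Origin`, `UriRef` and `query_uris` are hypothetical and not part of Recogito's data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class Origin(Enum):
    INTERFACE_SELECTED = "interface-selected"
    MANUALLY_ENTERED = "manually-entered"


@dataclass(frozen=True)
class UriRef:
    uri: str
    known: bool      # is the URI present in an indexed gazetteer?
    origin: Origin   # how did the URI get into the annotation?


def query_uris(
    refs: Iterable[UriRef],
    known: Optional[bool] = None,
    origin: Optional[Origin] = None,
) -> List[str]:
    """Return URIs matching the given filters.

    Leaving a filter as None means "don't care", so calling the function
    with no filters queries all URIs together.
    """
    return [
        r.uri
        for r in refs
        if (known is None or r.known == known)
        and (origin is None or r.origin == origin)
    ]
```

Here `query_uris(refs)` returns every URI regardless of field, while `query_uris(refs, origin=Origin.MANUALLY_ENTERED)` narrows to pasted ones only.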

rsimon commented 6 years ago

Hm... good point. Probably doesn't make modeling easier though ;-) Can we discuss the use cases a bit more?

  • Free entry of URIs that are known to Pelagios (or that become known to it later)

It totally makes sense to be able to add a URI directly, irrespective of whether it's known or not, I agree. But would it be essential to know whether it's been manually added or not? (Vs. manually searched in the gazetteer, for example?) After all, there is still the "confirmed" vs. "non-confirmed" flag, if the point was to distinguish between NER annotations and user-provided ones. In addition, automated NER that has not been touched by a human user is already identifiable because it has no "created by" information attached to it.
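As a rough illustration of that point, the two existing signals could be combined as follows. This is a Python sketch under assumptions: the `Annotation` class and its `confirmed` / `created_by` field names are hypothetical stand-ins for the flag and the "created by" information mentioned above, not Recogito's real schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    uri: Optional[str]
    confirmed: bool                   # hypothetical name for the status flag
    created_by: Optional[str] = None  # None: no human has touched it


def classify(a: Annotation) -> str:
    """Label an annotation using only the two signals described above."""
    if a.created_by is None:
        return "untouched-ner"
    return "user-confirmed" if a.confirmed else "user-unconfirmed"
```

Nothing here records whether a user picked the URI from the search dialog or pasted it in, which is the extra piece of information under discussion.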

  • preserving the difference between Recogito-selected and user-pasted URIs

Would the key use case for this be to benchmark the NER?

  • comparison of URIs for machine-assisted disambiguation/coreference or reasoning
  • analysis of annotations with multiple URIs for assisted gazetteer alignment

Can you elaborate on these two a bit more? E.g. give examples?

rsimon commented 6 years ago

Hi @gabrielbodard (and CC-ing @thegsi),

just a quick heads-up that I'm picking up work on this again. Time (as always) is limited, but I'd at least like to spend a couple of days building a prototype branch of Recogito with a changed internal data model, where the "URIs have to be known & indexed" constraint is removed.

I think the code/schema changes may not be so bad after all. I'm expecting a performance hit on some rather essential features (map view, data exports) & don't yet know how bad it will be. Also, transforming the 500k+ existing annotations in our live instance to a new format will be a bit of open-heart surgery, but let's worry about that when we get there ;-)

If it works, however, I think we would not only be able to support the feature as such; but it would also simplify Recogito's internal structure and potentially make it a lot easier to plug in external knowledge bases and URI sources. Hence, definitely a goal worth pursuing. I'll keep you posted on the progress!

P.S.: I'm also documenting stuff at https://github.com/pelagios/recogito2/issues/413

thegsi commented 6 years ago

@rsimon Sounds great. Probably Scala work? Do keep me updated here and/or by email about progress, and let me know if you need some JavaScript work.

rsimon commented 6 years ago

The bulk of the backend work (yes, all Scala) is now done. It still needs testing and probably a bit of bugfixing here and there, but overall it's looking good. For various reasons, however, I won't be deploying this to the live instance until mid-February, with a second update at the beginning of March. (Most importantly, my institution is moving office and will take down the server for a week or so. I'll therefore need to move everything to a rented VM and then back once the move is done. I'm planning to combine the move-related downtimes with the system upgrades.)

rsimon commented 6 years ago

P.S.: I am now moving back to mostly frontend/JavaScript work. E.g. options to change the map colouring based on different properties (tags, annotation status etc.):

https://github.com/pelagios/recogito2/tree/master/app/assets/javascripts/document/map

and enhancing the gazetteer search dialog, e.g. adding options to filter by gazetteer etc.:

https://github.com/pelagios/recogito2/tree/master/app/assets/javascripts/document/annotation/common/georesolution

Also, of course, a "georesolution-panel-alternative" for searching person datasets would be highly interesting, as discussed.