GateNLP / gate-core

The GATE Embedded core API and GATE Developer application
GNU Lesser General Public License v3.0

Should document parameter sourceUrl be a ResourceReference? #112

Closed greenwoodma closed 3 years ago

greenwoodma commented 4 years ago

In most cases we've changed URL params to be ResourceReferences, allowing things to be loaded from within plugins. Should we do the same for documents, so that they can also be loaded from within a plugin? This would be useful if, for instance, you want to put example documents inside a plugin.

johann-petrak commented 4 years ago

I can't see any potential problems, and it could certainly be useful in some situations!

ianroberts commented 4 years ago

I’m not sure that’ll work, given that getSourceUrl() is part of the Document interface and is used in a number of places, both in core and in plugins, to build relative URLs. Also, GCP and a few other similar tools rely on being able to build documents from a java.net.URL with a customised URLStreamHandler, and they won’t work if the URL gets converted to a URI and back again by ResourceReference.
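The second concern is general java.net behaviour: a URL built with an explicit URLStreamHandler keeps that handler only inside the URL object, while a URI records just the scheme name, so URI.toURL() has to look the handler up again. The sketch below illustrates that round trip in isolation. It does not reproduce ResourceReference's code or GCP's real handler; the `memdoc` scheme and `InMemoryHandler` class are hypothetical.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

/** Shows that a URL built with a custom handler does not survive URL -> URI -> URL. */
public class UrlRoundTrip {

    /** Hypothetical handler for an unregistered "memdoc" scheme, standing in for a tool's own handler. */
    static class InMemoryHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) throws IOException {
            // A real handler would serve document content from memory here.
            throw new IOException("openConnection is not needed for this demo");
        }
    }

    /** Builds a URL whose scheme is only understood by the handler passed in explicitly. */
    static URL customUrl(String spec) throws MalformedURLException {
        return new URL(null, spec, new InMemoryHandler());
    }

    /** True if the URL can be converted to a URI and back to a URL again. */
    static boolean survivesRoundTrip(URL url) {
        try {
            url.toURI().toURL(); // the URI keeps the scheme name but not the handler object
            return true;
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(survivesRoundTrip(customUrl("memdoc://batch/doc-1"))); // false
        System.out.println(survivesRoundTrip(new URL("file:/tmp/doc.xml")));      // true
    }
}
```

The file: URL round-trips because the JVM has a built-in handler for that scheme; the memdoc: URL fails with an "unknown protocol" MalformedURLException, because nothing outside the original URL object knows about its handler.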

greenwoodma commented 4 years ago

Ah yes, I forgot that getSourceUrl() was being abused both as a bean setter and as part of the Document interface.

I suppose one option would be to add a resourceReference param to the stringContent/sourceUrl disjunction, so that it just becomes another way of specifying a location (assuming disjunctions can have more than two values?).

johann-petrak commented 4 years ago

I'd forgotten about that too. But the OR solution should work; you can have as many alternatives as you like. Once the resourceReference gets resolved, there is then an actual URL that can get returned by getSourceUrl, no?

greenwoodma commented 4 years ago

Yes, we could certainly return the URL via that route if we wanted to. That would allow any other code which assumes getSourceUrl() returns a URL to keep working, even if the document was created from a ResourceReference instance.

ianroberts commented 4 years ago

> Once the resourceReference gets resolved, there is then an actual URL that can get returned by getSourceUrl, no?

You could; the only wrinkle would be that if you saved an xgapp “with corpus”, it would include the resolved jar:file:/Users/ian/.m2/repository/... URL strings in the document params. There would have to be logic at init to prefer the ResourceReference over the URL if both are non-null.
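The OR-group approach discussed above (a third alternative alongside stringContent and sourceUrl, resolved to a real URL, with the reference winning when both are set) might look roughly like this in plain Java. It is a minimal sketch under stated assumptions: `DocumentSourceSketch` and `PluginRef` are hypothetical stand-ins for GATE's document init parameters and `ResourceReference`, not GATE's actual classes, and the OR group is modelled as "at least one alternative must be set".

```java
import java.net.MalformedURLException;
import java.net.URL;

/**
 * Hypothetical sketch, not GATE's DocumentImpl: three alternative ways to say where a
 * document comes from (an OR group), with a plugin-relative reference taking precedence.
 */
public class DocumentSourceSketch {

    /** Hypothetical stand-in for ResourceReference: a path relative to a plugin's base URL. */
    public static final class PluginRef {
        private final URL pluginBase;
        private final String path;

        public PluginRef(URL pluginBase, String path) {
            this.pluginBase = pluginBase;
            this.path = path;
        }

        URL resolve() throws MalformedURLException {
            // Resolve against wherever the plugin lives now, not where it lived when saved.
            return new URL(pluginBase, path);
        }
    }

    private final String stringContent;
    private final PluginRef resourceReference;
    private URL sourceUrl;

    public DocumentSourceSketch(String stringContent, URL sourceUrl, PluginRef resourceReference) {
        this.stringContent = stringContent;
        this.sourceUrl = sourceUrl;
        this.resourceReference = resourceReference;
    }

    /** OR-group check, then prefer the reference over any saved (possibly stale) URL. */
    public DocumentSourceSketch init() throws MalformedURLException {
        if (stringContent == null && sourceUrl == null && resourceReference == null) {
            throw new IllegalStateException(
                    "one of stringContent, sourceUrl or resourceReference must be set");
        }
        if (resourceReference != null) {
            // Overwrites a URL saved alongside the reference (e.g. in an xgapp "with corpus").
            sourceUrl = resourceReference.resolve();
        }
        return this;
    }

    /** Still returns a plain URL, so code that builds relative URLs from it keeps working. */
    public URL getSourceUrl() {
        return sourceUrl;
    }

    public static void main(String[] args) throws MalformedURLException {
        // A URL saved on another machine, plus the plugin-relative reference it came from.
        URL saved = new URL("file:/home/alice/old-checkout/resources/docs/example.xml");
        URL pluginBase = new URL("file:/opt/plugins/example/resources/");
        DocumentSourceSketch doc = new DocumentSourceSketch(
                null, saved, new PluginRef(pluginBase, "docs/example.xml")).init();

        System.out.println(doc.getSourceUrl());
        // file:/opt/plugins/example/resources/docs/example.xml
        System.out.println(new URL(doc.getSourceUrl(), "example2.xml"));
        // file:/opt/plugins/example/resources/docs/example2.xml
    }
}
```

Because the reference is re-resolved at init, the machine-specific URL that was saved with the application is simply overwritten, and callers that resolve relative paths against getSourceUrl() see an ordinary file: (or jar:) URL either way.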

greenwoodma commented 4 years ago

Hmm, this is looking like a bit of a can of worms, the kind I'm going to turn around and walk away from, at least for the time being.

johann-petrak commented 4 years ago

Maybe it's not worth the effort. We could mark this WONTFIX until we really urgently need to access a document from a plugin, which might be never.

greenwoodma commented 3 years ago

Having thought about this some more, I think the logical solution is WONTFIX. There isn't a good use case for loading documents out of a plugin, and the hassle of allowing this just isn't worth it. If someone really feels they need this at a later date, they can either re-open this issue or file a new one, but for now...