GateNLP / gate-core

The GATE Embedded core API and GATE Developer application
GNU Lesser General Public License v3.0

Improve save as Inline XML and possibly factor into a plugin. Also update documentation! #10

Closed johann-petrak closed 5 years ago

johann-petrak commented 7 years ago

The default behaviour is a bit odd right now: the rootElement parameter is optional and empty by default, so unless the original document already had an XML-like root element (e.g. HTML), the document that gets created is not well-formed XML. At the very least there should be a tooltip warning about this, plus proper documentation. Another possibility would be to add a parameter ("add rootElement unless already present") that always adds a root element, falling back to a default element name if none is entered for the rootElement parameter ("Document" springs to mind).
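The "add rootElement unless already present" idea could be sketched roughly as follows. This is only an illustration under stated assumptions, not GATE's exporter code: the class `RootElementSketch` and the methods `isWellFormed` and `ensureRoot` are hypothetical names, the check simply asks the JDK's XML parser whether the exported text already parses, and the text is assumed to have had its special characters escaped by the exporter.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the GATE API.
class RootElementSketch {
    // Fallback name suggested in the issue text.
    static final String DEFAULT_ROOT = "Document";

    // True if the JDK parser accepts the text as a well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    // Wraps the text in a root element only if it does not already parse.
    // An empty or null rootElement falls back to DEFAULT_ROOT.
    static String ensureRoot(String inlineXml, String rootElement) {
        if (isWellFormed(inlineXml)) {
            return inlineXml; // e.g. an HTML/XML original that kept its own root
        }
        String name = (rootElement == null || rootElement.trim().isEmpty())
            ? DEFAULT_ROOT
            : rootElement.trim();
        return "<" + name + ">" + inlineXml + "</" + name + ">";
    }

    public static void main(String[] args) {
        System.out.println(ensureRoot("Plain text with <Person>Ann</Person>", ""));
        System.out.println(ensureRoot("<html><body>hi</body></html>", ""));
    }
}
```

Parsing the whole export just to decide whether to wrap it is the simplest possible check; it does not handle input that starts with an XML declaration, and a real implementation would probably inspect the document's original markup instead.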

greenwoodma commented 7 years ago

I think the main issue is that this is badly named. If I recall correctly, the underlying code dates back to a time when we really only handled XML and HTML documents. The intention was to allow people to save their documents back into the original format but with extra XML/HTML elements. In this situation there is no need for a root element as part of the export, as it will always be in the original document (which annotation set is a good question, depending on the use of annotation set transfer etc.).

The problem is that we have moved well beyond just HTML and XML files, so this became "inline XML" instead of "save preserving format" (which I believe was its original name). I think the default settings make perfect sense given the expected use case. It just falls apart when used on random document formats that aren't HTML/XML style.

I think the best solution would be to document it properly but move it into a plugin so it isn't a default output option.

johann-petrak commented 7 years ago

Documenting it would be the number 1 priority, of course. But I think the default should never generate XML that is not well-formed. The documentation could point out everything you said and describe how to change the parameters from the default so that one can avoid having a root element, if necessary. In general, though, I am against anything where we create a document by default which cannot even be read back into GATE without an exception!

greenwoodma commented 7 years ago

In principle I agree that producing files we can't open seems odd, but I think breaking the behaviour of something that has been around so long is actually worse. Actually, I don't think the root element issue is the worst thing about this format; the fact that it silently throws away annotations that partially overlap, leading to loss of information, is far more worrying. At least the missing root element is trivial to fix with 30 seconds and a text editor.
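To make the partial-overlap problem concrete: two annotations cross when one starts inside the other and ends outside it, so they cannot both be written as properly nested inline elements. The sketch below only detects such pairs so that a caller could warn instead of losing data silently. It is not the exporter's actual logic; `OverlapSketch`, `Span`, `crosses` and `crossingPairs` are hypothetical names, and offsets are assumed to be character positions with an exclusive end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of the GATE API.
class OverlapSketch {
    // Minimal stand-in for an annotation: a type plus start/end offsets.
    static final class Span {
        final String type;
        final long start;
        final long end;

        Span(String type, long start, long end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // True if a and b partially overlap (cross) rather than nest or stay apart.
    static boolean crosses(Span a, Span b) {
        return (a.start < b.start && b.start < a.end && a.end < b.end)
            || (b.start < a.start && a.start < b.end && b.end < a.end);
    }

    // Lists every crossing pair as "typeA/typeB" so it can be reported.
    static List<String> crossingPairs(List<Span> spans) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < spans.size(); i++) {
            for (int j = i + 1; j < spans.size(); j++) {
                if (crosses(spans.get(i), spans.get(j))) {
                    pairs.add(spans.get(i).type + "/" + spans.get(j).type);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Span> spans = new ArrayList<>();
        spans.add(new Span("Person", 0, 10));
        spans.add(new Span("Location", 5, 15)); // starts inside Person, ends outside
        spans.add(new Span("Token", 0, 5));     // nested inside Person, so no crossing
        System.out.println(crossingPairs(spans));
    }
}
```

The pairwise comparison is quadratic in the number of annotations, which is fine for a warning on a single document but would need sorting by offset for large annotation sets.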

johann-petrak commented 7 years ago

This may or may not be related to what to do about the "Flexible Exporter" PR in the Tools plugin.

greenwoodma commented 7 years ago

What were you planning on doing with the flexible exporter? Personally, I've never liked the way it is part of a pipeline; I think exporting results should be a separate step.

greenwoodma commented 5 years ago

Closed by https://github.com/GateNLP/userguide/commit/9f01eadd1c3a28c74af71e3539b3b4daf714f353