GateNLP / gate-core

The GATE Embedded core API and GATE Developer application
GNU Lesser General Public License v3.0
75 stars 29 forks source link

Can we try and maintain feature type when edited in the UI #80

Closed greenwoodma closed 5 years ago

greenwoodma commented 5 years ago

See if we can reduce the chance of accidentally changing the type of a feature value in the UI

Idea from Kalina: can we have the UI check the type of the previous value and try to coerce the newly entered value to the same type? At least for Long, Integer, Float and Double if nothing else? Or at least pop up a warning when someone is about to accidentally change the type to be sure that's what they really wanted to do?

Originally posted by @ianroberts in https://github.com/GateNLP/gate-core/issues/72#issuecomment-488251239

greenwoodma commented 5 years ago

I think in most cases, but not all, classes that we are likely to use as feature values that also have a single argument string constructor, will be able to correctly parse their own toString() output. As such as long as the new value should be the same type (i.e. the user isn't trying to change the type) then the coercion should work.

We can check that the result is correct by checking that the toString() of the new value is the same as the value the user entered, and that we don't get an exception from the constructor (I could imagine that for complex objects, dates say, you might be able to edit the string in a way that would cause the constructor to fail to parse). If we fail either test then drop down to a String instance as currently and log a warning to the message pane.

ianroberts commented 5 years ago

I don't know how general we need to be but I feel it'd be good to do this for at least numbers and booleans.

The other question is what happens with annotations that don't have a value for the given feature already. If the type has a schema loaded then we can use that to find out what type the feature is supposed to be but if it doesn't and the user types in "5" as the value, do we store that as a string or a Long? Do we look at other annotations in the same set to see what type they have for the same feature and try to infer a "virtual schema"? What if there aren't any other annotations of that type already (if we're doing manual annotation starting from scratch)? Are we ultimately going to need a way for a user to specify the type of a feature when they enter it rather than assuming string?

greenwoodma commented 5 years ago

Yes, it gets messy quickly, doesn't it. I have a feeling it's one of those issues that's going to get "fixed" in small pieces over a fairly long period of time.

I've written (but not tested) two lines of code which should solve the main issue in that editing existing values should now maintain the type where possible.

When we do schema driven editing I'm not actually sure what happens for numbers (booleans I believe are a true/false button), but it might be worth checking and implementing more handlers for the schema editor.

For cases where we aren't enforcing a schema but have a schema for the type loaded I guess it makes sense to try and coerce the type correctly, but do we stop people adding an invalid value altogether?

For those cases where we have no schemas and there isn't currently a value, I think trying to scan existing annotations to "derive" a schema is dangerous (and could be really slow). At the end of the day we've always said you can but any feature value pair on an annotation, and we've never said they have to match across annotations of the same type. Trying to enforce that now through the UI seems odd. Especially odd in that what do we do if we load existing documents that have inconsistent feature types?

greenwoodma commented 5 years ago

Okay, this is weird. It appears that Swing is already doing some of this for us.... unless I'm going mad.

I've created a simple example GATE XML document for testing (I can't attach XML documents so you'll need to change the extension once downloaded): coercion-test.txt

This test file contains a single annotation which has two features

  1. string: this has as it's value the number 5 stored as a java.lang.String
  2. number: this has as it's value the number 5 stored as a java.lang.Long

Try loading the document into GATE 8.6-SNAPSHOT and changing both features to have the value 6 and then save the document as GATE XML again. If you look at the XML in an editor it looks as if the number feature is still of type java.lang.Long.

I can't find that we are doing this ourselves anywhere, so I'm wondering if the DefaultComboBoxModel or JComboBox classes have been updated since we originally noticed the problem to handle some of this themselves; I've noticed that at some point JComboBox gained generics which I'm sure it didn't use to have.

Can someone other than me (hello @ianroberts ) check that I'm not going mad and they can reproduce this? If so we can probably close this issue, as it will work for the majority of the cases we care about where we are editing existing values.

greenwoodma commented 5 years ago

Note, that I ran the above test under Java 9.0.4 (to get the HDPI support on my laptop) so I guess behaviour could be different under other Java versions, although a quick test under Java 8 seems to show the same behaviour.

ianroberts commented 5 years ago

Tracing through the Swing libraries this behaviour appears to come from BasicComboBoxEditor.setValue, which has a special case when you try to set a new value and:

In this case it attempts to convert the new string value using that valueOf method, and falls back to the string if valueOf throws an exception. In particular this means that:

Pretty much anything else will be converted to string, including Date, UUID (which has fromString but not valueOf), the various java.time objects (which have parse but not valueOf), etc.

This behaviour is technically L&F-specific but none of the metal, synth, nimbus, aqua (mac) or windows L&F implementations override the key method from Basic.

greenwoodma commented 5 years ago

Given that covers most of the types we use regularly (and those that were part of the original discussion) I'm going to argue that the issue isn't really an issue any longer and close it.