Open Tahnan opened 3 weeks ago
If the two type instances come from the same type system, it should be sufficient to simply compare their names. Only they from from two different typesystems, a more thorough test might be required to check if the definitions are compatible.
In which case do you compare two types to each other?
If the two type instances come from the same type system, it should be sufficient to simply compare their names.
So it's interesting, because if you look at the example I gave to recreate it, both typesystems are the result of cassis.load_dkpro_core_typesystem()
. So they're different objects (one is two
is of course False), but they have identical contents. Nevertheless, the RecursionError triggers.
In this case, I was actually comparing two different typesystems: one was the system you get from the load command, and the other was the typesystem on a document downloaded from INCEpTION in UIMA CAS JSON format. The typesystems in this case are definitely not equivalent, but I was pretty sure one was a subset of the other.
There's a related...not bug, but feature request, maybe. It's not important enough to me to write it up as an actual feature request, because as I said I'm only very rarely trying to work with two different typesystems. But for posterity, in case anyone else needs to know, here's what happened:
I downloaded a document from INCEpTION and tried to load it with cassis.load_document_from_json(doc, typesystem=my_typesystem)
(where my_typesystem is cassis.load_dkpro_core_typesystem()
plus some INCEpTION layers like custom.Span
and so forth). This produced an error, even with merge_typesystems=True
, because the conflict was not some types being missing. The problem was that both typesystems--the one in the document, and my_typesystem
--had the type de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.morph.MorphologicalFeatures
, but the feature number
was defined differently on them.
OK, that's a reasonable error to get! If they define Features differently, then yes, trying to merge them should fail.
Except...the difference wasn't at all functional. Both of them had rangeTypeName = uima.cas.String. The only difference between the two features was that the one defined in this repo in dkpro-core-types.xml
has a description on the feature ("Singular/plural"), and the one defined in the UIMA CAS document I was trying to load had no description. In fact, as far as I could tell, that was true all the way through it: the types had the same name, they had the same features on them, and all of those features were identical except that in one system they had descriptions and in the other they didn't.
As a programmer, I can totally understand "these differ, and I'm going to raise an Exception rather than try to guess why". As a user, it was so obvious how to reconcile them ("the description does not matter to me! just...leave it off everything!") that I really wanted the merge_typesystems option to figure it out. I can imagine a feature request that would do this (maybe a "merge_strategy" option that behaves as it currently does if specified as "strict", but specifying it as "ignore_descriptions"....?). But, as I said, I don't personally need it enough to request it, and I'm sure you've got higher priorities.
The recursion error is definitely a bug. I'm just asking to get a better idea of what an appropriate fix would be. Considering what you said, I think it should be ok to code eq
method such that it returns True
if both types have the same name and come from the same typesystem and otherwise return False
.
I would say that merging fails if the difference is in the description of a feature, would also seem like a bug. Of course the description could be different and one could argue that if the descriptions differ, the semantics of the feature may actually be different. At least the case where one descriptions is empty and the other not should not trigger a failure. But probably even if both descriptions are set to different values, it should not fail.
Would you like to contribute a test/fix for the eq
issue?
Description
If I have two different Types with at least some similarities (e.g. the same name), checking whether
type_1 == type_2
leads to a RecursionError. To the best of my understanding:uima.cas.TOP
.Type(name=uima.cas.TOP)
objects don't have supertypes, so there's no further recursion there. However,__eq__
also has to check that their children are equal, and their children are dictionaries from names to types.Of course, I'm working by a little bit of guesswork, so don't take that as definitive.
To Reproduce
Simply run the following commands:
and behold: RecursionError!
one.get_type('uima.cas.Boolean') == two.get_type('uima.cas.Float')
returns False immediately, presumably because the names are different. Oddly,one.get_type('uima.cas.Boolean') == one.get_type('uima.cas.Boolean')
returns True immediately, without a recursion depth error (does Python skip__eq__
ina == b
ifa is b
? I don't think it does, so I'd expect this to have the same error...?).Expected behavior Obviously, not a recursion error. Since the
__eq__
method is generated by@attr.s
, as far as I can tell, the only way around it might be to write a custom method, and I'm not sure what it should check. Is it enough to ask if they have the same.name
and.typesystem
? In this case I was trying to tell if they had the same features (and in fact they do not, as it happens), and it looks like Feature has its own non-@attr-defined equality method, so maybe that should be part of it too? Honestly I'm not sure; I don't spend much time working with two different typesystems at once.Technical info
@attr.s(slots=True, hash=False, eq=True, repr=False)
on the Type class, I'm assuming the same__eq__
gets generated for it as in 0.9.1.