Closed BramVanroy closed 3 years ago
The test that is failing is not due to this PR. It fails from the English language dependency labels. Probably due to new stanza models in newer versions. The right labels on stanza 1.2.3 are
['discourse', 'root', 'punct', 'nsubj', 'cop', 'det', 'root', 'punct']
AFAICT stanza models are linked to the stanza version and models may change over patches, e.g. a 1.2.3 model may be different from 1.2.2. I can't test ALL different stanza models, so I am not sure how to deal with this. Pin a very specific stanza version in setup? I'm not a big fan of that.
Maybe we can just use the test data from their repo? Or from a sentence from the UD treebanks.
Try merging upstream/master
into this?
Sorry, one more thing (and I can do it if you want): can you format the tests with black
?
The tests are more of a sanity check than real unit tests (which is also part of why they're not packaged with the module like in our other packages). Updates here are rare enough that I'll continue to go with the lazy option of just updating the tests as necessary.
That makes sense! Most functionality comes down to copying stanza attributes into spaCy objects. Useful, of course, but not a lot that can go wrong with that.
Thanks for the PR!
This PR makes sure that the fallback for
tag_
in spacy-stanza is the same as in spaCy. Currently this is dealt with differently inspacy-stanza
compared to spaCy. In spaCy, the attributeruler copiespos
totag
in casetag
is not defined. However, in spacy_stanza, the morphological features are used instead. That is not desirable, I think. As a user you would expect the same behavior in spaCy and spacy_stanza.More info and examples in the linked issue https://github.com/explosion/spacy-stanza/issues/76
For reference,
spacy-udpipe
does not have a fallback, although if you agree with the current PR I may ask TakeLab whether they want to streamline their behavior as well.Warning: this PR does result in a breaking change, potentially resulting in different Token.tag_ values than before
Added a test for Spanish to illustrate the issue, which currently does not have xpos.
closes #76