explosion / spacy-stanza

💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy
MIT License

Unknown morphological feature: 'ConjType' #35

Closed TahaMunir1 closed 11 months ago

TahaMunir1 commented 4 years ago

When I run nlp(comment) on Urdu text, I get the following error:

[E167] Unknown morphological feature: 'ConjType' (9141427322507498425). This can happen if the tagger was trained with a different set of morphological features. If you're using a pretrained model, make sure that your models are up to date: python -m spacy validate

Some docs are processed successfully while others fail with this error.

To reproduce, run the following code, which is meant to return tokens and POS tags:

import stanza
from spacy_stanza import StanzaLanguage

snlp = stanza.Pipeline(lang='ur')
nlp = StanzaLanguage(snlp)
doc = nlp('یہ سرد اور تلخ تھا')  # "It was cold and bitter"

Environment: Windows and CentOS, Python 3.8, Stanza 1.0.0
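Since only some docs fail, it may help to narrow down which inputs trigger the error. The following is a minimal sketch, not part of the original report: `find_failing_texts` and `fake_pipeline` are hypothetical names, and `fake_pipeline` is only a stand-in for `nlp` so that the snippet runs without Stanza or spaCy installed. It catches `Exception` broadly because the exact exception type raised for E167 is assumed here rather than confirmed.

```python
def find_failing_texts(pipeline, texts):
    """Run `pipeline` on each text and collect (index, error message) for failures."""
    failures = []
    for i, text in enumerate(texts):
        try:
            pipeline(text)
        except Exception as err:  # broad on purpose: exception type is an assumption
            failures.append((i, str(err)))
    return failures


def fake_pipeline(text):
    # Hypothetical stand-in for nlp(text): fails when the text contains the
    # coordinating conjunction "اور" ("and"), mimicking the reported error.
    if "اور" in text:
        raise ValueError("[E167] Unknown morphological feature: 'ConjType'")
    return text.split()


texts = ["یہ سرد اور تلخ تھا", "یہ سرد تھا"]
failures = find_failing_texts(fake_pipeline, texts)
for index, message in failures:
    print(index, message)
```

With the real pipeline, passing `nlp` in place of `fake_pipeline` would list the indices of the texts that raise.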

adrianeboyd commented 4 years ago

Sorry, some of the tag maps haven't been tested well for unsupported morphological features, in particular for languages where spaCy doesn't provide models. For those languages we don't train a tagger internally, which is what would normally catch this kind of error in the tag map.

Try using spaCy v2.3.0, which has an updated tag map for Urdu. If you want to stay on v2.2 or an older version, you can also edit the tag map in your installation (under spacy/lang/ur/tag_map.py) to remove any unsupported morphological features such as "ConjType": "coor", which I think is the unsupported feature here.
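As a rough illustration of that manual edit, here is a minimal sketch of stripping a feature from a tag-map-like dictionary. The `strip_features` helper and the sample entries are hypothetical; the real file under spacy/lang/ur/tag_map.py has its own entries and uses spaCy symbols for some keys, so treat the plain string keys below as a simplification.

```python
def strip_features(tag_map, unsupported):
    """Return a copy of `tag_map` with the given feature keys removed from every entry."""
    return {
        tag: {key: value for key, value in feats.items() if key not in unsupported}
        for tag, feats in tag_map.items()
    }


# Hypothetical sample entries, not copied from spaCy's Urdu tag map.
sample_tag_map = {
    "CC": {"pos": "CCONJ", "ConjType": "coor"},
    "JJ": {"pos": "ADJ", "Degree": "pos"},
}

cleaned = strip_features(sample_tag_map, {"ConjType"})
print(cleaned["CC"])
```

The helper builds a new dictionary instead of mutating the input, so the original map stays available for comparison.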

adrianeboyd commented 11 months ago

Just going through some older issues...

I think this was resolved in spaCy v2.3, or at the very latest in spaCy v3.

But please feel free to reopen if you're still running into issues!