databill86 opened this issue 2 years ago
The output of the Spanish v3.2.0-v3.4.0 pipelines should be very similar, since those pipelines use the same rule-based lemmatizer and otherwise relatively similar settings. Could you open a separate issue if there are additional problems for Spanish?
Thanks for the feedback about Italian. This is related to #10953, which also includes some additional Italian examples: https://github.com/explosion/spaCy/issues/10953#issuecomment-1201328111
In v3.3.0 we mainly replaced the lookup lemmatizers with the new trainable lemmatizer. The trainable lemmatizer makes very different kinds of mistakes than the lookup lemmatizer; these come down to the usual expectations for statistical components, as described in #3052.
If you want to switch back to the v3.3 lookup lemmatizer, see https://spacy.io/usage/v3-3#pipeline-updates and https://spacy.io/models#design-modify
Thank you for your response! I will switch back to the v3.3 lookup lemmatizer and open a new issue for the Spanish examples. I may also have some other examples for French.
Hello,
I've recently upgraded the spaCy pretrained models from v3.2 to v3.4, but I found that tagger and lemmatizer performance dropped significantly for Italian and Spanish.
I've prepared a table with some Italian examples, along with the expected output (lemma, POS).
Some lemmas are in uppercase. Is there a reason for this?
Thank you!
Your Environment