2.7.0 (15/05/23)

- Updated the DaCy models to version 0.2.0, including a small, medium and large model:

  - Updated dependency parsing and part-of-speech tagging to use the latest version of the DDT treebank 🌳
  - Added a trainable lemmatizer, notably improving lemmatization
  - All models are trained using the intersection between the CDT and the DDT treebanks (so they are actually trained on less data than before) 🤯

    - This includes the annotations from DaNED, DaCoref and DaNE

  - Large model:

    - Obtained state-of-the-art performance on:

      - Dependency parsing
      - Part-of-speech tagging
      - Morphological tagging
      - Lemmatization (from 84.91 to 95.89!)

    - Improved performance on:
    - Reduced performance for NER (down to 87.38). We recommend either using :code:`nlp.add_pipe("dacy/ner")` to add the SotA ScandiNER model to your pipeline or using one of the new fine-grained NER models.
    - Added support for:

      - Coreference resolution: performance isn't great yet, but it's a start!
      - Named entity linking, with a precision of 0.86, although recall is still low due to a limited knowledge base

  - Medium model:

    - Consistent improvements across all tasks:

      - Notable performance gain for NER, from an F1 of 81.79 to 85.82
      - Notable performance gain for lemmatization, from an accuracy of 84.91 to 94

  - Small model:

- Fixes a variety of issues:
- Removed support for DaCy model version 0.1.0. If you need to use these models, you will have to use DaCy <= 2.0.0.
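The score changes listed for the large and medium models are easier to compare when read as error reductions rather than raw point gains. The following is a minimal Python sketch, not part of DaCy. It assumes that every score is a percentage on a 0 to 100 scale and treats the medium model's reported lemmatization score of 94 as 94.0. The names `CHANGES` and `summarize` are hypothetical helpers introduced only for this illustration.

```python
# Hypothetical helper (not part of DaCy): reinterprets the before/after scores
# reported in this changelog entry as absolute gains and relative error reductions.
# Assumption: all scores are percentages on a 0-100 scale.
CHANGES = {
    "large: lemmatization (accuracy)": (84.91, 95.89),
    "medium: lemmatization (accuracy)": (84.91, 94.0),  # "94" assumed to mean 94.0
    "medium: NER (F1)": (81.79, 85.82),
}


def summarize(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative error reduction in percent).

    The error rate is taken as 100 - score, so the reduction says what share
    of the old model's errors the new model no longer makes.
    """
    gain = after - before
    error_before = 100.0 - before
    error_after = 100.0 - after
    reduction = (error_before - error_after) / error_before * 100.0
    return round(gain, 2), round(reduction, 1)


if __name__ == "__main__":
    for name, (before, after) in CHANGES.items():
        gain, reduction = summarize(before, after)
        print(f"{name}: +{gain} points, {reduction}% fewer errors")
```

Applied to the large model's lemmatizer, the error rate falls from 15.09% to 4.11%, which corresponds to roughly 73% fewer lemmatization errors.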
What is next?