Changelog (v0.10.0)
Added
New unified edsnlp.data API (json, brat, spark, pandas) and LazyCollection object
to efficiently read / write data from / to different formats & sources.
New unified processing API to select the execution backend via data.set_processing(...)
The training scripts can now use data from multiple concatenated adapters
Support quantized transformers (compatible with multiprocessing as well!)
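As a rough illustration of the lazy-collection idea (toy code, not EDS-NLP's actual implementation; all names below are invented for the sketch), operations are recorded rather than executed, and a set_processing-style switch selects the backend before iteration triggers the work:

```python
from typing import Callable, Iterable, Iterator

class ToyLazyCollection:
    """Toy stand-in for a lazy collection: map operations are recorded,
    not executed, until the collection is actually iterated."""

    def __init__(self, reader: Callable[[], Iterable]):
        self.reader = reader
        self.ops: list[Callable] = []
        self.backend = "simple"

    def map(self, fn: Callable) -> "ToyLazyCollection":
        self.ops.append(fn)  # defer execution
        return self

    def set_processing(self, backend: str) -> "ToyLazyCollection":
        # A real implementation would dispatch to multiprocessing / spark here.
        self.backend = backend
        return self

    def __iter__(self) -> Iterator:
        for item in self.reader():
            for fn in self.ops:
                item = fn(item)
            yield item

docs = (
    ToyLazyCollection(lambda: ["doc one", "doc two"])
    .map(str.upper)
    .set_processing(backend="multiprocessing")
)
print(list(docs))  # nothing is read or mapped before this line
```

The payoff of this design is that readers, transforms, and writers compose into one pipeline whose execution strategy is chosen once, at the end, instead of being baked into each step.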
Changed
edsnlp.pipelines has been renamed to edsnlp.pipes, but the old name is still available for backward compatibility
Pipes (in edsnlp/pipes) are now lazily loaded, which should improve the loading time of the library.
to_disk methods can now return a config to override the initial config of the pipeline (e.g., to load a transformer directly from the path storing its fine-tuned weights)
The eds.tokenizer tokenizer has been added to entry points, making it accessible from the outside
Deprecate old connectors (e.g. BratDataConnector) in favor of the new edsnlp.data API
Deprecate old pipe wrapper in favor of the new processing API
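The to_disk-returns-a-config idea above can be sketched generically (toy code, not the library's API; class and key names are invented): saving a component returns a small config fragment that overrides the initial config, so the pipeline reloads weights from the serialized path:

```python
import json
import tempfile
from pathlib import Path

class ToyTransformerPipe:
    """Toy component whose to_disk returns a config override, so a saved
    pipeline reloads weights directly from the serialized directory."""

    def __init__(self, model_path: str = "bert-base"):
        self.model_path = model_path

    def to_disk(self, path: Path) -> dict:
        path.mkdir(parents=True, exist_ok=True)
        # Serialize the (toy) weights next to the pipeline.
        (path / "weights.json").write_text(json.dumps({"model": self.model_path}))
        # Returned config overrides the initial one: point at the saved dir.
        return {"model_path": str(path)}

pipe = ToyTransformerPipe()
out_dir = Path(tempfile.mkdtemp()) / "transformer"
override = pipe.to_disk(out_dir)
print(override)  # {'model_path': '.../transformer'}
```

This mirrors the fine-tuned-transformer use case: after saving, the pipeline no longer needs the original hub identifier, only the local copy of the weights.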
Fixed
Support for pydantic v2
Support for Python 3.11 (not CI-tested yet)
Changelog (v0.10.0beta1)
Large refactor of EDS-NLP to allow training models and performing inference using PyTorch
as the deep-learning backend. Rather than a mere wrapper of PyTorch using spaCy, this is
a new framework to build hybrid multi-task models.
To achieve this, instead of patching spaCy's pipeline, a new pipeline was implemented in
a similar fashion to aphp/edspdf#12. The new pipeline tries to preserve the existing API,
especially for non-machine learning uses such as rule-based components. This means that
users can continue to use the library in the same way as before, while also having the option to train models using PyTorch. We still
use spaCy data structures such as Doc and Span to represent the texts and their annotations.
Otherwise, changes should be transparent for users who still want to use spaCy pipelines
with nlp = spacy.blank('eds'). To benefit from the new features, users should use
nlp = edsnlp.blank('eds') instead.
Added
New pipeline system available via edsnlp.blank('eds') (instead of spacy.blank('eds'))
Use the confit package to instantiate components
Training script with PyTorch only (tests/training/) and tutorial
New trainable embedding / contextualizer pipes: eds.transformer, eds.text_cnn, eds.span_pooler
Re-implemented the trainable NER component and the trainable span qualifier with the new
system, under eds.ner_crf and eds.span_classifier
New efficient implementation for eds.transformer (to be used in place of
spacy-transformers)
Changed
Pipe registration: Language.factory -> edsnlp.registry.factory.register via confit
Components are now lazily loaded from their entry points (this required patching
spacy.Language.__init__) to avoid wrapping every import torch statement for pure
rule-based use cases. Hence, torch is no longer a required dependency.
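The lazy-loading idea, deferring a heavy import (such as torch) until a component is actually resolved, can be sketched with the standard library's importlib (toy code; the registry and component names are invented, and json.dumps stands in for a heavy deep-learning factory):

```python
import importlib

class ToyRegistry:
    """Toy entry-point-style registry: factories are stored as import
    paths and only resolved (importing their module) on first use."""

    def __init__(self):
        self._factories = {}

    def register(self, name: str, module: str, attr: str):
        # Store only the import path; do not import the module yet.
        self._factories[name] = (module, attr)

    def get(self, name: str):
        module, attr = self._factories[name]
        # The (possibly heavy) import happens here, on demand.
        return getattr(importlib.import_module(module), attr)

registry = ToyRegistry()
registry.register("toy_component", "json", "dumps")
factory = registry.get("toy_component")  # import occurs only now
print(factory({"ok": True}))  # {"ok": true}
```

With this pattern, a user who never requests a deep-learning component never pays the import cost of its backend, which is what makes torch an optional dependency.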