Closed: louni-g closed this issue 9 months ago
Hi, thank you for this detailed feedback!
Indeed, the `eds.negation` pipe (and any other pipe relying on the `EDSPhraseMatcher`) applies the same processing to the entries of its term lists as it does to documents. To do that, it filters the pipeline to keep only the pipes that affect token extensions, and the `lemmatizer` and `morphologizer` components declare such changes to tokens:
```
nlp.get_pipe_meta('morphologizer').assigns
# ['token.morph', 'token.pos']
nlp.get_pipe_meta('lemmatizer').assigns
# ['token.lemma']
```
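For illustration, this token-level filtering could be sketched as follows (a standalone sketch with mocked metadata dicts, not the actual spacy or edsnlp code):

```python
# Mocked pipe metadata, standing in for nlp.get_pipe_meta(name).assigns.
PIPE_ASSIGNS = {
    "morphologizer": ["token.morph", "token.pos"],
    "lemmatizer": ["token.lemma"],
    "transformer": [],  # declares no token-level assignment
    "ner": ["doc.ents", "token.ent_iob", "token.ent_type"],
}

def token_pipes(pipe_assigns):
    """Keep only the pipes that declare at least one token.* assignment."""
    return [
        name
        for name, assigns in pipe_assigns.items()
        if any(a.startswith("token.") for a in assigns)
    ]

print(token_pipes(PIPE_ASSIGNS))
```

Note that `ner` is kept by such a filter because of its `token.ent_iob` / `token.ent_type` assignments, which matches the behavior reported below.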
Ideally, we should not run any pipes in the `__init__()` method (e.g. instead of storing terms, storing the `.norm_`, `.text` extensions, ...). In the meantime, we could update the `EDSPhraseMatcher` (and its variants) to skip pipes that are clearly not required (as shown by their `assigns` attribute, e.g. `nlp.get_pipe_meta('morphologizer').assigns`) or pipes that are disabled.
@louni-g may I ask for what task you need a transformer in your pipeline? is it to use the pre-trained lemmatizer / morphologizer / ... pipes of spacy, or to train a new model, or something else ?
I trained a spacy-transformers NER model, and in my case I only have the following pipes: `["transformer", "ner"]`, and it's the `ner` one that ends up in the `token_pipelines`:
```
nlp.get_pipe_meta('ner').assigns
# ['doc.ents', 'token.ent_iob', 'token.ent_type']
```
so I think it would be totally ok to skip non-necessary pipes 👍
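A sketch of such a skip, combining the two criteria discussed above (disabled pipes and pipes whose assignments are not needed); the pipe metadata and the `needed` attribute set are mocked, hypothetical stand-ins for the real spacy/edsnlp objects:

```python
# Mocked pipe metadata: name -> (assigns, disabled). Stands in for
# nlp.get_pipe_meta(name).assigns and nlp.disabled.
PIPES = {
    "transformer": ([], False),
    "lemmatizer": (["token.lemma"], False),
    "ner": (["doc.ents", "token.ent_iob", "token.ent_type"], True),
}

def pipes_to_run(pipes, needed=frozenset({"token.lemma", "token.norm"})):
    """Keep only enabled pipes whose declared assignments are actually
    needed to preprocess the matcher's term lists (hypothetical logic)."""
    return [
        name
        for name, (assigns, disabled) in pipes.items()
        if not disabled and needed.intersection(assigns)
    ]

print(pipes_to_run(PIPES))
```

With this logic, a `["transformer", "ner"]` pipeline with `ner` assignments limited to entity attributes would yield no pipes to run at all for a matcher working on plain text or norms.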
Description

When loading a pipeline from disk, if the pipeline contains a spacy-transformers model and any edsnlp qualifiers, this error is encountered:
Full Traceback
```
File "/Users/Louise/Library/Application Support/JetBrains/PyCharm2023.2/scratches/scratch.py", line 8, in
```

The error occurs during the initialization of the qualifiers, where the `token_pipelines` are run in `EDSPhraseMatcher`'s `build_patterns`. I did a bit of digging, and it seems the error comes from the fact that the spacy-transformers pipelines are not fully initialized at this point, so running them raises an error. Possible fixes could be to skip the problematic pipes if they are not necessary to run, or to do this step once the whole pipeline has been completely initialized (not in the `__init__`).

How to reproduce the bug
Your Environment