allenai / scispacy

A full spaCy pipeline and models for scientific/biomedical documents.
https://allenai.github.io/scispacy/
Apache License 2.0
1.66k stars, 223 forks

Scispacy library not working with medspacy. #513

DeFrayne closed this issue 3 weeks ago

DeFrayne commented 2 months ago

Hello, I am trying to use the en_ner_bionlp13cg_md model with medspacy. It only seems to work if I disable the parser, as shown below, and parsing is a major part of medspacy's appeal:

    import medspacy

    nlp = medspacy.load("en_ner_bionlp13cg_md", disable=["parser"])

This is successful, but I lose parsing.

If I run the following instead:

    import medspacy
    from medspacy.visualization import visualize_ent

    nlp = medspacy.load("en_ner_bionlp13cg_md")
    text = "blahblahblah"
    doc = nlp(text)
    visualize_ent(doc)

I get the following error:

    ValueError                                Traceback (most recent call last)
    Input In [86], in <cell line: 2>()
          1 text = "blahblahblah"
    ----> 2 doc = nlp(text)
          3 visualize_ent(doc)

    File c:\Users\ddefr\anaconda3\lib\site-packages\spacy\language.py:1054, in Language.__call__(self, text, disable, component_cfg)
       1052     raise ValueError(Errors.E109.format(name=name)) from e
       1053 except Exception as e:
    -> 1054     error_handler(name, proc, [doc], e)
       1055 if not isinstance(doc, Doc):
       1056     raise ValueError(Errors.E005.format(name=name, returned_type=type(doc)))

    File c:\Users\ddefr\anaconda3\lib\site-packages\spacy\util.py:1722, in raise_error(proc_name, proc, docs, e)
       1721 def raise_error(proc_name, proc, docs, e):
    -> 1722     raise e

    File c:\Users\ddefr\anaconda3\lib\site-packages\spacy\language.py:1049, in Language.__call__(self, text, disable, component_cfg)
       1047 error_handler = proc.get_error_handler()
       1048 try:
    -> 1049     doc = proc(doc, **component_cfg.get(name, {}))  # type: ignore[call-arg]
       1050 except KeyError as e:
       1051     # This typically happens if a component is not initialized
       1052     raise ValueError(Errors.E109.format(name=name)) from e

    File c:\Users\ddefr\anaconda3\lib\site-packages\PyRuSH\PyRuSHSentencizer.py:53, in PyRuSHSentencizer.__call__(self, doc)
         51 def __call__(self, doc):
         52     tags = self.predict([doc])
    ---> 53     cset_annotations([doc], tags)
         54     return doc

    File c:\Users\ddefr\anaconda3\lib\site-packages\PyRuSH\StaticSentencizerFun.pyx:48, in PyRuSH.StaticSentencizerFun.cset_annotations()

    File c:\Users\ddefr\anaconda3\lib\site-packages\PyRuSH\StaticSentencizerFun.pyx:56, in PyRuSH.StaticSentencizerFun.cset_annotations()

    File c:\Users\ddefr\anaconda3\lib\site-packages\spacy\tokens\token.pyx:509, in spacy.tokens.token.Token.sent_start.__set__()

    File c:\Users\ddefr\anaconda3\lib\site-packages\spacy\tokens\token.pyx:528, in spacy.tokens.token.Token.is_sent_start.__set__()

    ValueError: [E043] Refusing to write to token.sent_start if its document is parsed, because this may cause inconsistent state.

Any help resolving this would be greatly appreciated. I do not get this error with spacy.load(), only with medspacy.load().
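The E043 message says spaCy will not let a component overwrite sentence starts once a document already carries a dependency parse. Judging from the traceback, medspacy.load() adds a PyRuSH sentencizer that runs after the model's parser, which would explain why disabling the parser makes the error go away. The sketch below is a toy, pure-Python model of that ordering conflict, assuming the pipeline order just described; `Doc`, `Token`, `toy_parser`, `toy_sentencizer`, and `run` are hypothetical stand-ins, not spaCy or PyRuSH classes.

```python
from typing import Callable, List, Optional


class Token:
    """Toy stand-in for a token (hypothetical, not spaCy's Token class)."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.sent_start: Optional[bool] = None


class Doc:
    """Toy document that records whether a parser has already run on it."""

    def __init__(self, words: List[str]) -> None:
        self.tokens = [Token(w) for w in words]
        self.is_parsed = False

    def set_sent_start(self, i: int, value: bool) -> None:
        # Same idea as spaCy's E043 guard: once a parse exists, sentence
        # boundaries come from it, so overwriting them is refused.
        if self.is_parsed:
            raise ValueError(
                "Refusing to write to token.sent_start if its document is parsed"
            )
        self.tokens[i].sent_start = value


def toy_parser(doc: Doc) -> Doc:
    # Hypothetical stand-in for the model's dependency parser.
    doc.is_parsed = True
    return doc


def toy_sentencizer(doc: Doc) -> Doc:
    # Hypothetical stand-in for a rule-based sentencizer such as PyRuSH:
    # marks the first token as a sentence start and the rest as non-starts.
    for i in range(len(doc.tokens)):
        doc.set_sent_start(i, i == 0)
    return doc


Component = Callable[[Doc], Doc]


def run(pipeline: List[Component], words: List[str]) -> Doc:
    # Apply components in order, the way a spaCy pipeline does.
    doc = Doc(words)
    for component in pipeline:
        doc = component(doc)
    return doc


if __name__ == "__main__":
    try:
        run([toy_parser, toy_sentencizer], ["blah", "blah"])
    except ValueError as err:
        print("parser before sentencizer:", err)

    # Parser disabled, as in the workaround: the sentencizer can write freely.
    doc = run([toy_sentencizer], ["blah", "blah"])
    print([t.sent_start for t in doc.tokens])  # [True, False]
```

In this toy, running the parser first reproduces the refusal, while dropping the parser lets the sentencizer set its boundaries, which mirrors the `disable=["parser"]` workaround.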

dakinggg commented 1 month ago

I'm a bit confused about why you are trying to use medspacy with a scispacy model. Is that expected to work?

DeFrayne commented 1 month ago

Yes. I ended up disabling the parser and going with that, and it works fine.
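One way to spot this kind of conflict before processing any text is to inspect `nlp.pipe_names`, which lists a spaCy pipeline's components in execution order. The helper below is a hypothetical sketch, not part of spaCy, scispacy, or medspacy: given that list, it reports which sentence-boundary components would run after the parser and therefore try to write `sent_start` on an already parsed document. The component names in the example, including `"medspacy_pyrush"`, are assumptions; print `nlp.pipe_names` on your own pipeline to get the real ones.

```python
from typing import Iterable, List


def sentence_writers_after_parser(
    pipe_names: List[str],
    sentence_writers: Iterable[str],
    parser_name: str = "parser",
) -> List[str]:
    """Return sentence-boundary components that run after the parser.

    Hypothetical helper: every name returned would try to set
    token.sent_start on a doc that is already parsed, the situation
    spaCy rejects with E043.
    """
    if parser_name not in pipe_names:
        return []  # no parser in the pipeline, so nothing to conflict with
    parser_index = pipe_names.index(parser_name)
    writers = set(sentence_writers)
    return [name for name in pipe_names[parser_index + 1:] if name in writers]


if __name__ == "__main__":
    # Illustrative pipe_names only; the real list depends on the model
    # and on the installed medspacy version.
    pipes = ["tok2vec", "tagger", "parser", "ner", "medspacy_pyrush", "medspacy_context"]
    print(sentence_writers_after_parser(pipes, {"medspacy_pyrush"}))  # ['medspacy_pyrush']

    # With the parser disabled, as in the workaround, nothing conflicts.
    no_parser = [p for p in pipes if p != "parser"]
    print(sentence_writers_after_parser(no_parser, {"medspacy_pyrush"}))  # []
```

Passing the sentence-writing component names in explicitly keeps the check independent of any particular medspacy release's naming.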