Open Hammad-NobleAI opened 2 years ago
Can you try adding `from scispacy.base_project_code import *` to the top of your file?
Thanks for getting back to me. I tried that, and it seems to have got beyond that issue now, but has led into this:
File ~/.pyenv/versions/3.10.5/envs/el-demo/lib/python3.10/site-packages/spacy/language.py:1249, in Language.begin_training(self, get_examples, sgd)
1242 def begin_training(
1243 self,
1244 get_examples: Optional[Callable[[], Iterable[Example]]] = None,
1245 *,
1246 sgd: Optional[Optimizer] = None,
1247 ) -> Optimizer:
1248 warnings.warn(Warnings.W089, DeprecationWarning)
-> 1249 return self.initialize(get_examples, sgd=sgd)
File ~/.pyenv/versions/3.10.5/envs/el-demo/lib/python3.10/site-packages/spacy/language.py:1286, in Language.initialize(self, get_examples, sgd)
1284 before_init(self)
1285 try:
-> 1286 init_vocab(
1287 self, data=I["vocab_data"], lookups=I["lookups"], vectors=I["vectors"]
1288 )
...
23 if require_exists and not location.exists():
---> 24 raise ValueError(f"Can't read file: {location}")
25 return location
ValueError: Can't read file: project_data/vocab_lg.jsonl
Ok, I think you are working from an outdated example, because the `begin_training` function is deprecated (https://spacy.io/api/language#initialize). If you want to write your own training loop, you will probably need to look deeper into how spaCy does it in the train CLI. That said, you should probably use their config system and CLI for training as much as possible. Check out `project.yml` and the configs here: https://github.com/explosion/projects/tree/v3/tutorials/nel_emerson. I also think this is a question about spaCy, not scispacy, since I think you will get similar errors if you run your script using `en_core_web_md`, so further questions are probably better directed to the spaCy folks. Feel free to reopen if it ends up being scispacy-specific.
Edit: looks like the base spacy models don't have this issue, so it is something more specific. I think it might still be a question for the spacy folks, but first you should try using the config system and CLI.
If it turns out you do just need that vocab file to continue, you can probably recreate it from the `en_core_sci_lg` model somehow, but you can definitely also just create it the same way that we do. See the `convert-lg` command in our `project.yml`.
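Before recreating the file, it can help to confirm which initialization paths are actually missing. The traceback shows `init_vocab` reading `vocab_data`, `lookups`, and `vectors` from the initialize settings, and failing on `project_data/vocab_lg.jsonl`. The following is a minimal sketch using only the standard library. The helper name `missing_init_files`, the example settings dict, and the assumption that `vocab_data` is a relative file path are all illustrative, not part of spaCy or scispacy. With a loaded pipeline, the settings could presumably be taken from `nlp.config["initialize"]`.

```python
from pathlib import Path
from typing import Any, List, Mapping, Sequence


def missing_init_files(
    init_settings: Mapping[str, Any],
    base_dir: Path,
    path_keys: Sequence[str] = ("vocab_data",),
) -> List[str]:
    """Return the keys in path_keys whose configured file is absent under base_dir.

    Hypothetical helper: which keys hold file paths is an assumption, so the
    caller chooses them. Entries that are unset (None) are skipped.
    """
    missing = []
    for key in path_keys:
        value = init_settings.get(key)
        if not isinstance(value, str):
            # Nothing configured for this key, so there is no file to look for.
            continue
        if not (base_dir / value).exists():
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Hypothetical settings mirroring the path in the error message above.
    settings = {"vocab_data": "project_data/vocab_lg.jsonl", "lookups": None}
    print(missing_init_files(settings, Path.cwd()))
```

If `vocab_data` is reported as missing, that matches the `ValueError` above: the path is resolved relative to the working directory, where the file does not exist.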
See #450 for a workaround.
I'm attempting to use your `en_core_sci_lg` pipeline to extract chemical entities from documents, and then use those entities as a basis to train spaCy's Entity Linker (as shown in this document). Here are the relevant portions of my code:
When I get to the error line (commented towards the end of the code block), I get the following error:
I'm running on macOS 12.4, M1 Pro, 16 GB unified memory, with scispacy==0.5.0 and spacy==3.2.4. Are scispacy models compatible with this workflow, or is that something that hasn't been or won't be implemented? Thanks in advance!