Closed tomhoper closed 2 years ago
I took a quick look to confirm that this wasn't caused by my recent commits over the last week or so, which were made to fix a separate issue. As far as I can tell, this problem is unrelated to the changes I made. I've found two issues:
top_spans
error that @tomhoper identified. I think this is likely because some necessary model input isn't being passed in correctly, but I haven't debugged further.dataset
is being set to scierc
when inputs are passed into the model here. This will break for any model that was trained on a dataset other than scierc
; see here for more information.The Jupyter demo works at present because it uses the scierc-lightweight
model, which (1) doesn't perform coreference resolution, and (2) was trained on scierc
.
I unfortunately don't have bandwidth to add these features. @e3oroush, if you are able to fix this, feel free to submit a PR. I'm happy to provide help if you get stuck. If not, could you please update the docs and Jupyter notebook indicating which models the spacy interface will work for?
Thanks all!
I tried to look into your source codes to find out what the problem was, but unfortunately I couldn't figure it out.
Can you help me with that? I just tried to imitate the codes written in Predictors file in here, but apparently not.
Which part of the project is using this file? I'm pretty sure allennlp evaluate
command doesn't.
I'd like to fix the bug, because it seems to be a really tiny one.
The prediction code is used by the allennlp predict
command. There's an example in the README.
One way I can think of debugging would be to print out what a single instance looks like when you make predictions using allennlp predict
, compared to what it looks like using the Spacy pipeline. There must be a discrepancy in the way the input is being passed to the model. You may need to dig into the AllenNLP predictor.py file in order to figure out what's happening, since the dygie Predictor
subclasses AllenNLP's Predictor
. Make sure you're looking at version 1.1 of AllenNLP.
There's still the issue that the model will break if you specify the input dataset as scierc, but the model was trained on a different dataset. I think the easiest fix here is:
dataset
field when you pass in the data to match the model's label namespaces.More info on why this label namespace stuff is necessary can be found in model.md. Sorry for the (perhaps needless) complexity. If I had time to re-write this repo from scratch I'd do things differently.
Thanks a ton. I wasn't comfortable using allennlp API, but thanks to this project I'm catching up.
I will try to fix this bug over the weekend and make sure everyone can can use all the available models.
Great, thanks! Let me know if you've got more questions.
Sorry that took me so long, I didn't forget, just was occupied with lots of other stuffs (from getting Covid to passing exams lol).
But the source was a silly mistake from my side. Now it should be resolved with this pull request.
No worries, thanks for the fix! I'll take a look this weekend.
I just merged the PR. This fixes problem (1) mentioned above. Thanks for the fix!
The issue raised by @tomhoper is still present, so I will leave this open.
Sorry that I didn't get the second problem. I just added another PR to fix that issue.
Nice, thanks!
@tomhoper you can now predict using Mechanic in spacy-interface-example.ipynb like so:
component = DygieppPipe(nlp,pretrained_filepath="../pretrained/mechanic-coarse.tar.gz", dataset_name="None")
nlp.add_pipe(component)
doc = nlp(text)
Let me know if this works and I'll close the issue.
Thank you @e3oroush and @dwadden.
mechanic-coarse
is now working, but mechanic-granular
(an "event" model) is not. I used dataset_name="covid-event"
(using "None" didn't help) . Here's the error i'm getting:
line 35, in prepare_spacy_doc for sent in prediction["predicted_ner"]: KeyError: 'predicted_ner' `
Any ideas?
@dwadden i made a pull request that adds spacy support for "events" (namely mechanic-granular
but should work in general). #99
also added a notebook with examples using and comparing mechanic-coarse
and mechanic-granular
on some scientific documents.
Just responded to the PR.
I've tried loading several pre-trained models with
DygieppPipe
(imported fromdygie.spacy_interface.spacy_interface
) --scierc
,mechanic
(coarse/granular),genia
-- get errors raised inmake_output_human_readable.py
. Each model raises different errors so I'm not going to clutter it here, but they all seem to be aroundoutput_dict
not having some keys. Happy to share more details if needed.@e3oroush @dwadden