allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

Allennlp SRL srl-eval.pl #4102

Closed francesca418 closed 4 years ago

francesca418 commented 4 years ago

TL;DR: issue with the srl-eval.pl file in AllenNLP v0.9 when training on OntoNotes

Below is a summary of our issue:

  1. We are working on replicating the SRL results from https://www.aclweb.org/anthology/P17-1044/

  2. When we try to train on the OntoNotes 5 SRL data (using AllenNLP 0.9), srl-eval.pl always dies during validation in the first epoch, at line 1378: `(!@SP) or die "phrase_set->load_SE_tagging: some phrases are unclosed!\n";`

  3. We have narrowed the issue down to the loop at lines 1348-1357, which pushes certain tags onto the SP array twice but pops only one off. The offending tags are nested ones such as (ARG2(ARG1*)), where a word receives two argument labels for a single predicate. That one tag is pushed onto SP twice, once as 'ARG2' and once as 'ARG1', but only one is popped, which triggers the "unclosed phrases" error.

  4. We are hoping that there might be some workaround for this or someone who has already dealt with this issue before.

  5. We have a current workaround in which we allow a token to receive only one argument label; however, it would be preferable to get the original model working and let a word carry multiple argument labels.
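To illustrate point 3, here is a small Python sketch of the start-end span parsing (this is an illustration, not the Perl script itself). Popping one stack entry per closing paren handles nested tags like (ARG2(ARG1*)) correctly; popping only once per token, as described above, would leave an entry on the stack and reproduce the "unclosed phrases" failure:

```python
import re

def load_se_tagging(tags):
    """Simplified sketch of CoNLL start-end span parsing.

    Each tag looks like '(ARG1*', '*', '*)', or a nested form such as
    '(ARG2(ARG1*))'. We push every opening label onto a stack and pop
    one entry per closing paren.
    """
    stack, spans = [], []
    for i, tag in enumerate(tags):
        # every '(LABEL' in the tag opens a span at this token
        for label in re.findall(r"\(([^()*]+)", tag):
            stack.append((label, i))
        # every ')' closes the most recently opened span
        for _ in range(tag.count(")")):
            if not stack:
                raise ValueError(f"unmatched ')' at token {i}")
            label, start = stack.pop()
            spans.append((label, start, i))
    if stack:
        # analogous to srl-eval.pl's "some phrases are unclosed!"
        raise ValueError(f"unclosed phrases: {stack}")
    return spans
```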

Thank you!

dirkgr commented 4 years ago

The srl-eval.pl file we have in there is the exact official evaluation script; that's why it's there instead of a Python version. So we can't change it, at least not in the official repo.

I didn't look deeply into this, but their original code is available at https://github.com/luheng/deep_srl/. It looks like they use the official srl-eval.pl script, called from SRLEvaluator, which produces output in just the right format.

Do you know about https://github.com/allenai/allennlp/blob/v0.9.0/scripts/write_srl_predictions_to_conll_format.py? Do you use it?

celine-lee commented 4 years ago

Hi Dirk! Thanks for the advice. I am working with Francesca on this.

I am currently trying to get Luheng's deep_srl repo working. In the meantime, I did get a chance to look into and run the v0.9.0 write_srl_predictions_to_conll_format.py file. It ultimately calls the write_conll_formatted_tags_to_file function, which is also called by the SrlEvalScorer, the metric used by the evaluator in the model we were running before. The line immediately after SrlEvalScorer's call to write_conll_formatted_tags_to_file is the one that invokes srl-eval.pl and causes the issue Francesca asked about.
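For reference, the conversion those helpers perform is roughly BIO tags to CoNLL bracket notation. A simplified sketch (an illustration, not AllenNLP's actual write_conll_formatted_tags_to_file implementation):

```python
def bio_to_spans(bio_tags):
    """Sketch: convert BIO SRL tags to CoNLL bracket notation,
    e.g. ['B-ARG0', 'I-ARG0', 'O', 'B-V'] -> ['(ARG0*', '*)', '*', '(V*)'].
    """
    out = []
    for i, tag in enumerate(bio_tags):
        label = None if tag == "O" else tag[2:]
        nxt = bio_tags[i + 1] if i + 1 < len(bio_tags) else "O"
        piece = "*"
        if tag.startswith("B-"):
            piece = "(" + label + piece        # span opens at this token
        if label is not None and nxt != "I-" + label:
            piece = piece + ")"                # span closes at this token
        out.append(piece)
    return out
```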

It definitely makes sense that the evaluator should produce output in the proper format; we are trying to figure out what differs between our setup and yours that causes issues with some of these nested labels. We are using the CoNLL-2012 OntoNotes 5.0 dataset, consistent with what the ontonotes.py file expects. Are there any other resources we might be missing in order to make this work? Perhaps some specification of dataset domain, or something else?

Thanks!

Riccorl commented 4 years ago

I have the same problem. I'm trying to train with bert_base_srl.jsonnet and CoNLL-2012, but it crashes during validation in the first epoch.

Riccorl commented 4 years ago

I don't know if it's useful, but it seems the gold file is causing the problem. I checked the gold.txt and the predicted.txt generated by the model (they are in a folder inside /tmp) separately, and the evaluation script breaks while parsing gold.txt.

Riccorl commented 4 years ago

@celine-lee @francesca418 I'm sorry to bother you, but did you find a solution for your problem?

celine-lee commented 4 years ago

Hi @Riccorl , unfortunately no, we haven't found a solution yet. I noticed that too: the evaluation script breaks while parsing gold.txt and predicted.txt. For me, it breaks mainly when reading certain nested labels. Earlier, we tried making slight changes to the _process_span_annotations_for_word function in ontonotes.py (because it is part of the pipeline that writes those /tmp files) and to the regex line in load_SE_tagging of srl-eval.pl, in order to handle nested labels such as (ARG2(ARG1*)), but we haven't found a change that works well yet.

Riccorl commented 4 years ago

> Hi @Riccorl , unfortunately no we haven't found a solution yet. I noticed that too, that the evaluation script breaks while parsing the gold.txt and predicted.txt. For me, it breaks mainly when reading certain nested labels. Earlier, we tried making slight changes to the _process_span_annotations_for_word function in the ontonotes.py file (because that is part of the pipeline that writes to those /tmp files), and the regex line in load_SE_tagging of srl-eval.pl, in order to handle such nested labels as (ARG2(ARG1*)), but haven't found one that works well yet.

Today I extracted the conll2012 again, using the scripts here, following this guide. Now the code works. Let me know if it works for you.

P.S. I'm using allennlp and allennlp-models 1.0.0rc2

celine-lee commented 4 years ago

That is great that it works, @Riccorl ! I just tried with the pip install of the 1.0.0rc2 model and this jsonnet config: https://github.com/allenai/allennlp/blob/3ca8035a3a0e8f9f6049e9632267f394dca5abad/training_config/bert_base_srl.jsonnet

I am getting a ConfigurationError though:

[screenshot: ConfigurationError traceback]

Did you run into this issue as well?

epwalsh commented 4 years ago

@celine-lee the PyPI release of allennlp-models==rc2 is actually completely broken. Please try rc3 or installing from source.

Riccorl commented 4 years ago

> That is great that it works, @Riccorl ! I just tried with the pip install of the 1.0.0rc2 model and this jsonnet config: https://github.com/allenai/allennlp/blob/3ca8035a3a0e8f9f6049e9632267f394dca5abad/training_config/bert_base_srl.jsonnet
>
> I am getting a ConfigurationError though:
>
> [screenshot: ConfigurationError traceback]
>
> Did you run into this issue as well?

The 1.0.0rc2 release of allennlp-models is broken if you install from pip. As @epwalsh said, you can try with the new version or by installing it from source.

Besides that, your particular error is due to the fact that the SRL model now lives in the allennlp-models package. To solve it, download the configuration file from here, change `type` in `dataset_reader` from `srl` to `allennlp_models.syntax.srl.srl_reader.SrlReader`, and run the command with the option `--include-package allennlp_models`
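Concretely, assuming the config has been saved as bert_base_srl.jsonnet and edited as described (the output directory is illustrative), the invocation would look roughly like:

```shell
# in bert_base_srl.jsonnet, change the dataset_reader "type" from "srl" to
# "allennlp_models.syntax.srl.srl_reader.SrlReader", then:
allennlp train bert_base_srl.jsonnet \
    -s /tmp/srl_output \
    --include-package allennlp_models
```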

celine-lee commented 4 years ago

Thank you!

dirkgr commented 4 years ago

@Riccorl, I didn't dig into the scripts you found there. Is this something we should add to the allennlp library, to make sure nobody else runs into this problem?

Riccorl commented 4 years ago

> @Riccorl, I didn't dig into the scripts you found there. Is this something we should add to the allennlp library, to make sure nobody else runs into this problem?

I don't know yet. The problem seems related to nested spans: the CoNLL-2012 data someone else parsed for me contained them, but when I re-parsed the data myself, the nested spans were gone.

I think I can update you in a few days; I'd like to know why there is this inconsistency between the two versions of the CoNLL data I'm working with.

celine-lee commented 4 years ago

Hey @Riccorl , have you found out why there are inconsistencies between the two versions?

Not sure if this is helpful, but I found that for the files that had nested label problems before, the "same" file in the newly-assembled dataset simply omits the arguments that caused them. For example, in ../pt/nt/46, the predicate argument for 'sold', which had the "bad" nested argument, is absent from the newly-assembled, working dataset. The same holds in a couple of other places where I found nested arguments.

[image: meat_market_comparison]

I am not sure whether it was manually removed because it was causing problems, or why the older version we were working with contains something that the current conll-formatted-ontonotes does not. But it seems to work fine: the trained model roughly matches the metrics in their paper.
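A quick way to locate such tokens in a CoNLL file is to flag any tag column that opens more than one span on a single token. A minimal sketch (the column index is an assumption; adjust it for your file layout):

```python
def find_nested_tokens(conll_lines, col=-1):
    """Flag tokens whose tag column opens more than one span at once,
    e.g. '(ARG2(ARG1*))' -- the pattern that broke srl-eval.pl.

    conll_lines: iterable of whitespace-separated CoNLL rows;
    col: index of the argument column (assumed, layout-dependent).
    """
    hits = []
    for lineno, line in enumerate(conll_lines, 1):
        fields = line.split()
        if not fields:
            continue                      # skip blank separator lines
        tag = fields[col]
        if tag.count("(") > 1:            # two span openings on one token
            hits.append((lineno, tag))
    return hits
```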

github-actions[bot] commented 4 years ago

This issue is being closed due to lack of activity. If you think it still needs to be addressed, please comment on this thread 👇