Closed: RikVN closed this issue 4 years ago
Hi Rik,
sorry for the late response; I forgot to adjust my notifications. I fixed the first bug you mentioned, and the second one should not give you headaches anymore either. That one was in fact a half-heartedly implemented feature, and you would have needed to supply the `--extend-vocab` option.
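For context, the failure mode that option works around can be sketched with plain dicts (a conceptual sketch only, not the am-parser API; the function and label names below are illustrative assumptions):

```python
# Conceptual sketch (not the am-parser API): extending a label vocabulary
# that was frozen at training time with labels first seen at prediction time,
# so later lookups no longer fail.

def extend_vocab(vocab, new_labels):
    """Assign fresh ids to labels the vocabulary has not seen yet."""
    for label in new_labels:
        if label not in vocab:
            vocab[label] = len(vocab)
    return vocab

# A vocabulary frozen at training time ...
vocab = {"PERSON": 0, "ORG": 1, "GPE": 2}
# ... extended with NER labels that only show up at prediction time.
extend_vocab(vocab, ["CARDINAL", "EVENT", "PRODUCT", "LAW"])
print(vocab["CARDINAL"])  # looking this up no longer fails
```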
Are you aware that there is `scripts/predict_from_raw_text.sh`, which basically executes this command and applies the necessary post-processing, so you get the AMR graph in one go? That saves you from evaluating the AM dependency tree into the graph and then calling the post-processing scripts yourself.
Thanks, it works now! (Yes, I knew about the raw-text parse script; I had just isolated the problem for myself a bit more and forgot that I started with that script.)
Hi all, first of all: great repo with good documentation. However, I couldn't get the raw-text parser for AMR to work.
The first bug I encountered was in spacy_interface.py, in the lemma_postprocess function: where it says `return lemma_dict[lemma]`, it should (I think) be `return lemma_dict[token.lower()]`.
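For illustration, a minimal sketch of the reported fix; the toy `lemma_dict` and the surrounding guard condition are my assumptions, not the actual code in spacy_interface.py:

```python
# Toy stand-in for the real lemma dictionary in spacy_interface.py.
lemma_dict = {"ran": "run", "mice": "mouse"}

def lemma_postprocess(token, lemma):
    """Sketch of the fixed lookup (the guard condition is assumed)."""
    if token.lower() in lemma_dict:
        # Reported bug: this line read `return lemma_dict[lemma]`,
        # indexing by the lemma although membership was checked on the token.
        return lemma_dict[token.lower()]
    return lemma

print(lemma_postprocess("Ran", "ran"))   # run
print(lemma_postprocess("dogs", "dog"))  # dog (unchanged: token not in the dict)
```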
Then I hit a different error: the model errors out because it encounters tokens that are not in the vocabulary (e.g. CARDINAL, EVENT, PRODUCT, LAW) and that are also not handled by ne_dict/ne_postprocess in spacy_interface.
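One way such unhandled labels could be caught is a catch-all fallback (purely a hypothetical patch sketch; the `ne_dict` contents and the `default` parameter below are my assumptions, not the repo's actual mapping):

```python
# Hypothetical sketch: fall back to a catch-all tag for spaCy NER labels
# that ne_dict does not cover, instead of letting them reach the model's
# vocabulary as unknown tokens. The dictionary below is a toy subset.
ne_dict = {"PERSON": "person", "ORG": "organization", "GPE": "location"}

def ne_postprocess(label, default="O"):
    """Map a spaCy NER label, defaulting for unhandled ones (assumed behavior)."""
    return ne_dict.get(label, default)

print(ne_postprocess("PERSON"))    # person
print(ne_postprocess("CARDINAL"))  # O
```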
Maybe this has to do with a different version of the models? I'm using the one from https://coli-saar-data.s3.eu-central-1.amazonaws.com/raw_text_model.tar.gz.
Thanks for looking into this.
EDIT: trace for parsing the AMR dev set (tokenized sentences only):

```
python3 parse_raw_text.py downloaded_models/raw_text_model.tar.gz AMR-2017 AMR/dev.tok example//AMR-2017.amconll --cuda-device 0
```
```
Either spacy pytorch transformers or cupy not available, so you cannot use spacy-tok2vec! This is only an issue, if you intend to use roberta or xlnet.
0it [00:00, ?it/s]
Your label namespace was 'pos'. We recommend you use a namespace ending with 'labels' or 'tags', so we don't add UNK and PAD tokens by default to your vocabulary. See documentation for non_padded_namespaces parameter in Vocabulary.
Your label namespace was 'lemmas'. We recommend you use a namespace ending with 'labels' or 'tags', so we don't add UNK and PAD tokens by default to your vocabulary. See documentation for non_padded_namespaces parameter in Vocabulary.
1368it [00:00, 4047.18it/s]
Namespace: ner_labels Token: CARDINAL
Traceback (most recent call last):
  File "parse_raw_text.py", line 146, in <module>
    predictor.parse_and_save(args.formalism, temp_path, args.output_file)
  File "/project/rvannoord/am-parser/graph_dependency_parser/components/evaluation/predictors.py", line 132, in parse_and_save
    predictions = self.dataset_reader.restore_order(forward_on_instances(self.model, instances, self.data_iterator))
  File "/project/rvannoord/am-parser/graph_dependency_parser/components/evaluation/iterator.py", line 45, in forward_on_instances
    dataset.index_instances(model.vocab)
  File "/project/rvannoord/anaconda3/envs/saarland/lib/python3.7/site-packages/allennlp/data/dataset.py", line 155, in index_instances
    instance.index_fields(vocab)
  File "/project/rvannoord/anaconda3/envs/saarland/lib/python3.7/site-packages/allennlp/data/instance.py", line 72, in index_fields
    field.index(vocab)
  File "/project/rvannoord/anaconda3/envs/saarland/lib/python3.7/site-packages/allennlp/data/fields/sequence_label_field.py", line 98, in index
    for label in self.labels]
  File "/project/rvannoord/anaconda3/envs/saarland/lib/python3.7/site-packages/allennlp/data/fields/sequence_label_field.py", line 98, in <listcomp>
    for label in self.labels]
  File "/project/rvannoord/anaconda3/envs/saarland/lib/python3.7/site-packages/allennlp/data/vocabulary.py", line 630, in get_token_index
    return self._token_to_index[namespace][self._oov_token]
KeyError: '@@UNKNOWN@@'
```
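The KeyError at the bottom can be reproduced in miniature with plain dicts: the ner_labels namespace was built without an '@@UNKNOWN@@' entry, so the OOV fallback itself fails for an unseen label like CARDINAL. (The dict below is a simplified stand-in, not AllenNLP's actual Vocabulary class.)

```python
# Simplified stand-in for the Vocabulary internals in the trace: a namespace
# built without padding also lacks the '@@UNKNOWN@@' entry, so the OOV
# fallback lookup in get_token_index raises the KeyError itself.
OOV = "@@UNKNOWN@@"
token_to_index = {"ner_labels": {"PERSON": 0, "ORG": 1, "GPE": 2}}  # no OOV entry

def get_token_index(namespace, token):
    ns = token_to_index[namespace]
    try:
        return ns[token]
    except KeyError:
        # Mirrors vocabulary.py: the fallback to the OOV token also misses.
        return ns[OOV]

try:
    get_token_index("ner_labels", "CARDINAL")
except KeyError as err:
    print("KeyError:", err)  # KeyError: '@@UNKNOWN@@'
```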