PolKul opened this issue 3 years ago
Parsing the same sentence with the amrlib parser, for example, gives me this result with amr-unknown:
```
# ::snt Which architect of Marine Corps Air Station Kaneohe Bay was also tenant of New Sanno hotel?
(t / tenant-01
      :ARG0 (a / amr-unknown
            :ARG0-of (a2 / architect-01
                  :ARG1 (f / facility
                        :name (n / name
                              :op1 "Marine"
                              :op2 "Corps"
                              :op3 "Air"
                              :op4 "Station"
                              :op5 "Kaneohe"
                              :op6 "Bay"))))
      :ARG1 (h / hotel
            :name (n2 / name
                  :op1 "New"
                  :op2 "Sanno"))
      :mod (a3 / also))
```
It should produce amr-unknown; we rely on this often for question parsing.
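For checking whether a parse contains the amr-unknown concept, a plain-text heuristic is enough; this is a sketch using only the standard library, and the instance-matching regex is my assumption about the PENMAN notation shown above, not part of the parser's API:

```python
import re

def has_amr_unknown(amr_str: str) -> bool:
    """Return True if the AMR graph instantiates the amr-unknown concept.

    Matches an instance assignment like "(a / amr-unknown" in PENMAN
    notation; a plain-text heuristic, not a full AMR parser.
    """
    return re.search(r"\(\s*\w+\s*/\s*amr-unknown\b", amr_str) is not None

amr = """(t / tenant-01
      :ARG0 (a / amr-unknown
            :ARG0-of (a2 / architect-01)))"""

print(has_amr_unknown(amr))  # -> True
```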
What did you train it with? I just checked on a v0.4.2 deploy and it parses correctly. Also, do you tokenize?
Hi @ramon-astudillo, I was following your setup and training instructions from here (the default action-pointer network config: `bash run/run_experiment.sh configs/amr2.0-action-pointer.sh`). This is the code for inference:
```python
import string

from transition_amr_parser.parse import AMRParser

amr_parser_checkpoint = "/DATA/AMR2.0/models/exp_cofill_o8.3_act-states_RoBERTa-large-top24/_act-pos-grh_vmask1_shiftpos1_ptr-lay6-h1_grh-lay123-h2-allprev_1in1out_cam-layall-h2-abuf/ep120-seed42/checkpoint_best.pt"
parser = AMRParser.from_checkpoint(amr_parser_checkpoint)

# `text` holds the input sentence
words = [word.strip(string.punctuation) for word in text.split()]
annotations = parser.parse_sentences([words])
```
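One detail worth checking in that preprocessing: `word.strip(string.punctuation)` removes the trailing question mark, so the parser never sees a `?` token, which may matter for question parsing. A minimal illustration; the regex tokenizer below is my assumption for comparison, not the project's own preprocessing:

```python
import re
import string

text = "Which architect of Marine Corps Air Station Kaneohe Bay was also tenant of New Sanno hotel?"

# Approach from the snippet above: the "?" is stripped away entirely
stripped = [w.strip(string.punctuation) for w in text.split()]
print(stripped[-1])  # -> "hotel"

# Alternative: keep punctuation as its own token
tokens = re.findall(r"\w+|[^\w\s]", text)
print(tokens[-1])  # -> "?"
```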
Would you mind sharing your trained checkpoint to see if it makes any difference?
I am certain it should. We are looking into sharing pre-trained models, but I cannot say anything at this point.
Also, FYI, we will update to v0.5.1 soon (after the EMNLP preprint submission deadline). This new model (Structured-BART) is the new SoTA for AMR2.0 and will be published at EMNLP 2021; a non-updated preprint is here: https://openreview.net/forum?id=qjDQCHLXCNj
From experience parsing questions, I can say that silver-data fine-tuning works well. You can parse some text corpus containing questions, filter it with a couple of rules*, and then use it as additional training data. The scheme of silver+gold pre-training followed by gold fine-tuning seems to work best; see e.g. https://aclanthology.org/2020.findings-emnlp.288/
(*) For example, ignore all parses containing `:rel` (which indicates a detached subgraph) or missing `amr-unknown` (if you are certain the sentence should have one).
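Those two filtering rules could be sketched as follows; this is a hypothetical helper, and the exact string patterns are assumptions about the parser's PENMAN output rather than anything the repo provides:

```python
import re

def keep_silver_parse(amr_str: str, expect_unknown: bool = True) -> bool:
    """Heuristic filter for silver training data.

    Drops parses containing a ":rel" edge (which indicates a detached
    subgraph) and, optionally, parses missing an amr-unknown instance
    when we are certain the sentence should have one.
    """
    if ":rel" in amr_str:
        return False
    if expect_unknown and not re.search(r"/\s*amr-unknown\b", amr_str):
        return False
    return True
```

For declarative (non-question) silver sentences, `expect_unknown=False` would apply only the detached-subgraph rule.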
I was able to train the parser per your instructions, but when testing the trained model I found that it didn't produce the amr-unknown node. For example: