IBM / transition-amr-parser

SoTA Abstract Meaning Representation (AMR) parsing with word-node alignments in PyTorch. Includes checkpoints and other tools, such as statistical significance testing for Smatch.
Apache License 2.0

Reproduction Problem: acquiring 0.4 in Smatch with amr2.0-structured-bart-large-neur-al-seed42 #38

Closed xsthunder closed 1 year ago

xsthunder commented 1 year ago

Thanks for releasing the code and the pre-trained weights. It's very helpful. However, I'm having trouble reproducing the results with the pre-trained weights released in trained-checkpoints. I'll detail the steps and some example text below. Thank you in advance!

My steps

  1. Decode with the pre-trained model

in_checkpoint="/app/transition-amr-parser/DATA/amr_align_2022/DATA/AMR2.0/models/amr2.0-structured-bart-large-neur-al/seed42/checkpoint_wiki.smatch_top5-avg.pt"
input_file="/app/transition-amr-parser/DATA/AMR2.0/DATA/AMR2.0/corpora/test_sentence.txt"
amr-parse -c $in_checkpoint -i $input_file -o /app/transition-amr-parser/DATA/OUTPUT/test_output.amr

  2. Compute Smatch

smatch.py -f /app/transition-amr-parser/DATA/OUTPUT/test_output.amr /app/transition-amr-parser/DATA/AMR2.0/corpora/test.txt
# F-score: 0.28

  3. After becoming aware of the --no-isi flag, decode again with --no-isi and recompute Smatch

amr-parse -c $in_checkpoint -i $input_file -o /app/transition-amr-parser/DATA/OUTPUT/test_output_1.amr --no-isi

smatch.py -f /app/transition-amr-parser/DATA/OUTPUT/test_output_1.amr /app/transition-amr-parser/DATA/AMR2.0/corpora/test.txt

# F-score: 0.40

Examples from the text files

input_file

Lines 8-9 of

input_file="/app/transition-amr-parser/DATA/AMR2.0/DATA/AMR2.0/corpora/test_sentence.txt"
How Long are We Going to Tolerate Japan ?
My fellow citizens :

output_file

The corresponding predictions for those input lines:

/app/transition-amr-parser/DATA/OUTPUT/test_output_1.amr 
# ::tok How Long are We Going to Tolerate Japan ?
# ::alignments a~0 w~3 t~6 c~7 n~7 0~7
(t / tolerate-01
    :ARG0 (w / we)
    :ARG1 (c / country
        :name (n / name
            :op1 "Japan"))
    :duration (a / amr-unknown))

# ::tok My fellow citizens :
# ::alignments i~0 f~1 c~2
(c / citizen
    :mod (f / fellow)
    :poss (i / i))
xsthunder commented 1 year ago

We also encounter some warnings during amr-parse, as shown below. Detailed terminal logs can be found in the attached file 20221207.apptransition-amr-parser.txt.

...
/app/transition-amr-parser/fairseq_ext/sequence_generator.py:1058: UserWarning: An output with one or more elements was resized since it had shape [512, 1], which does not match the required output shape [510, 1]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/Resize.cpp:16.)
  torch.gather(
/app/transition-amr-parser/fairseq_ext/sequence_generator.py:925: UserWarning: An output with one or more elements was resized since it had shape [2], which does not match the required output shape [4]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/Resize.cpp:16.)
...
WARNING: disconnected graphs
WARNING: disconnected graphs
...

Thank you for your time. Best wishes

ramon-astudillo commented 1 year ago

The warnings appear because the parser is sometimes unable to attach some subgraphs; even our best systems still produce these warnings.

You do indeed need --no-isi, or to remove the ISI alignment notation as done here.
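For an already-decoded file, something like the following could strip the inline alignment markers instead of re-running decoding. This is a minimal sketch: the exact marker formats (`~3`, `~e.5`, `~5,7`) are an assumption on my part, and quoted constants that happen to contain `~` followed by digits are not handled.

```python
import re

# Assumed inline ISI marker formats: "node~3", "node~e.5", "node~5,7".
ISI_MARKER = re.compile(r'~(e\.)?\d+(,\d+)*')

def strip_isi(amr_text: str) -> str:
    """Remove inline ISI-style alignment markers from AMR text.

    Caveat: this naive regex would also fire inside quoted string
    constants containing "~<digits>".
    """
    return ISI_MARKER.sub('', amr_text)
```

For example, `strip_isi('(t / tolerate-01~6 :ARG0 (w / we~3))')` returns `'(t / tolerate-01 :ARG0 (w / we))'`.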

You will also need the BLINK cache, as explained in the README.md and provided here; however, that should have only a minor effect on performance.

The examples you show look OK: the input is already tokenized, you are not using --tokenize, and the AMRs look perfectly fine.

I would use a higher number of restarts for Smatch (see here), though that alone does not explain numbers this low.
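For context on why more restarts help: Smatch scores an AMR pair with a greedy hill-climbing search over variable mappings, which can get stuck in local optima, so it keeps the best result over several random restarts. A toy illustration of that idea (not Smatch itself; the landscape is made up):

```python
# 1-D toy landscape: a local peak at index 2 (score 5) and the
# global peak at index 8 (score 10).
SCORES = [0, 3, 5, 3, 0, 1, 6, 8, 10, 8]

def hill_climb(start: int) -> int:
    """Greedily move to the best neighbor until no neighbor improves."""
    x = start
    while True:
        candidates = [i for i in (x - 1, x + 1) if 0 <= i < len(SCORES)]
        best = max(candidates, key=lambda i: SCORES[i])
        if SCORES[best] <= SCORES[x]:
            return x
        x = best

def best_with_restarts(starts) -> int:
    """Run hill climbing from several starts; keep the highest-scoring result."""
    return max((hill_climb(s) for s in starts), key=lambda i: SCORES[i])
```

Starting only from index 0 gets stuck at the local peak (score 5), while adding a second start from index 9 finds the global peak (score 10); Smatch's restart count plays the same role.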

Such a low score should be clearly visible when looking at the AMR files, however. Can you compare against the gold file and see if you find a systematic pattern?

xsthunder commented 1 year ago

Thanks a lot. I succeeded in reproducing the results.