SapienzaNLP / spring

SPRING is a seq2seq model for Text-to-AMR and AMR-to-Text (AAAI2021).

Reproduce AMR2Text results #6

Closed. goodbai-nlp closed this issue 3 years ago.

goodbai-nlp commented 3 years ago

Thanks for your nice work! I ran into a few issues when trying to reproduce the AMR2Text results on AMR 2.0.

  1. I tried to run training with the following command using the default config (DFS):
python bin/train.py --config configs/config.yaml --direction text

but got a BLEU score of 41.78, which is lower than the result (45.3) reported in your paper.

  2. I also tried to run prediction with the released checkpoint AMR2.generation.pt as follows:
python bin/predict_sentences.py \
    --datasets <AMR-ROOT>/data/amrs/split/test/*.txt \
    --gold-path data/tmp/amr2.0/gold.text.txt \
    --pred-path data/tmp/amr2.0/pred.text.txt \
    --checkpoint runs/AMR2.generation.pt \
    --beam-size 5 \
    --batch-size 500 \
    --device cuda \
    --penman-linearization --use-pointer-tokens

but only got a BLEU score of 42.3.

I have no idea what is going wrong; could anyone give me some suggestions?
My virtual environment is available here.

mbevila commented 3 years ago

Hi,

You need to use the JAMR tokenizer (https://github.com/redpony/cdec/blob/master/corpus/tokenize-anything.sh) to tokenize both the predicted and the gold sentences. Then use the scorer we provide (https://github.com/SapienzaNLP/spring/blob/main/bin/eval_bleu.py). Sorry it is not so straightforward, but this was done to ensure comparability with previous approaches.
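
Something like the following sketch should work, reusing the pred/gold paths from your predict command above. tokenize-anything.sh reads from stdin and writes to stdout; the exact argument layout of bin/eval_bleu.py is an assumption here, so check the script (or its --help) before running:

# tokenize predictions and gold with the JAMR tokenizer (stdin -> stdout)
bash tokenize-anything.sh < data/tmp/amr2.0/pred.text.txt > data/tmp/amr2.0/pred.text.tok.txt
bash tokenize-anything.sh < data/tmp/amr2.0/gold.text.txt > data/tmp/amr2.0/gold.text.tok.txt

# score tokenized predictions against tokenized gold (argument order is an assumption)
python bin/eval_bleu.py data/tmp/amr2.0/gold.text.tok.txt data/tmp/amr2.0/pred.text.tok.txt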

goodbai-nlp commented 3 years ago

Thanks, I get a BLEU of 45.1 now.