IBM / transition-amr-parser

SoTA Abstract Meaning Representation (AMR) parsing with word-node alignments in PyTorch. Includes checkpoints and other tools such as statistical significance Smatch.
Apache License 2.0

AssertionError: current action not in the allowed space? check the rules. #27

Closed: goodbai-nlp closed this issue 2 years ago

goodbai-nlp commented 2 years ago

Hi, I'm trying to train the action-transformer model on the AMR2.0 dataset. I followed the README and ran bash run/run_experiment.sh configs/amr2.0-action-pointer.sh, but got the following assertion error. What should I do?

Oracle: 36521it [18:55, 32.16it/s]
Base actions:
Counter({'PRED': 7943, 'ENTITY': 717, 'RA': 142, 'LA': 121, 'COPY_SENSE01': 1, 'SHIFT': 1, 'REDUCE': 1, 'COPY_LEMMA': 1, 'MERGE': 1})
Most frequent actions:
[('SHIFT', 417623), ('REDUCE', 256155), ('COPY_LEMMA', 121546), ('COPY_SENSE01', 66826), ('RA(:ARG1)', 59949), ('LA(:ARG0)', 49124), ('LA(root)', 36384), ('LA(:mod)', 32481), ('LA(:ARG1)', 30126), ('RA(:ARG2)', 25345)]
3876 singleton actions
Counter({'PRED': 3504, 'ENTITY': 345, 'LA': 19, 'RA': 8})
Reading DATA/AMR2.0/aligned/cofill//dev.txt
1368 sentences
3875/23797 node types/tokens
109/24019 edge types/tokens
5385/29269 word types/tokens
39/1368 2.9 % repeated sents (max 9 times)
6/1368 0.0044 % inconsistent labelings from repeated sents
Oracle: 1368it [00:44, 30.96it/s]
Base actions:
Counter({'PRED': 1335, 'ENTITY': 135, 'RA': 102, 'LA': 70, 'SHIFT': 1, 'REDUCE': 1, 'COPY_LEMMA': 1, 'COPY_SENSE01': 1, 'MERGE': 1})
Most frequent actions:
[('SHIFT', 18483), ('REDUCE', 10934), ('COPY_LEMMA', 5746), ('COPY_SENSE01', 3236), ('RA(:ARG1)', 2701), ('LA(:ARG0)', 2073), ('LA(:mod)', 1623), ('LA(root)', 1366), ('LA(:ARG1)', 1351), ('MERGE', 1220)]
792 singleton actions
Counter({'PRED': 704, 'ENTITY': 56, 'RA': 20, 'LA': 12})
Reading DATA/AMR2.0/aligned/cofill//test.txt
1371 sentences
3897/24451 node types/tokens
112/25113 edge types/tokens
5364/30054 word types/tokens
20/1371 1.5 % repeated sents (max 7 times)
1/1371 0.0007 % inconsistent labelings from repeated sents
Oracle: 1371it [00:45, 30.11it/s]
Base actions:
Counter({'PRED': 1365, 'ENTITY': 127, 'RA': 103, 'LA': 74, 'SHIFT': 1, 'COPY_SENSE01': 1, 'COPY_LEMMA': 1, 'REDUCE': 1, 'MERGE': 1})
Most frequent actions:
[('SHIFT', 18874), ('REDUCE', 11123), ('COPY_LEMMA', 5787), ('COPY_SENSE01', 3227), ('RA(:ARG1)', 2813), ('LA(:ARG0)', 2205), ('LA(:mod)', 1661), ('MERGE', 1428), ('LA(:ARG1)', 1423), ('LA(root)', 1357)]
829 singleton actions
Counter({'PRED': 745, 'ENTITY': 51, 'LA': 17, 'RA': 16})
[Preprocessing data:]
[Configuration file:]
configs/amr2.0-action-pointer.sh
Cleaning up partially completed DATA/AMR2.0/features/cofill_o8.3_act-states_RoBERTa-large-top24//
Namespace(alignfile=None, batch_normalize_reward=False, bert_layers=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24], bpe=None, cpu=False, criterion='cross_entropy', dataset_impl='mmap', destdir='DATA/AMR2.0/features/cofill_o8.3_act-states_RoBERTa-large-top24//', embdir='DATA/AMR2.0/embeddings/RoBERTa-large-top24', fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, gold_annotations=None, gold_episode_ratio=None, joined_dictionary=False, log_format=None, log_interval=1000, lr_scheduler='fixed', machine_rules=None, machine_type=None, memory_efficient_fp16=False, min_loss_scale=0.0001, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, only_source=False, optimizer='nag', padding_factor=8, pretrained_embed='roberta.large', seed=1, source_lang='en', srcdict=None, target_lang='actions', task='amr_action_pointer_graphmp', tbmf_wrapper=False, tensorboard_logdir='', testpref='DATA/AMR2.0/oracles/cofill_o8.3_act-states//test', tgtdict=None, threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, tokenizer=None, trainpref='DATA/AMR2.0/oracles/cofill_o8.3_act-states//train', user_dir='../fairseq_ext', validpref='DATA/AMR2.0/oracles/cofill_o8.3_act-states//dev', workers=1)
| [en] Dictionary: 33263 types
| [en] DATA/AMR2.0/oracles/cofill_o8.3_act-states//train.en: 36521 sents, 689426 tokens, 0.0% replaced by <unk>
| [en] Dictionary: 33263 types
| [en] DATA/AMR2.0/oracles/cofill_o8.3_act-states//dev.en: 1368 sents, 30637 tokens, 4.13% replaced by <unk>
| [en] Dictionary: 33263 types
| [en] DATA/AMR2.0/oracles/cofill_o8.3_act-states//test.en: 1371 sents, 31425 tokens, 3.74% replaced by <unk>
----------------------------------------------------------------------------------------------------
Generate and process action states information (number of workers: 1):
[English sentence file: DATA/AMR2.0/oracles/cofill_o8.3_act-states//train.en]
[AMR actions file: DATA/AMR2.0/oracles/cofill_o8.3_act-states//train.actions]
 processed 2000 en-actions pairs (time: 4m 41s)
Traceback (most recent call last):
  File "fairseq_ext/preprocess_graphmp.py", line 313, in <module>
    cli_main()
  File "fairseq_ext/preprocess_graphmp.py", line 309, in cli_main
    main(args)
  File "fairseq_ext/preprocess_graphmp.py", line 269, in main
    task_obj.build_actions_states_info(en_file, actions_file, out_file_pref, num_workers=args.workers)
  File "/mnt/nfs-storage/transition-amr-parser/fairseq_ext/tasks/amr_action_pointer_graphmp.py", line 365, in build_actions_states_info
    impl='mmap', tokenize=self.tokenize, num_workers=num_workers)
  File "/mnt/nfs-storage/transition-amr-parser/fairseq_ext/amr_spec/action_info_binarize_graphmp.py", line 410, in binarize_actstates_tofile_workers
    actions_offset=0, actions_end=actions_offsets[1])
  File "/mnt/nfs-storage/transition-amr-parser/fairseq_ext/amr_spec/action_info_binarize_graphmp.py", line 114, in binarize
    actions_states = get_actions_states(tokens=tokenize(line), actions=tokenize(actions))
  File "/mnt/nfs-storage/transition-amr-parser/fairseq_ext/amr_spec/action_info_graphmp.py", line 57, in get_actions_states
    assert cano_act in act_allowed, 'current action not in the allowed space? check the rules.'
AssertionError: current action not in the allowed space? check the rules.
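
The traceback shows that the failure happens while an oracle action sequence is checked against the sentence it belongs to, somewhere after the first 2000 pairs. One way to narrow this down is to scan the pairs one by one and report the first pair that trips an assertion. The following is a minimal sketch of that idea, not the repository's code: first_failing_pair, toy_check and TOY_BASE_ACTIONS are hypothetical names, the example sentences and action sequences are made up, and the toy check only tests whether the base action appears in the "Base actions" counters printed in the log, which is far simpler than the parser's real rules.

```python
# Base action names copied from the "Base actions" counters in the log.
# Illustrative assumption: the parser's real rules are richer than a flat set.
TOY_BASE_ACTIONS = {
    'SHIFT', 'REDUCE', 'COPY_LEMMA', 'COPY_SENSE01',
    'PRED', 'ENTITY', 'MERGE', 'LA', 'RA',
}


def base_action(action):
    """Strip the argument part: 'RA(:ARG1)' -> 'RA', 'SHIFT' -> 'SHIFT'."""
    return action.split('(', 1)[0]


def toy_check(tokens, actions):
    """Hypothetical stand-in for the real per-sentence check."""
    for act in actions:
        assert base_action(act) in TOY_BASE_ACTIONS, f'unexpected action: {act}'


def first_failing_pair(token_seqs, action_seqs, check):
    """Return (index, tokens, actions, message) for the first pair on which
    check raises AssertionError, or None if every pair passes."""
    for idx, (tokens, actions) in enumerate(zip(token_seqs, action_seqs)):
        try:
            check(tokens, actions)
        except AssertionError as err:
            return idx, tokens, actions, str(err)
    return None


# Made-up data: the second sequence contains an action outside the toy set.
token_seqs = [['The', 'boy', 'left'], ['Hello', 'world']]
action_seqs = [
    ['SHIFT', 'COPY_LEMMA', 'SHIFT', 'COPY_SENSE01', 'LA(:ARG0)', 'SHIFT'],
    ['SHIFT', 'FOO(bar)', 'SHIFT'],
]
result = first_failing_pair(token_seqs, action_seqs, toy_check)
print(result)
```

In practice, check would presumably wrap the call that appears in the traceback, get_actions_states(tokens=..., actions=...), fed with lines read from the train.en and train.actions files named in the log. How those lines should be tokenized is repository-specific and is not shown here.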

goodbai-nlp commented 2 years ago

Fixed by using standard AMR files.
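
The thread does not say what was non-standard about the original files. As a rough aid for anyone hitting the same error, here is a minimal sketch of a structural sanity check on an AMR text file. It assumes the usual release layout: entries separated by blank lines, metadata lines starting with #, a # ::snt or # ::tok line per entry, and a graph in Penman notation. The function names are hypothetical, and passing this check does not guarantee that the oracle or preprocessing will succeed.

```python
def split_amr_blocks(text):
    """Split file contents into blank-line separated entries (lists of lines)."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def parens_balanced(graph):
    """Check parenthesis balance, ignoring anything inside double quotes."""
    depth, in_quotes = 0, False
    for ch in graph:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == '(':
                depth += 1
            elif ch == ')':
                depth -= 1
                if depth < 0:
                    return False
    return depth == 0 and not in_quotes


def check_amr_text(text):
    """Return (entry_index, problem) pairs for entries that look malformed.
    A comment-only file header, if present, is reported as well."""
    problems = []
    for idx, block in enumerate(split_amr_blocks(text)):
        meta = [l for l in block if l.startswith('#')]
        graph = ' '.join(l for l in block if not l.startswith('#'))
        if not any(l.startswith(('# ::snt', '# ::tok')) for l in meta):
            problems.append((idx, 'no ::snt or ::tok line'))
        if not graph.strip():
            problems.append((idx, 'no graph'))
        elif not parens_balanced(graph):
            problems.append((idx, 'unbalanced parentheses'))
    return problems


# Made-up sample: the second entry is missing its closing parenthesis.
sample = '''# ::id 1
# ::snt The boy wants to go.
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-02 :ARG0 b))

# ::id 2
# ::snt Broken entry
(w / want-01
   :ARG0 (b / boy)
'''
print(check_amr_text(sample))
```

To check a real file, read its contents with open(path, encoding='utf-8').read() and pass them to check_amr_text.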