Alibaba-NLP / ACE

[ACL-IJCNLP 2021] Automated Concatenation of Embeddings for Structured Prediction

'FastSequenceTagger' object has no attribute 'selection' #11

Closed · Aatlantise closed this issue 3 years ago

Aatlantise commented 3 years ago

Hello,

Thank you for your previous help.

I took your advice and am working with the config files in an attempt to reproduce your results.

When I run CUDA_VISIBLE_DEVICES=0 python3 train.py --config config/conll_03_english.yaml --test, I hit a few Pdb breakpoints (not a problem in itself) and eventually get an AttributeError:

Traceback (most recent call last):
  File "train.py", line 163, in <module>
    predict_posterior=args.predict_posterior,
  File "/workspace/ACE/flair/trainers/reinforcement_trainer.py", line 1358, in final_test
    for name, module in self.model.named_modules():
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 585, in __getattr__
    type(self).__name__, name))
AttributeError: 'FastSequenceTagger' object has no attribute 'selection'

And it looks like self.selection initialization has been commented out in https://github.com/Alibaba-NLP/ACE/blob/main/flair/models/sequence_tagger_model.py

I get the same error when I attempt to confirm performance on chunking.

Your help is greatly appreciated.

Aatlantise commented 3 years ago

I was able to run the NER training code and replicate the reported results. My previous error was due to the system mistaking .train.index files for .train files. After removing the .train.index files, the code works.
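For reference, the cleanup was essentially the following sketch (the dataset path is an assumption; adjust it to wherever the stray .train.index files live):

from pathlib import Path

# assumed dataset folder; change to your own location
data_folder = Path.home() / ".flair" / "datasets" / "conll_03"

# delete the stray index files so only the real .train files are picked up
for index_file in data_folder.glob("*.train.index"):
    print(f"removing {index_file}")
    index_file.unlink()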

However, I am still having difficulty training a chunking model.

It turns out that the Pdb breakpoint being triggered is part of the problem. I'll look into this further.

For reference, my config is:

Controller:
  model_structure: null
ReinforcementTrainer:
  controller_learning_rate: 0.1
  controller_optimizer: SGD
  distill_mode: false
  optimizer: SGD
  sentence_level_batch: true
embeddings:
  TransformerWordEmbeddings-1:
    model: bert-base-cased
    layers: -1,-2,-3,-4
    pooling_operation: mean
    embedding_name: /root/.cache/torch/transformers/bert-base-cased
  TransformerWordEmbeddings-2:
    model: bert-base-multilingual-cased
    layers: -1,-2,-3,-4
    pooling_operation: mean
  ELMoEmbeddings-0:
    model: original
    # options_file: elmo_2x4096_512_2048cnn_2xhighway_options.json
    # weight_file: elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5
  FastCharacterEmbeddings:
    char_embedding_dim: 25
    hidden_size_char: 25
  FastWordEmbeddings-0:
    embeddings: glove
    freeze: true
  FastWordEmbeddings-1:
    embeddings: en
    freeze: true
  FlairEmbeddings-0:
    model: en-forward
  FlairEmbeddings-1:
    model: en-backward
  FlairEmbeddings-2:
    model: multi-forward
  FlairEmbeddings-3:
    model: multi-backward
  TransformerWordEmbeddings-0:
    layers: '-1'
    pooling_operation: first
    model: xlm-roberta-large-finetuned-conll03-english
    embedding_name: /root/.flair/embeddings/xlm-roberta-large-finetuned-conll03-english
interpolation: 0.5
model:
  FastSequenceTagger:
    crf_attention: false
    dropout: 0.0
    hidden_size: 800
    sentence_loss: true
    use_crf: true
model_name: baseline-chunk
chunk:
  Corpus: CONLL_03
  tag_dictionary: resources/taggers/np_tags.pkl
target_dir: resources/taggers/
targets: chunk
train:
  controller_momentum: 0.9
  discount: 0.5
  learning_rate: 0.1
  max_episodes: 5
  max_epochs: 100
  max_epochs_without_improvement: 25
  mini_batch_size: 32
  monitor_test: false
  patience: 5
  save_final_model: true
  train_with_dev: false

And the error seems to occur somewhere between config_parser.py's get_corpus() and data.py's convert_tag_scheme().

Much thanks!

wangxinyu0922 commented 3 years ago

Hi,

I'm very sorry for my late reply due to a certain deadline.

I reran the command CUDA_VISIBLE_DEVICES=0 python3 train.py --config config/conll_03_english.yaml --test on the server and did not find any problem. In general, self.model.selection should be assigned when loading controller.pt. It seems you have already solved this problem.

My config for chunking is something like this:

Controller:
  model_structure: null
ReinforcementTrainer:
  controller_learning_rate: 0.5
  controller_optimizer: SGD
  distill_mode: false
  optimizer: SGD
anneal_factor: 2
chunk:
  Corpus: CONLL_03
embeddings:
  BertEmbeddings-1:
    bert_model_or_path: bert-base-cased
    layers: '-1'
    pooling_operation: mean
  BertEmbeddings-2:
    bert_model_or_path: bert-base-multilingual-cased
    layers: '-1'
    pooling_operation: mean
  ELMoEmbeddings-0:
    options_file: /home/yongjiang.jy/.flair/embeddings/elmo_2x4096_512_2048cnn_2xhighway_options.json
    weight_file: /home/yongjiang.jy/.flair/embeddings/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5
  FastCharacterEmbeddings:
    char_embedding_dim: 25
    hidden_size_char: 25
  FastWordEmbeddings-0:
    embeddings: en
    freeze: true
  FlairEmbeddings-0:
    model: en-forward
  FlairEmbeddings-1:
    model: en-backward
  FlairEmbeddings-2:
    model: multi-forward
  FlairEmbeddings-3:
    model: multi-backward
  XLMRoBERTaEmbeddings-0:
    layers: '-1'
    pooling_operation: mean
    pretrained_model_name_or_path: xlm-roberta-large
interpolation: 0.5
is_teacher_list: true
model:
  FastSequenceTagger:
    crf_attention: false
    dropout: 0.0
    hidden_size: 256
    relearn_embeddings: true
    sentence_loss: true
    use_crf: true
model_name: xlmr_en-elmo_en-bert_multi-bert_word_origflair_mflair_char_30episode_300epoch_2000batch_0.1lr_0.5ctrlr_256hidden_en_monolingual_crf_fast_sqrtreward_reinforce_freeze_relearn_0.5discount_10patience_nodev_chunk3
target_dir: resources/taggers/
targets: chunk
teacher_annealing: false
train:
  controller_momentum: 0.9
  discount: 0.5
  learning_rate: 0.1
  max_episodes: 30
  max_epochs: 300
  max_epochs_without_improvement: 25
  mini_batch_size: 2000
  monitor_test: false
  patience: 10
  save_final_model: false
  sqrt_reward: true
  train_with_dev: false
  true_reshuffle: false
trainer: ReinforcementTrainer
Aatlantise commented 3 years ago

Thank you for providing the config file. I hope your deadline went well.

Your config file seems to produce the same error. I dug in a bit and found that it occurs when converting IOB2 to IOBES, because of an invalid IOB format:

root@3fbc1c9e0850:/workspace/ACE# CUDA_VISIBLE_DEVICES=0 python3 train.py --config config/conll_03_chunk_xinyu.yaml
/workspace/ACE/flair/utils/params.py:104: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  dict_merge.dict_merge(params_dict, yaml.load(f))
2021-05-25 06:53:00,393 Reading data from /root/.flair/datasets/conll_03
2021-05-25 06:53:00,393 Train: /root/.flair/datasets/conll_03/eng.train
2021-05-25 06:53:00,393 Dev: /root/.flair/datasets/conll_03/eng.testa
2021-05-25 06:53:00,393 Test: /root/.flair/datasets/conll_03/eng.testb
> /workspace/ACE/flair/data.py(644)convert_tag_scheme()
-> for index, tag in enumerate(tags):
(Pdb) tag_type
'chunk'
(Pdb) target_scheme
'iobes'
(Pdb) token
Token: 1 -DOCSTART-
(Pdb) u
> /workspace/ACE/flair/datasets.py(919)__init__()
-> tag_type=self.tag_to_bioes, target_scheme="iobes"
(Pdb) u
> /workspace/ACE/flair/datasets.py(92)__init__()
-> in_memory=in_memory,
(Pdb) u
> /workspace/ACE/flair/datasets.py(1678)__init__()
-> data_folder, columns, tag_to_bioes=tag_to_bioes, in_memory=in_memory
(Pdb) data_folder
PosixPath('/root/.flair/datasets/conll_03')
(Pdb) columns
{0: 'text', 1: 'pos', 2: 'chunk', 3: 'ner'}
(Pdb)

Perhaps the dataset format is different? But I don't think so, since I was able to run train.py --config config/conll_03_english.yaml without issue.

I've also tried removing the header with the X tag and applying your solution to the earlier absolute-path issue, but to no avail.

My last question concerns the CoNLL 2000 dataset. In your paper you report chunking results on CoNLL 2000, but it seems that ACE does not process the CoNLL 2000 dataset, and your config above uses the CoNLL 2003 data?

Any help would be appreciated!

wangxinyu0922 commented 3 years ago

The possible reason is that the conll_03 files do not follow the BIO format for chunking; you may check the file. From your log, it seems the word -DOCSTART- does not follow the tagging scheme. You may replace the chunk tag of -DOCSTART- with O, or simply remove the -DOCSTART- lines (though note that -DOCSTART- is used in document-level NER). For CoNLL 2000, I will update the guide later; for now, you may write the corpus settings following this part in README.md.
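A rough sketch of the fix (the file path and column order are assumptions based on your log, where the columns are {0: 'text', 1: 'pos', 2: 'chunk', 3: 'ner'}):

from pathlib import Path

# assumed path to the training file; adjust to your dataset location
path = Path.home() / ".flair" / "datasets" / "conll_03" / "eng.train"

fixed_lines = []
for line in path.read_text(encoding="utf-8").splitlines():
    columns = line.split()
    # -DOCSTART- lines carry no real chunk tag, so force the chunk column to O
    if columns and columns[0] == "-DOCSTART-" and len(columns) > 2:
        columns[2] = "O"
        line = " ".join(columns)
    fixed_lines.append(line)

path.write_text("\n".join(fixed_lines) + "\n", encoding="utf-8")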

Aatlantise commented 3 years ago

Thank you for your help. I was able to replicate your experiments, with CoNLL 2000 performance of F1 97.0 for ACE and F1 97.3 for ACE + fine-tuning.

I think this will be my final question. During training I get Overall best score: 97.28999999999999, which matches the reported performance in your paper. However, when I run train.py --test, I get MICRO_AVG: acc 0.9371 - f1-score 0.9675.

I think this means that the model evaluation is not being done on the best model, but rather the final model?

How can I get the detailed performance stats of my best model, and ultimately use the best model for inference/prediction?

Much thanks!!

wangxinyu0922 commented 3 years ago

Hi,

The Overall best score is just the F1 score on the development set. I'm not sure why running train.py --test uses final-model.pt, since the code should prefer best-model.pt.
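If you want to tag with the best checkpoint directly, here is a minimal sketch (assuming the standard flair SequenceTagger.load / predict API still works in this fork, and that the path follows your target_dir and model_name; train.py --test is still the recommended route, since it also restores the controller's embedding selection):

from flair.data import Sentence
from flair.models import SequenceTagger

# assumed checkpoint path: target_dir + model_name from your config
tagger = SequenceTagger.load("resources/taggers/baseline-chunk/best-model.pt")

sentence = Sentence("He reckons the current account deficit will narrow .")
tagger.predict(sentence)
print(sentence.to_tagged_string())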

To reproduce the CoNLL 2000 performance, you may use an embedding setting like this (you need to train the fine-tuned embeddings yourself and modify the paths to them):

Controller:
  model_structure: null
ReinforcementTrainer:
  controller_learning_rate: 0.1
  controller_optimizer: SGD
  distill_mode: false
  optimizer: SGD
  sentence_level_batch: true
anneal_factor: 2
chunk:
  Corpus: CONLL_2000
embeddings:
  ELMoEmbeddings-0:
    options_file: /home/yongjiang.jy/.flair/embeddings/elmo_2x4096_512_2048cnn_2xhighway_options.json
    weight_file: /home/yongjiang.jy/.flair/embeddings/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5
  FastCharacterEmbeddings:
    char_embedding_dim: 25
    hidden_size_char: 25
  FastWordEmbeddings-0:
    embeddings: extvec
    freeze: true
  FlairEmbeddings-0:
    model: en-forward
  FlairEmbeddings-1:
    model: en-backward
  FlairEmbeddings-2:
    model: multi-forward
  FlairEmbeddings-3:
    model: multi-backward
  TransformerWordEmbeddings-0:
    layers: '-1'
    model: /home/yongjiang.jy/.flair/embeddings/xlnet-first_10epoch_1batch_4accumulate_0.000005lr_10000lrrate_conll00_monolingual_nocrf_fast_norelearn_sentbatch_sentloss_finetune_saving_nodev_chunk5/xlnet-large-cased
    pooling_operation: first
  TransformerWordEmbeddings-1:
    layers: '-1'
    model: /home/yongjiang.jy/.flair/embeddings/en-xlmr-first_10epoch_1batch_4accumulate_0.000005lr_10000lrrate_conll00_monolingual_nocrf_fast_norelearn_sentbatch_sentloss_finetune_saving_nodev_chunk3/roberta-large
    pooling_operation: first
  TransformerWordEmbeddings-2:
    layers: '-1'
    model: /home/yongjiang.jy/.flair/embeddings/xlmr_10epoch_8batch_0.000005lr_10000lrrate_conll00_monolingual_nocrf_fast_relearn_sentbatch_sentloss_finetune_saving_newver_nodev_chunk5/xlm-roberta-large
    pooling_operation: first
  TransformerWordEmbeddings-3:
    layers: -1,-2,-3,-4
    model: bert-base-cased
    pooling_operation: first
  TransformerWordEmbeddings-4:
    layers: -1,-2,-3,-4
    model: bert-base-multilingual-cased
    pooling_operation: first
interpolation: 0.5
is_teacher_list: true
model:
  FastSequenceTagger:
    crf_attention: false
    dropout: 0.0
    hidden_size: 800
    sentence_loss: true
    use_crf: true
    word_dropout: 0.0
model_name: xlnet-task_en-xlmr-task_xlmr-task_elmo_bert-four_multi-bert-four_word-extvec_origflair_mflair_char_30episode_150epoch_32batch_0.1lr_800hidden_conll00_monolingual_crf_fast_reinforce_freeze_sentbatch_0.5discount_5patience_nodev_chunk8
target_dir: resources/taggers/
targets: chunk
teacher_annealing: false
train:
  controller_momentum: 0.9
  discount: 0.5
  learning_rate: 0.1
  max_episodes: 30
  max_epochs: 150
  max_epochs_without_improvement: 25
  mini_batch_size: 32
  monitor_test: false
  patience: 5
  save_final_model: false
  train_with_dev: false
  true_reshuffle: false
  use_warmup: false
trainer: ReinforcementTrainer
Aatlantise commented 3 years ago

Ah, I see. I misunderstood the meaning of the logs, then. The additional info is very helpful too: I'll use the fine-tuned embeddings for XLNet, RoBERTa, and XLM-RoBERTa and see what I get. I had simply used fine-tuned BERT embeddings for my model.

Much thanks again!

Aatlantise commented 3 years ago

I was able to fine-tune XLNet and XLM-RoBERTa. However, I am having trouble fine-tuning RoBERTa. I noticed that the RoBERTaEmbeddings class in embeddings.py doesn't have a self.fine_tune parameter. I can try without the RoBERTa embedding in the meantime, but is there another way to fine-tune RoBERTa?

Thank you!

wangxinyu0922 commented 3 years ago

I use the TransformerWordEmbeddings class to fine-tune all transformer-based embeddings. Note that I use a learning rate of 5e-6 for XLNet, XLM-R, and RoBERTa.
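A rough sketch of the embedding setup (the fine_tune flag is an assumption about this class's interface; check embeddings.py, and note the 5e-6 learning rate goes into the fine-tuning training config, not onto the embedding itself):

from flair.embeddings import TransformerWordEmbeddings

# fine_tune is assumed to be supported by TransformerWordEmbeddings in this repo
roberta = TransformerWordEmbeddings(
    model="roberta-large",
    layers="-1",
    pooling_operation="first",
    fine_tune=True,
)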

Aatlantise commented 3 years ago

I'm able to train ACE using the fine-tuned embeddings. Not quite 97.3, but it rounds to 97.2, which is still great.

Thanks very much for your thorough help throughout :)

Best wishes!

gxc199912 commented 1 month ago

Hi, I have the same problem. When I run CUDA_VISIBLE_DEVICES=0 python3 train.py --config config/conll_03_english.yaml --test, I also run into Pdb instances and get an AttributeError:

Traceback (most recent call last):
  File "/data_share/gxc_data/ACE/train.py", line 157, in <module>
    trainer.final_test(
  File "/data_share/gxc_data/ACE/flair/trainers/reinforcement_trainer.py", line 1444, in final_test
    for name, module in self.model.named_modules():
  File "/data_share/gxc_data/miniconda3/envs/ace/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1207, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'FastSequenceTagger' object has no attribute 'selection'

But I have not been able to solve this problem.