KaijuML / dtt-multi-branch

Code for Controlling Hallucinations at Word Level in Data-to-Text Generation (C. Rebuffel, M. Roberti, L. Soulier, G. Scoutheeten, R. Cancelliere, P. Gallinari)
https://arxiv.org/abs/2102.02810

Unable to successfully run POS-tagging cmd #2

Closed juelap closed 3 years ago

juelap commented 3 years ago

I am trying to run the POS-tagging command python3 pos_tagging.py --do_train --do_tagging train --gpus 0 1 --dataset_folder wikibio, which is listed in the README of the data folder, but it does not complete successfully. I get the following error:

Using the following devices: [1,2]
Using the following environment variables, please edit the script if needed
CUDA_VISIBLE_DEVICES=1,2
Using the following arguments, please edit the script if needed
--data_dir ./pos --model_type bert --labels ./pos/labels.txt --model_name_or_path bert-base-uncased --output_dir ./pos/trained --max_seq_length 256 --num_train_epochs 3 --per_gpu_train_batch_size 32 --save_steps 750
Traceback (most recent call last):
  File "run_ner.py", line 70, in <module>
    (),
  File "run_ner.py", line 68, in <genexpr>
    for conf in (BertConfig, RobertaConfig, DistilBertConfig, CamembertConfig, XLMRobertaConfig)
AttributeError: type object 'BertConfig' has no attribute 'pretrained_config_archive_map'
Loading examples from train
Traceback (most recent call last):
  File "run_ner.py", line 70, in <module>
    (),
  File "run_ner.py", line 68, in <genexpr>
    for conf in (BertConfig, RobertaConfig, DistilBertConfig, CamembertConfig, XLMRobertaConfig)
AttributeError: type object 'BertConfig' has no attribute 'pretrained_config_archive_map'
Traceback (most recent call last):
  File "pos_tagging.py", line 225, in <module>
    do_tagging(args.pos_folder, args.dataset_folder, args.do_tagging, gpus, args.max_seq_length, args.split_size)
  File "pos_tagging.py", line 163, in do_tagging
    run_script(examples, pos_folder, dest, gpus, max_seq_length)
  File "pos_tagging.py", line 139, in run_script
    open(orig, mode="r", encoding='utf8') as origfile:
FileNotFoundError: [Errno 2] No such file or directory: './pos/trained/test_predictions.txt'

The first error occurs when running the run_ner.py script, and I think it is related to the installed version of the transformers package. In my tests, with version >= 3.X.X the error AttributeError: type object 'BertConfig' has no attribute 'pretrained_config_archive_map' is thrown. With a lower version, say 2.0.0, a different error is thrown instead: ImportError: cannot import name 'CamembertConfig' from 'transformers' (/opt/conda/lib/python3.7/site-packages/transformers/__init__.py). Maybe there is a specific version of the package for which neither error occurs.
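One way to narrow down a suitable version is to check which of the two names from the tracebacks the installed package actually exposes. The following is only a minimal sketch and not part of the repository: missing_names, REQUIRED and check_transformers are hypothetical helper names, and the only things assumed about transformers are the two names quoted in the error messages above.

```python
import importlib


def missing_names(module, dotted_names):
    """Return the dotted names that cannot be resolved starting from `module`."""
    missing = []
    for dotted in dotted_names:
        obj = module
        for part in dotted.split("."):
            if not hasattr(obj, part):
                missing.append(dotted)
                break
            obj = getattr(obj, part)
    return missing


# The two names that appear in the ImportError and AttributeError above.
REQUIRED = ["CamembertConfig", "BertConfig.pretrained_config_archive_map"]


def check_transformers():
    """Return (version, missing names), or (None, REQUIRED) if transformers is absent."""
    try:
        transformers = importlib.import_module("transformers")
    except ImportError:
        return None, list(REQUIRED)
    version = getattr(transformers, "__version__", "unknown")
    return version, missing_names(transformers, REQUIRED)


if __name__ == "__main__":
    version, missing = check_transformers()
    if version is None:
        print("transformers is not installed")
    elif missing:
        print(f"transformers {version} lacks: {', '.join(missing)}")
    else:
        print(f"transformers {version} exposes both names from the tracebacks")
```

Running this in each candidate environment shows directly whether a given version would hit either of the two errors, without launching the full training script.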

The second and more important error says that the file ./pos/trained/test_predictions.txt is not found. Where can I find this file? Do I need to get it from somewhere else? Thanks in advance :)

KaijuML commented 3 years ago

Hi,

Thanks for reporting this issue!

As you correctly guessed, this is 100% fixed by having the correct versions of the packages.

I have added a requirements.txt file, so that you can install everything at once in your virtual env.
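The missing ./pos/trained/test_predictions.txt looks like a knock-on effect of the first error rather than a file to download: judging from the log, run_ner.py crashed before it could write its predictions, and pos_tagging.py then tried to open them anyway. As a rough sketch of how a wrapper could surface the real failure earlier, the helper below runs a command and stops as soon as it fails or its expected output is absent. The name run_and_require is hypothetical, and this is not how pos_tagging.py is actually written.

```python
import subprocess
from pathlib import Path


def run_and_require(cmd, expected_output):
    """Run `cmd`; raise if it exits non-zero or does not create `expected_output`."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Report the child's own traceback instead of a later FileNotFoundError.
        raise RuntimeError(
            f"{cmd[0]} exited with code {result.returncode}:\n{result.stderr}"
        )
    path = Path(expected_output)
    if not path.is_file():
        raise FileNotFoundError(f"{path} was not written by: {' '.join(cmd)}")
    return path


# Hypothetical usage, with the arguments abbreviated:
# run_and_require(["python3", "run_ner.py", "--output_dir", "./pos/trained"],
#                 "./pos/trained/test_predictions.txt")
```

With a check like this, the AttributeError from run_ner.py would be the last thing printed, which makes the version mismatch easier to spot.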

Also, note that you can use python3 format_wikibio.py --first_sentence to train models on only the first sentences (which is the standard case in most papers). I have now added this command to data/README.md.

It seems to be working on my end, so I am closing this issue for now. Don't hesitate to reopen if I have missed anything!

Cheers, Clément