allanj / pytorch_neural_crf

Pytorch implementation of LSTM/BERT-CRF for named entity recognition

Please help me? #36

Closed Deerzh closed 2 years ago

Deerzh commented 2 years ago

Q1: How should I set {YOUR_OTHER_ARGUMENTS} in this command?

accelerate launch transformers_trainer_ddp.py --batch_size=30 {YOUR_OTHER_ARGUMENTS}

Q2: When I run this command:

python trainer.py --embedder_type=bert-large-cased

an error occurs:

Traceback (most recent call last):
  File "trainer.py", line 12, in <module>
    from src.config import context_models, get_metric
ImportError: cannot import name 'context_models' from 'src.config' (/home/zhang/compatibility_analysis/pytorch_neural_crf/src/config/__init__.py)

Can you help me fix this issue?

allanj commented 2 years ago
  1. {YOUR_OTHER_ARGUMENTS} can be left empty. Or you can refer to all the arguments here: https://github.com/allanj/pytorch_neural_crf/blob/master/transformers_trainer.py#L29-L61
  2. Please try to pull the latest version. It is fixed now.
Deerzh commented 2 years ago

I updated the code, but errors still occur.

Error 1: When I run this command:

python trainer.py --embedder_type=bert-large-cased

I get this error:

usage: trainer.py [-h] [--device {cpu,cuda:0,cuda:1,cuda:2}] [--seed SEED]
                  [--dataset DATASET] [--embedding_file EMBEDDING_FILE]
                  [--embedding_dim EMBEDDING_DIM] [--optimizer OPTIMIZER]
                  [--learning_rate LEARNING_RATE] [--l2 L2]
                  [--lr_decay LR_DECAY] [--batch_size BATCH_SIZE]
                  [--num_epochs NUM_EPOCHS] [--train_num TRAIN_NUM]
                  [--dev_num DEV_NUM] [--test_num TEST_NUM]
                  [--max_no_incre MAX_NO_INCRE] [--model_folder MODEL_FOLDER]
                  [--hidden_dim HIDDEN_DIM] [--dropout DROPOUT]
                  [--use_char_rnn {0,1}] [--static_context_emb {none,elmo}]
                  [--add_iobes_constraint {0,1}]
trainer.py: error: unrecognized arguments: --embedder_type=bert-large-cased

Error 2: If I leave {YOUR_OTHER_ARGUMENTS} empty, an error still occurs:

Traceback (most recent call last):
  File "transformers_trainer_ddp.py", line 22, in <module>
    import datasets
ModuleNotFoundError: No module named 'datasets'
Traceback (most recent call last):
  File "/home/zhang/anaconda3/envs/neural/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/zhang/anaconda3/envs/neural/lib/python3.7/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/zhang/anaconda3/envs/neural/lib/python3.7/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/home/zhang/anaconda3/envs/neural/lib/python3.7/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/zhang/anaconda3/envs/neural/bin/python', 'transformers_trainer_ddp.py', '--batch_size=30']' returned non-zero exit status 1.

allanj commented 2 years ago

Following the README, you should run transformers_trainer.py rather than trainer.py.
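The "unrecognized arguments" message comes from argparse: each script accepts only the flags its own parser defines, so a flag that one script understands can be rejected by another. A minimal sketch of that behavior follows. The parser here is a hypothetical stand-in with two made-up defaults, not the repository's actual argument list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for a script's parser; NOT the repository's real arguments.
    parser = argparse.ArgumentParser(prog="trainer.py")
    parser.add_argument("--batch_size", type=int, default=10)
    parser.add_argument("--hidden_dim", type=int, default=200)
    return parser


def split_known_unknown(argv):
    """Return (parsed namespace, list of flags the parser does not define)."""
    return build_parser().parse_known_args(argv)


args, unknown = split_known_unknown(
    ["--batch_size=30", "--embedder_type=bert-large-cased"]
)
print(args.batch_size)  # value of the flag this parser defines
print(unknown)          # flags that parse_args() would reject as unrecognized
```

Calling `parse_args()` instead of `parse_known_args()` on the same input would exit with the error shown in the usage message above, which is why the flag has to be passed to the script whose parser declares it.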

allanj commented 2 years ago

For the second one, you need to run:

pip install datasets

I just updated the README to include that. Thanks
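Before launching, it can help to confirm that the required packages are importable in the active environment. The sketch below uses only the standard library; the helper name `missing_modules` is made up for illustration, and the package names are the two that appear in the error output in this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "datasets" and "accelerate" are the packages mentioned in this thread.
# An empty list means both can be imported from the current environment.
print(missing_modules(["datasets", "accelerate"]))
```

Run it with the same Python interpreter that `accelerate launch` uses (here, the one inside the conda environment), since a package installed in a different environment will still be reported as missing.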

Deerzh commented 2 years ago

Following the README, you should run transformers_trainer.py rather than trainer.py.

Thank you for your reply, but I still have some questions about this.

Q1: Should I run transformers_trainer.py first and then trainer.py, or should I run only transformers_trainer.py? I don't understand what you mean. If I run trainer.py with the --embedder_type=bert-large-cased argument, it raises an error. But if I run trainer.py without arguments, will it run successfully?

Q2: I have run pip install datasets, but when I run accelerate launch transformers_trainer_ddp.py --batch_size=30, an error still occurs:

The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 1
    --num_machines was set to a value of 1
    --mixed_precision was set to a value of 'no'
    --num_cpu_threads_per_process was set to 52 to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
09/02/2022 16:16:35 - INFO - __main__ - seed: 42
09/02/2022 16:16:35 - INFO - __main__ - dataset: conll2003
09/02/2022 16:16:35 - INFO - __main__ - optimizer: adamw
09/02/2022 16:16:35 - INFO - __main__ - learning_rate: 2e-05
09/02/2022 16:16:35 - INFO - __main__ - momentum: 0.0
09/02/2022 16:16:35 - INFO - __main__ - l2: 1e-08
09/02/2022 16:16:35 - INFO - __main__ - lr_decay: 0
09/02/2022 16:16:35 - INFO - __main__ - batch_size: 30
09/02/2022 16:16:35 - INFO - __main__ - num_epochs: 1
09/02/2022 16:16:35 - INFO - __main__ - train_num: -1
09/02/2022 16:16:35 - INFO - __main__ - dev_num: -1
09/02/2022 16:16:35 - INFO - __main__ - test_num: -1
09/02/2022 16:16:35 - INFO - __main__ - max_no_incre: 80
09/02/2022 16:16:35 - INFO - __main__ - max_grad_norm: 1.0
09/02/2022 16:16:35 - INFO - __main__ - fp16: 1
09/02/2022 16:16:35 - INFO - __main__ - model_folder: english_model
09/02/2022 16:16:35 - INFO - __main__ - hidden_dim: 0
09/02/2022 16:16:35 - INFO - __main__ - dropout: 0.5
09/02/2022 16:16:35 - INFO - __main__ - embedder_type: roberta-base
09/02/2022 16:16:35 - INFO - __main__ - add_iobes_constraint: 0
09/02/2022 16:16:35 - INFO - __main__ - print_detail_f1: 0
09/02/2022 16:16:35 - INFO - __main__ - earlystop_atr: micro
09/02/2022 16:16:35 - INFO - __main__ - mode: train
09/02/2022 16:16:35 - INFO - __main__ - test_file: data/conll2003/test.txt
Downloading builder script: 6.33kB [00:00, 2.49MB/s]
09/02/2022 16:16:45 - INFO - __main__ - [Data Info] Tokenizing the instances using 'roberta-base' tokenizer
09/02/2022 16:16:55 - INFO - __main__ - [Data Info] Reading dataset from: data/conll2003/train.txt data/conll2003/dev.txt data/conll2003/test.txt
09/02/2022 16:16:55 - INFO - src.data.transformers_dataset - [Data Info] Reading file: data/conll2003/train.txt, labels will be converted to IOBES encoding
09/02/2022 16:16:55 - INFO - src.data.transformers_dataset - [Data Info] Modify src/data/transformers_dataset.read_txt function if you have other requirements
100%|███████████████████████████████████████████████████████| 300/300 [00:00<00:00, 855980.41it/s]
09/02/2022 16:16:55 - INFO - src.data.transformers_dataset - number of sentences: 14
09/02/2022 16:16:55 - INFO - src.data.transformers_dataset - [Data Info] Using the training set to build label index
09/02/2022 16:16:55 - INFO - src.data.data_utils - #labels: 16
09/02/2022 16:16:55 - INFO - src.data.data_utils - label 2idx: {'': 0, 'O': 1, 'S-ORG': 2, 'S-MISC': 3, 'B-PER': 4, 'E-PER': 5, 'S-LOC': 6, 'B-ORG': 7, 'E-ORG': 8, 'I-PER': 9, 'S-PER': 10, 'B-MISC': 11, 'I-MISC': 12, 'E-MISC': 13, '': 14, '': 15}
09/02/2022 16:16:55 - INFO - src.data.transformers_dataset - [Data Info] We are not limiting the max length in tokenizer. You should be aware of that
09/02/2022 16:16:55 - INFO - src.data.transformers_dataset - [Data Info] Reading file: data/conll2003/dev.txt, labels will be converted to IOBES encoding
09/02/2022 16:16:55 - INFO - src.data.transformers_dataset - [Data Info] Modify src/data/transformers_dataset.read_txt function if you have other requirements
100%|█████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 213995.10it/s]
09/02/2022 16:16:55 - INFO - src.data.transformers_dataset - number of sentences: 2
09/02/2022 16:16:55 - INFO - src.data.transformers_dataset - [Data Info] We are not limiting the max length in tokenizer. You should be aware of that
09/02/2022 16:16:55 - INFO - src.data.transformers_dataset - [Data Info] Reading file: data/conll2003/test.txt, labels will be converted to IOBES encoding
09/02/2022 16:16:55 - INFO - src.data.transformers_dataset - [Data Info] Modify src/data/transformers_dataset.read_txt function if you have other requirements
100%|███████████████████████████████████████████████████| 50350/50350 [00:00<00:00, 895523.33it/s]
09/02/2022 16:16:55 - INFO - src.data.transformers_dataset - number of sentences: 3684
09/02/2022 16:16:55 - INFO - src.data.transformers_dataset - [Data Info] We are not limiting the max length in tokenizer. You should be aware of that
Traceback (most recent call last):
  File "transformers_trainer_ddp.py", line 284, in <module>
    main()
  File "transformers_trainer_ddp.py", line 252, in main
    test_dataset = TransformersNERDataset(conf.test_file, tokenizer, number=conf.test_num, label2idx=train_dataset.label2idx, is_train=False)
  File "/home/zhang/compatibility_analysis/pytorch_neural_crf/src/data/transformers_dataset.py", line 94, in __init__
    self.insts_ids = convert_instances_to_feature_tensors(insts, tokenizer, label2idx)
  File "/home/zhang/compatibility_analysis/pytorch_neural_crf/src/data/transformers_dataset.py", line 53, in convert_instances_to_feature_tensors
    label_ids = [label2idx[label] for label in labels] if labels else [-100] * len(words)
  File "/home/zhang/compatibility_analysis/pytorch_neural_crf/src/data/transformers_dataset.py", line 53, in <listcomp>
    label_ids = [label2idx[label] for label in labels] if labels else [-100] * len(words)
KeyError: 'B-LOC'
Traceback (most recent call last):
  File "/home/zhang/anaconda3/envs/neural/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/zhang/anaconda3/envs/neural/lib/python3.7/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/zhang/anaconda3/envs/neural/lib/python3.7/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/home/zhang/anaconda3/envs/neural/lib/python3.7/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/zhang/anaconda3/envs/neural/bin/python', 'transformers_trainer_ddp.py', '--batch_size=30']' returned non-zero exit status 1.

allanj commented 2 years ago

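Your test set contains the label 'B-LOC', which does not exist in your training set. The label index is built from the training set only, so looking up 'B-LOC' while reading the test file raises the KeyError.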
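One way to confirm this kind of mismatch before training is to compare the label sets of the two files. The sketch below is an illustration under stated assumptions, not the repository's own code: it assumes CoNLL-style input (one token per line, label in the last column, blank line between sentences) with IOB2 labels, and it applies its own simple IOB2-to-IOBES conversion because the log says labels are converted to IOBES. All function names are hypothetical.

```python
from typing import Iterable, List, Set


def read_label_sequences(lines: Iterable[str]) -> List[List[str]]:
    """Collect the last column of each non-blank line into per-sentence lists."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split()[-1])
    if current:
        sentences.append(current)
    return sentences


def iob2_to_iobes(labels: List[str]) -> List[str]:
    """Convert one IOB2 sequence to IOBES (assumes well-formed IOB2 input)."""
    out = []
    for i, label in enumerate(labels):
        nxt = labels[i + 1] if i + 1 < len(labels) else "O"
        ends_here = not nxt.startswith("I-")  # entity stops at this token
        if label.startswith("B-"):
            out.append(("S-" if ends_here else "B-") + label[2:])
        elif label.startswith("I-"):
            out.append(("E-" if ends_here else "I-") + label[2:])
        else:
            out.append(label)
    return out


def label_set(lines: Iterable[str]) -> Set[str]:
    """Set of IOBES labels that occur anywhere in the given lines."""
    return {lab for seq in read_label_sequences(lines) for lab in iob2_to_iobes(seq)}


def unseen_labels(train_lines: Iterable[str], test_lines: Iterable[str]) -> Set[str]:
    """Labels present in the test data but absent from the training data."""
    return label_set(test_lines) - label_set(train_lines)


# Tiny in-memory example: training data has only a single-token location.
train = ["EU B-ORG", "rejects O", "", "Paris B-LOC"]
test = ["New B-LOC", "York I-LOC", "is O"]
print(sorted(unseen_labels(train, test)))
```

With real files, the same function can be called on open file handles, for example the data/conll2003/train.txt and data/conll2003/test.txt paths shown in the log. In the log above the training file yields only 14 sentences while the test file yields 3684, so a small training sample that happens to contain no multi-token location would produce exactly this gap.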

allanj commented 2 years ago

Feel free to reopen the issue.