Akeepers / LEAR

The implementation of our EMNLP 2021 paper "Enhanced Language Representation with Label Knowledge for Span Extraction".

Could you share the arguments you pass when running NER? #2

Closed: zlkzzz closed this issue 2 years ago

zlkzzz commented 2 years ago

Could you share the arguments you pass when running NER?

    python run_ner.py \
        --task_type sequence_classification \
        --task_save_name FLAT_NER \
        --data_dir ./data/data/ner \
        --data_name zh_msra \
        --model_name bert_ner \
        --model_name_or_path bert-base-cased \
        --output_dir ./model \
        --do_lower_case False \
        --result_dir ./model/result \
        --first_label_file ./data/data/ner/zh_msra/processed/label_map.json \
        --overwrite_output_dir TRUE \
        --train_set ./data/data/ner/zh_msra/processed/train.json \
        --dev_set ./data/data/ner/zh_msra/processed/dev.json \
        --test_set ./data/data/ner/zh_msra/processed/test.json

Akeepers commented 2 years ago

You can check the appendix of the paper.

zlkzzz commented 2 years ago

    Traceback (most recent call last):
      File "run_ner.py", line 677, in <module>
        main(args)
      File "run_ner.py", line 640, in main
        train(args, model, tokenizer, processor)
      File "run_ner.py", line 303, in train
        train_dataset = load_and_cache_examples(args, data_type="train", processor=processor, input_file=args.train_set)
      File "run_ner.py", line 121, in load_and_cache_examples
        results = processor.convert_examples_to_feature(input_file, data_type)
      File "LEAR-master/data_loader.py", line 1876, in convert_examples_to_feature
        results = self.encode_labels(example['entities'], seq_len, offset_dict, tokens)
      File "/LEAR-master/data_loader.py", line 1800, in encode_labels
        "").lower() == label['text'].lower().replace(" ", ""), "[error] {}\n{}\n".format(''.join(tokens[start_idx:end_idx+1]).replace("##", "").lower(), label['text'].lower().replace(" ", ""))
    AssertionError: [error] [unk][unk]
    故宫

Is this caused by passing in the wrong dataset?
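For anyone hitting the same assertion: judging from the traceback, the check joins the tokens of an entity span, strips the `##` word-piece markers, lowercases the result, and compares it with the entity text. The sketch below reproduces that comparison in isolation. The cause of the failure is not stated here. One possibility, offered purely as an assumption, is a tokenizer vocabulary that does not cover the dataset's characters (for example an English checkpoint applied to Chinese text), which would turn every character into `[UNK]`. `toy_tokenize`, `joined_span`, `span_matches`, the two vocabularies, and the entity record are all hypothetical stand-ins, not code from the LEAR repository.

```python
UNK = "[UNK]"

def toy_tokenize(text, vocab):
    """Hypothetical character-level stand-in for a BERT tokenizer.

    Characters missing from `vocab` become [UNK], as a WordPiece
    tokenizer does for out-of-vocabulary input.
    """
    return [ch if ch in vocab else UNK for ch in text if not ch.isspace()]

def joined_span(tokens, start_idx, end_idx):
    """Rebuild the surface text of a token span the way the assertion does."""
    return "".join(tokens[start_idx:end_idx + 1]).replace("##", "").lower()

def span_matches(tokens, start_idx, end_idx, label_text):
    """Mirror of the comparison shown in the traceback."""
    return joined_span(tokens, start_idx, end_idx) == label_text.lower().replace(" ", "")

sentence = "故宫在北京"
label = {"text": "故宫", "start": 0, "end": 1}  # hypothetical entity record

latin_vocab = set("abcdefghijklmnopqrstuvwxyz")  # assumed: no Chinese characters
chinese_vocab = set(sentence)                    # assumed: covers the sentence

tokens_latin = toy_tokenize(sentence, latin_vocab)
tokens_chinese = toy_tokenize(sentence, chinese_vocab)

print(joined_span(tokens_latin, label["start"], label["end"]))                     # [unk][unk]
print(span_matches(tokens_latin, label["start"], label["end"], label["text"]))     # False
print(span_matches(tokens_chinese, label["start"], label["end"], label["text"]))   # True
```

If a vocabulary mismatch is in fact the cause, trying a checkpoint whose vocabulary covers Chinese, such as `bert-base-chinese`, would be a reasonable next step. That is a suggestion, not something confirmed in this thread.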

Akeepers commented 2 years ago

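If you use the data I provided, this shouldn't happen. You can debug it yourself, or wait until I have time to take a look, which will probably be on the weekend.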

zlkzzz commented 2 years ago

I debugged it and it runs now. Thanks a lot!

Senwang98 commented 2 years ago

@zlkzzz Do you still remember what the problem was here?