2024-01-03 21:14:47 INFO initialize network
2024-01-03 21:14:47 INFO create new checkpoint
2024-01-03 21:14:47 INFO removed incomplete checkpoint .ckpt
2024-01-03 21:14:47 INFO checkpoint: .ckpt
2024-01-03 21:14:47 INFO - [arg] dataset: dataset/mitre
2024-01-03 21:14:47 INFO - [arg] transformers_model: xlm-roberta-base
2024-01-03 21:14:47 INFO - [arg] random_seed: 1
2024-01-03 21:14:47 INFO - [arg] lr: 5e-06
2024-01-03 21:14:47 INFO - [arg] epochs: 20
2024-01-03 21:14:47 INFO - [arg] warmup_step: 0
2024-01-03 21:14:47 INFO - [arg] weight_decay: 1e-07
2024-01-03 21:14:47 INFO - [arg] batch_size: 32
2024-01-03 21:14:47 INFO - [arg] max_seq_length: 128
2024-01-03 21:14:47 INFO - [arg] fp16: False
2024-01-03 21:14:47 INFO - [arg] max_grad_norm: 1
2024-01-03 21:14:47 INFO - [arg] lower_case: False
2024-01-03 21:14:47 INFO target dataset: ['dataset/mitre']
2024-01-03 21:14:47 INFO data_name: dataset/mitre
2024-01-03 21:14:47 INFO formatting custom dataset from dataset/mitre
2024-01-03 21:14:47 INFO found following files: {'test': 'test.txt', 'train': 'train.txt', 'valid': 'valid.txt'}
2024-01-03 21:14:47 INFO note that files should be named as either valid.txt, test.txt, or train.txt
Traceback (most recent call last):
  File "C:\Users\talia\OneDrive\Desktop\New folder (3)\CyNER-main\CyNER-main\c3.py", line 11, in <module>
    model.train()
  File "C:\Users\talia\OneDrive\Desktop\New folder (3)\CyNER-main\CyNER-main\cyner\transformers_ner.py", line 52, in train
    trainer.train(monitor_validation=True)
  File "C:\Users\talia\OneDrive\Desktop\New folder (3)\CyNER-main\CyNER-main\cyner\tner\model.py", line 292, in train
    self.__setup_model_data(self.args.dataset, self.args.lower_case)
  File "C:\Users\talia\OneDrive\Desktop\New folder (3)\CyNER-main\CyNER-main\cyner\tner\model.py", line 142, in __setup_model_data
    self.dataset_split, self.label_to_id, self.language, self.unseen_entity_set = get_dataset_ner(
                                                                                  ^^^^^^^^^^^^^^^^
  File "C:\Users\talia\OneDrive\Desktop\New folder (3)\CyNER-main\CyNER-main\cyner\tner\get_dataset.py", line 153, in get_dataset_ner
    data_split_all, label_to_id, language, ues = get_dataset_ner_single(d, **param)
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\talia\OneDrive\Desktop\New folder (3)\CyNER-main\CyNER-main\cyner\tner\get_dataset.py", line 359, in get_dataset_ner_single
    data_split_all, unseen_entity_set, label_to_id = decode_all_files(
                                                     ^^^^^^^^^^^^^^^^^
  File "C:\Users\talia\OneDrive\Desktop\New folder (3)\CyNER-main\CyNER-main\cyner\tner\get_dataset.py", line 459, in decode_all_files
    label_to_id, unseen_entity_set, data_dict = decode_file(
                                                ^^^^^^^^^^^^
  File "C:\Users\talia\OneDrive\Desktop\New folder (3)\CyNER-main\CyNER-main\cyner\tner\get_dataset.py", line 397, in decode_file
    for n, line in enumerate(f):
  File "C:\Users\talia\anaconda3\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7701: character maps to <undefined>
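Got the error above when running c3.py, which calls model.train() on line 11 to train on the custom dataset in dataset/mitre.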
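The last frames of the traceback show the dataset file being decoded with the cp1252 codec (the usual Windows default for open() when no encoding is passed), and byte 0x9d has no mapping in cp1252. One plausible cause, assuming the MITRE dataset files are UTF-8 encoded (an assumption, not confirmed by the log), is a typographic character such as the right double quotation mark U+201D, whose UTF-8 bytes are E2 80 9D. A minimal sketch of that failure mode; `try_decode` is a hypothetical helper, not part of CyNER:

```python
from typing import Optional


def try_decode(raw: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if `raw` is not valid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as err:
        # Same exception type and message shape as the last traceback line.
        print(f"{encoding}: {err}")
        return None


# Assumption: the file is UTF-8, so U+201D is stored as E2 80 9D (ends in 0x9d).
raw = "\u201d".encode("utf-8")
assert raw == b"\xe2\x80\x9d"

cp1252_text = try_decode(raw, "cp1252")  # fails: 0x9d is unmapped in cp1252
utf8_text = try_decode(raw, "utf-8")     # succeeds with the (assumed) real encoding
```

If this matches the dataset, the files themselves are probably fine and only the encoding used to read them is wrong.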
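The log does not say which of train.txt, valid.txt, or test.txt failed, and "position 7701" may be counted within the chunk the text reader was decoding rather than from the start of the file. A sketch that reads each file as raw bytes and reports the offset of the first byte that cp1252 and UTF-8 cannot decode; `first_bad_byte` and `check_dataset` are hypothetical helpers, and the file names are taken from the "found following files" log line:

```python
import os
from typing import Dict, Optional, Tuple


def first_bad_byte(path: str, encoding: str) -> Optional[int]:
    """Return the file offset of the first byte `encoding` cannot decode, or None."""
    with open(path, "rb") as f:  # binary mode: no implicit decoding
        data = f.read()
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        return err.start  # offset within `data`, i.e. within the whole file
    return None


def check_dataset(
    folder: str,
    names: Tuple[str, ...] = ("train.txt", "valid.txt", "test.txt"),
) -> Dict[str, Dict[str, Optional[int]]]:
    """Map each file name to {encoding: first bad offset or None}."""
    report: Dict[str, Dict[str, Optional[int]]] = {}
    for name in names:
        path = os.path.join(folder, name)
        report[name] = {enc: first_bad_byte(path, enc) for enc in ("cp1252", "utf-8")}
    return report
```

Usage: `print(check_dataset("dataset/mitre"))`. If a file shows an offset for cp1252 but None for UTF-8, it is valid UTF-8, and two possible workarounds (neither confirmed against CyNER) are passing `encoding="utf-8"` to the open() call whose file object decode_file iterates at line 397 of cyner/tner/get_dataset.py (that call is not shown in the traceback, so its exact form is an assumption), or running the script in Python's UTF-8 mode with `python -X utf8 c3.py`.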