autoliuweijie / K-BERT

Source code of K-BERT (AAAI2020)
https://ojs.aaai.org//index.php/AAAI/article/view/5681
949 stars, 212 forks

Help with an error: when loading the text, why can the code cast the data's label directly to int? #46

Closed Jennifer1996 closed 3 years ago

Jennifer1996 commented 3 years ago

Vocabulary file line 344 has bad format token
Vocabulary Size: 21128
[BertClassifier] use visible_matrix: True
[KnowledgeGraph] Loading spo from /home/schen/K-BERT/brain/kgs/Medical.spo
Start training.
Loading sentences from ./datasets/medical_ner/train.tsv
There are 6919 sentence in total.
We use 1 processes to inject knowledge into sentences.
{'text_a': 0, 'label': 1}
Progress of process 0: 0/6919
['山 , 男 , 7 3 岁 , 汉 族 , 已 婚 , 现 住 双 滦 区 陈 栅 子 乡 太 阳 沟 村 。', 'O O O O O O O O O O O O O O O O O O O O O O O O O O O O']

Traceback (most recent call last):
  File "run_kbert_cls.py", line 582, in <module>
    main()
  File "run_kbert_cls.py", line 501, in main
    trainset = read_dataset(args.train_path, workers_num=args.workers_num)
  File "run_kbert_cls.py", line 329, in read_dataset
    dataset = add_knowledge_worker(params)
  File "run_kbert_cls.py", line 84, in add_knowledge_worker
    label = int(line[columns["label"]])
ValueError: invalid literal for int() with base 10: 'O O O O O O O O O O O O O O O O O O O O O O O O O O O O'
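
The traceback points at `int(line[columns["label"]])`, which appears to expect one integer class label per row. The row printed in the log instead carries a space-separated tag for every character (a sequence-labeling layout), so the cast fails. The sketch below only illustrates that mismatch; `parse_label` and `tag_to_id` are hypothetical names for this example and are not taken from the K-BERT code.

```python
def parse_label(raw, tag_to_id=None):
    """Parse a label column (hypothetical helper, not part of K-BERT).

    Returns an int for a single class label such as "1", or a list of
    ints for a space-separated tag sequence such as "O O B-LOC".
    """
    raw = raw.strip()
    try:
        # Classification-style row: the whole column is one integer.
        return int(raw)
    except ValueError:
        # Sequence-labeling row: one tag per token, so int() cannot apply.
        if tag_to_id is None:
            raise ValueError(
                "label column holds a tag sequence, not a single integer: %r" % raw
            )
        return [tag_to_id[tag] for tag in raw.split()]


# Assumed tag vocabulary, for illustration only.
tag_to_id = {"O": 0, "B-LOC": 1, "I-LOC": 2}

single = parse_label("1")                             # classification label
sequence = parse_label("O O B-LOC I-LOC", tag_to_id)  # per-token tags
```

Under this reading, the cast in a classification script is not wrong for classification data; the medical_ner rows are simply in a per-token format that a single-label reader cannot consume, so they would need a script or loader written for sequence labeling.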