Error when using my own data

lijun-1999 commented 10 months ago

How can I fix this error?

    Traceback (most recent call last):
      File "/root/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/root/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/root/autodl-tmp/SPN4RE/Nr_Partial_ch_SPN4RE-main/main.py", line 97, in <module>
        data = build_data(args)
      File "/root/autodl-tmp/SPN4RE/Nr_Partial_ch_SPN4RE-main/utils/data.py", line 47, in build_data
        data.generate_instance(args, data_process)
      File "/root/autodl-tmp/SPN4RE/Nr_Partial_ch_SPN4RE-main/utils/data.py", line 31, in generate_instance
        self.train_loader = data_process(args.train_file, self.relational_alphabet, tokenizer)
      File "/root/autodl-tmp/SPN4RE/Nr_Partial_ch_SPN4RE-main/utils/functions.py", line 77, in data_process
        tail_start_index, tail_end_index = list_index(tail_token, token_sent)
      File "/root/autodl-tmp/SPN4RE/Nr_Partial_ch_SPN4RE-main/utils/functions.py", line 16, in list_index
        return index[0], index[1]
    UnboundLocalError: local variable 'index' referenced before assignment
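For context, `UnboundLocalError` means the function read a local name that no executed code path had assigned. The sketch below reproduces that pattern with a hypothetical `find_span` helper. It is not the repository's `list_index`, whose original body is not shown in this thread; it only illustrates how a lookup that never matches can leave `index` unassigned.

```python
def find_span(needle, haystack):
    """Hypothetical helper: return inclusive (start, end) of needle in haystack."""
    n = len(needle)
    for i in range(len(haystack) - n + 1):
        if haystack[i:i + n] == needle:
            index = (i, i + n - 1)  # assigned only when a match is found
            break
    return index  # raises UnboundLocalError if the loop never matched


def describe(needle, haystack):
    """Call find_span and report whether the lookup succeeded."""
    try:
        return find_span(needle, haystack)
    except UnboundLocalError:
        return "no match: 'index' was never assigned"
```

If this is what happens here, the error indicates that some entity tokens were not found in the tokenized sentence, rather than a problem in the model code.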

lijun-1999 commented 10 months ago

Reproducing the code with the NYT and WebNLG datasets works without errors. My own data is formatted the same way as the NYT dataset, but it triggers this error.

lijun-1999 commented 10 months ago

I changed the list_index code to:

    def list_index(list1: list, list2: list) -> list:
        if not list1 or not list2:
            return -1, -1  # return an invalid index pair

        start = [i for i, x in enumerate(list2) if x == list1[0]]
        end = [i for i, x in enumerate(list2) if x == list1[-1]]
        index = (-1, -1)  # initialize the index variable
        if len(start) == 1 and len(end) == 1:
            return start[0], end[0]
        else:
            for i in start:
                for j in end:
                    if i <= j:
                        if list2[i:j+1] == list1:
                            index = (i, j)
                            break
        return index[0], index[1]

This makes the error go away.
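One thing to keep in mind with the patch above: it returns (-1, -1) silently when the entity tokens are not found, so the bad sample still reaches training. The following is a sketch of an alternative, assuming it is acceptable to skip such triples. `find_sublist` and `locate_entities` are hypothetical names, not functions from the repository.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]


def find_sublist(needle: List[str], haystack: List[str]) -> Optional[Span]:
    """Return the inclusive (start, end) of the first occurrence, or None."""
    if not needle or not haystack:
        return None
    n = len(needle)
    for start in range(len(haystack) - n + 1):
        if haystack[start:start + n] == needle:
            return start, start + n - 1
    return None


def locate_entities(head: List[str], tail: List[str],
                    sentence: List[str]) -> Optional[Tuple[Span, Span]]:
    """Return both spans, or None so the caller can skip the triple."""
    head_span = find_sublist(head, sentence)
    tail_span = find_sublist(tail, sentence)
    if head_span is None or tail_span is None:
        return None
    return head_span, tail_span
```

Returning `None` forces the caller to decide what to do with an unmatched entity instead of passing an invalid index downstream.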

lijun-1999 commented 10 months ago

But after this change, there is a new error:

    DATA SUMMARY END.
    === Epoch 0 train ===
    /pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [10,0,0] Assertion `t >= 0 && t < n_classes` failed.
    Traceback (most recent call last):
      File "/root/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/root/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/root/autodl-tmp/SPN4RE/Nr_Partial_ch_SPN4RE-main/main.py", line 102, in <module>
        trainer.train_model()
      File "/root/autodl-tmp/SPN4RE/Nr_Partial_ch_SPN4RE-main/trainer/trainer.py", line 77, in train_model
        loss, _ = self.model(input_ids, attention_mask, targets)
      File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/root/autodl-tmp/SPN4RE/Nr_Partial_ch_SPN4RE-main/models/setpred4RE.py", line 31, in forward
        loss = self.criterion(outputs, targets)
      File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/root/autodl-tmp/SPN4RE/Nr_Partial_ch_SPN4RE-main/models/set_criterion.py", line 46, in forward
        losses.update(self.get_loss(loss, outputs, targets, indices))
      File "/root/autodl-tmp/SPN4RE/Nr_Partial_ch_SPN4RE-main/models/set_criterion.py", line 96, in get_loss
        return loss_map[loss](outputs, targets, indices, **kwargs)
      File "/root/autodl-tmp/SPN4RE/Nr_Partial_ch_SPN4RE-main/models/set_criterion.py", line 56, in relation_loss
        target_classes_o = torch.cat([t["relation"][i] for t, (_, i) in zip(targets, indices)])
      File "/root/autodl-tmp/SPN4RE/Nr_Partial_ch_SPN4RE-main/models/set_criterion.py", line 56, in <listcomp>
        target_classes_o = torch.cat([t["relation"][i] for t, (_, i) in zip(targets, indices)])
    RuntimeError: CUDA error: device-side assert triggered
    CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
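The assertion `t >= 0 && t < n_classes` is the loss kernel's check that every target class index lies in `[0, n_classes)`. Because the CUDA traceback may point at the wrong call, it can help to validate the targets on the CPU before the forward pass. Below is a minimal sketch using plain Python lists. The `relation` key appears in the traceback; `head_start_index` and the `limits` mapping are assumptions made for illustration, and a -1 index left over from the patched `list_index` is only one possible culprit.

```python
def find_out_of_range(target, limits):
    """Return (key, position, value) for every index outside [0, limit)."""
    problems = []
    for key, limit in limits.items():
        for pos, value in enumerate(target.get(key, [])):
            if not 0 <= value < limit:
                problems.append((key, pos, value))
    return problems


# Hypothetical sample: 3 relation classes and a 10-token sentence.
target = {"relation": [0, 2], "head_start_index": [4, -1]}
limits = {"relation": 3, "head_start_index": 10}
print(find_out_of_range(target, limits))
```

With tensor targets, converting each field with `.tolist()` first keeps the check on plain integers. Running with `CUDA_LAUNCH_BLOCKING=1`, as the message suggests, should also make the traceback point at the call that actually failed.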

Yubeiqi commented 6 months ago

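I ran into the same error. The most likely cause is that the text in em1Text or em2Text does not match the text in sentText, for example because the letter case differs. Special characters can also cause it. In the data_process function in ./utils/functions.py, you can print the value of i inside the loop that starts with for i in range(len(lines)): to find exactly which lines of the file have the problem. Fixing or simply deleting those lines should resolve it.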
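As an alternative to printing `i` during training, the file can be scanned up front. This sketch assumes one JSON object per line with the `sentText`, `relationMentions`, `em1Text` and `em2Text` fields used in the NYT-style format; `find_mismatched_lines` is a hypothetical helper. It uses a plain substring test, which only approximates the token-level match the training code performs, so it may miss tokenizer-specific cases.

```python
import json


def find_mismatched_lines(lines):
    """Yield (line_number, field, text) for entity texts absent from sentText."""
    for line_no, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines
        record = json.loads(raw)
        sentence = record["sentText"]
        for mention in record["relationMentions"]:
            for field in ("em1Text", "em2Text"):
                if mention[field] not in sentence:  # case-sensitive check
                    yield line_no, field, mention[field]
```

Any iterable of strings works as input, including an open file handle for the training file, and line numbers are reported starting from 1.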

Yubeiqi commented 6 months ago

By the way, here are a few of the bad records I found.

    {"sentText": "D-penicillamine in the treatment of rheumatoid arthritis .", "relationMentions": [{"em1Text": "[D-penicillamine", "em2Text": "rheumatoid arthritis", "label": "/chemical/disease/other"}]}

This one probably fails because of the special character, which causes the tokens to be split incorrectly.

    {"sentText": "Various dependent variables measuring depression showed no significant relapse-preventing effects of fluvoxamine , but only positive trends", "relationMentions": [{"em1Text": "fluvoxamine", "em2Text": "depression", "label": "/chemical/disease/other"}, {"em1Text": "Fluvoxamine", "em2Text": "depression", "label": "/chemical/disease/other"}]}

This one is caused by the letter case in em1Text: the second mention uses "Fluvoxamine", but the sentence contains "fluvoxamine".

    {"sentText": "Strontium 87mSr bone scanning for the evaluation of total hip replacement.In a series of seventeen patients with unilateral osteoarthritis of the hip a scintiscanning follow-up study was made before and after total hip replacement for the assessment of the normal course of the 87mSr-scintiscan", "relationMentions": [{"em1Text": "mSr", "em2Text": "unilateral osteoarthritis", "label": "/gene/disease/related"}, {"em1Text": "hip", "em2Text": "unilateral osteoarthritis", "label": "/gene/disease/related"}, {"em1Text": "hip", "em2Text": "unilateral osteoarthritis", "label": "/gene/disease/related"}, {"em1Text": "hip", "em2Text": "unilateral osteoarthritis", "label": "/gene/disease/related"}]}

This one is caused by duplicated relations.
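For the duplicated-relation case, editing the file by hand is one option; another is to drop repeated mentions while loading. This is a sketch under the assumption that two mentions count as duplicates when `em1Text`, `em2Text` and `label` are all identical. `dedupe_mentions` is a hypothetical helper, not part of the repository.

```python
def dedupe_mentions(record):
    """Return a copy of record with repeated relation mentions removed."""
    seen = set()
    unique = []
    for mention in record["relationMentions"]:
        key = (mention["em1Text"], mention["em2Text"], mention["label"])
        if key not in seen:  # keep only the first occurrence, in order
            seen.add(key)
            unique.append(mention)
    return {**record, "relationMentions": unique}
```

The input record is left unmodified, so the original data can still be inspected after cleaning.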

DianboWork / SPN4RE

Error when using my own data #25