Tongjilibo / bert4torch

An elegant PyTorch implementation of transformers
https://bert4torch.readthedocs.io/
MIT License
1.22k stars, 152 forks

Problem in the MyLoss class of GPLinker #132

Closed. liyao345496280 closed this issue 1 year ago.

liyao345496280 commented 1 year ago

When asking a question, please provide as much of the following information as possible:

Basic information

Core code

# Paste your core code here
import numpy as np
import torch
from bert4torch.losses import SparseMultilabelCategoricalCrossentropy

class MyLoss(SparseMultilabelCategoricalCrossentropy):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def forward(self, y_preds, y_trues):
        ''' y_preds: list of Tensors, each of shape [btz, heads, seq_len, seq_len]
            y_trues: list of Tensors, each of shape [btz, heads, num_spans, 2] holding (start, end) pairs
        '''
        loss_list = []
        for y_pred, y_true in zip(y_preds, y_trues):
            shape = y_pred.shape
            # Multiply by seq_len because (i, j) maps to index i*seq_len+j once flattened to seq_len*seq_len
            y_true = y_true[..., 0] * shape[2] + y_true[..., 1]  # [btz, heads, flattened (start, end) index of each entity]
            y_pred = y_pred.reshape(shape[0], -1, np.prod(shape[2:]))  # [btz, heads, seq_len*seq_len]
            loss = super().forward(y_pred, y_true.long())
            loss = torch.mean(torch.sum(loss, dim=1))
            loss_list.append(loss)
        # Average of the entity, head and tail losses, plus each one for logging
        return {'loss': sum(loss_list) / 3, 'entity_loss': loss_list[0], 'head_loss': loss_list[1],
                'tail_loss': loss_list[2]}
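The index arithmetic in the comment above can be checked in isolation. This stdlib-only sketch (the helper names `flatten_span` and `unflatten_index` are hypothetical, not part of bert4torch, and seq_len = 32 is just an example value) mirrors `y_true[..., 0] * shape[2] + y_true[..., 1]` for a single span and shows what happens to placeholder spans:

```python
def flatten_span(start: int, end: int, seq_len: int) -> int:
    """Map a (start, end) pair to its index in a flattened seq_len*seq_len grid.

    Mirrors y_true[..., 0] * seq_len + y_true[..., 1] from MyLoss.forward.
    """
    return start * seq_len + end


def unflatten_index(index: int, seq_len: int) -> tuple:
    """Inverse mapping: recover (start, end) from a flattened index."""
    return divmod(index, seq_len)


seq_len = 32
print(flatten_span(3, 5, seq_len))     # 3*32 + 5 = 101
print(unflatten_index(101, seq_len))   # (3, 5)
print(flatten_span(0, 0, seq_len))     # 0: a (0, 0) placeholder lands on index 0
print(flatten_span(-1, -1, seq_len))   # -33: a (-1, -1) placeholder becomes negative
print(flatten_span(0, -1, seq_len))    # -1: any negative coordinate can yield a negative index
```

Any span with a negative coordinate produces a negative flattened index, which the gather step inside the loss cannot accept.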

Output

# Paste your debug output here
C:/w/b/windows/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: block: [0,0,0], thread: [30,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
Traceback (most recent call last):
  File "D:\LY\RE\test.py", line 264, in <module>
    model.fit(train_dataloader, steps_per_epoch=None, epochs=20, callbacks=[evaluator])
  File "D:\miniconda3\envs\LY\lib\site-packages\torch4keras\model.py", line 230, in fit
    self.output, self.loss, self.loss_detail = self.train_step(self.train_X, self.train_y)
  File "D:\miniconda3\envs\LY\lib\site-packages\torch4keras\model.py", line 106, in train_step
    loss_detail = self.criterion(output, train_y)
  File "D:\miniconda3\envs\LY\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\LY\RE\test.py", line 161, in forward
    loss = super().forward(y_pred, y_true.long())
  File "D:\miniconda3\envs\LY\lib\site-packages\bert4torch\losses.py", line 101, in forward
    pos_loss = torch.logsumexp(-y_pos_1, dim=-1)
RuntimeError: CUDA error: device-side assert triggered

Process finished with exit code 1
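Note that this GPU traceback points at `torch.logsumexp` (line 101 of losses.py), while the CPU run that follows points at `torch.gather` (line 96). CUDA kernels run asynchronously, so a device-side assert is most likely being reported at a later operation than the one that actually failed. A common way to get an accurate traceback on the GPU is to set the `CUDA_LAUNCH_BLOCKING` environment variable before CUDA is initialized; a minimal sketch, assuming it is placed at the very top of the training script:

```python
import os

# CUDA kernels are launched asynchronously, so a device-side assert may surface
# at an unrelated later operation. Setting this variable makes kernel launches
# synchronous, so the traceback points at the kernel that failed.
# It must be set before CUDA is initialized, so do it before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# ...then import torch, build the model and call model.fit(...) as before.
```

This slows training down noticeably, so it is only worth enabling while debugging.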

What I tried

I tried moving y_pred and y_true to the CPU and running it again; this time the error was:

Traceback (most recent call last):
  File "D:\LY\RE\test.py", line 266, in <module>
    model.fit(train_dataloader, steps_per_epoch=None, epochs=20, callbacks=[evaluator])
  File "D:\miniconda3\envs\LY\lib\site-packages\torch4keras\model.py", line 230, in fit
    self.output, self.loss, self.loss_detail = self.train_step(self.train_X, self.train_y)
  File "D:\miniconda3\envs\LY\lib\site-packages\torch4keras\model.py", line 106, in train_step
    loss_detail = self.criterion(output, train_y)
  File "D:\miniconda3\envs\LY\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\LY\RE\test.py", line 163, in forward
    loss = super().forward(y_pred, y_true.long())
  File "D:\miniconda3\envs\LY\lib\site-packages\bert4torch\losses.py", line 96, in forward
    y_pos_2 = torch.gather(y_pred, dim=-1, index=y_true)  # [..., num_positive]
RuntimeError: index -1 is out of bounds for dimension 2 with size 1025

Process finished with exit code 1
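The CPU message is easier to interpret: `torch.gather` requires every index along the gathered dimension to lie in `[0, size)`, and unlike ordinary Python list indexing it does not wrap negative values. The size 1025 is informative too. Assuming the installed bert4torch appends one extra zero column to `y_pred` before gathering (as its `SparseMultilabelCategoricalCrossentropy.forward` appears to do), 1025 = seq_len² + 1, so seq_len = 32 for this batch. The stdlib-only sketch below uses hypothetical helpers (not torch or bert4torch API) to reproduce that bounds check and arithmetic:

```python
import math


def gather_last_dim(row: list, indices: list) -> list:
    """Pure-Python stand-in for torch.gather(row, dim=-1, index=indices) on a 1-D row.

    Like torch.gather, negative indices are rejected instead of wrapping around.
    """
    size = len(row)
    out = []
    for idx in indices:
        if not 0 <= idx < size:
            raise IndexError(f"index {idx} is out of bounds for dimension with size {size}")
        out.append(row[idx])
    return out


def seq_len_from_gather_size(size: int) -> int:
    """Recover seq_len, assuming size == seq_len * seq_len + 1 (one appended column)."""
    seq_len = math.isqrt(size - 1)
    if seq_len * seq_len + 1 != size:
        raise ValueError(f"{size} is not of the form seq_len**2 + 1")
    return seq_len


print(seq_len_from_gather_size(1025))  # 32, since 32 * 32 + 1 == 1025

row = [0.0] * 1025
print(gather_last_dim(row, [0, 101]))  # valid indices: [0.0, 0.0]
try:
    gather_last_dim(row, [101, -1])
except IndexError as err:
    print(err)  # index -1 is out of bounds for dimension with size 1025
```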

Tongjilibo commented 1 year ago

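This error usually means an index is out of bounds. Does it fail right away, or does it run for a few batches before failing? Also, you could wrap the loss computation in a try/except and print y_true and y_pred when the loss fails.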
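A minimal sketch of that suggestion, assuming the flattened labels can be pulled out as plain Python ints (for a tensor, `y_true.flatten().tolist()`). `check_label_indices` and `call_with_diagnostics` are hypothetical helpers, not part of bert4torch or torch4keras; the idea is to fail with a readable message before a bad index reaches `torch.gather`, and to print both inputs if the loss still raises:

```python
def check_label_indices(flat_labels, num_classes):
    """Raise ValueError if any flattened label index lies outside [0, num_classes)."""
    bad = [i for i in flat_labels if not 0 <= i < num_classes]
    if bad:
        raise ValueError(
            f"{len(bad)} label index(es) out of range [0, {num_classes}); "
            f"min={min(bad)}, max={max(bad)}"
        )


def call_with_diagnostics(loss_fn, y_pred, y_true, describe=repr):
    """Call loss_fn(y_pred, y_true); if it raises, print both inputs and re-raise."""
    try:
        return loss_fn(y_pred, y_true)
    except (RuntimeError, IndexError, ValueError):
        print("loss failed; y_true =", describe(y_true))
        print("loss failed; y_pred =", describe(y_pred))
        raise


# Inside MyLoss.forward one might call, before super().forward(...):
#     check_label_indices(y_true.flatten().tolist(), shape[2] * shape[3])
check_label_indices([0, 101, 1023], 32 * 32)  # all valid: returns None silently
try:
    check_label_indices([0, -1, -33], 32 * 32)
except ValueError as err:
    print(err)  # 2 label index(es) out of range [0, 1024); min=-33, max=-1
```

With tensors, `describe` could be something like `lambda t: (t.shape, t.min().item(), t.max().item())`, so that only the shape and value range are printed rather than the whole tensor.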

liyao345496280 commented 1 year ago

This is where the error occurs when stepping through in the debugger: inside the SparseMultilabelCategoricalCrossentropy class. [screenshot]

These are the values of y_true and y_pred at that point; I set the batch size to 8. [screenshot]

liyao345496280 commented 1 year ago

It fails right away: the error is raised the first time the forward function of MyLoss is called.

Tongjilibo commented 1 year ago

I see there is a -1 in y_true; that doesn't look right.
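If the -1 values come from data preprocessing, one possible cause is a span-search helper that returns -1 when an entity's tokens are not found in the tokenized text, with the result stored without checking. A minimal sketch of the alternative, assuming the loss is built with `mask_zero=True` so that flattened index 0 is treated as padding: drop spans with a negative coordinate and pad empty label sets with `(0, 0)` instead of -1. `search`, `clean_spans` and `pad_label_sets` are hypothetical names used only for illustration:

```python
def search(pattern, sequence):
    """Return the start of the first occurrence of pattern in sequence, or -1."""
    n = len(pattern)
    for i in range(len(sequence) - n + 1):
        if sequence[i:i + n] == pattern:
            return i
    return -1


def clean_spans(spans):
    """Drop (start, end) spans with a negative coordinate (e.g. a failed search)."""
    return [(s, e) for s, e in spans if s >= 0 and e >= 0]


def pad_label_sets(label_sets, pad=(0, 0)):
    """Clean each label set, then pad all sets to the same length with `pad`.

    Assumes index 0 is ignored by the loss (mask_zero=True), so (0, 0) is a safe filler.
    """
    cleaned = [clean_spans(spans) or [pad] for spans in label_sets]
    width = max(len(spans) for spans in cleaned)
    return [spans + [pad] * (width - len(spans)) for spans in cleaned]


token_ids = [101, 7, 8, 9, 102]
start = search([8, 9], token_ids)   # 2
missing = search([5], token_ids)    # -1: entity not found after tokenization
spans = [[(start, start + 1), (missing, missing)], []]
print(pad_label_sets(spans))        # [[(2, 3)], [(0, 0)]]
```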

liyao345496280 commented 1 year ago

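A question about the dataset used in your code: I couldn't find its validation set on the official website. Could you share it so I can compare it with my data?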

Tongjilibo commented 1 year ago

Sure, I'll post it tonight~

Tongjilibo commented 1 year ago

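> A question about the dataset used in your code: I couldn't find its validation set on the official website. Could you share it so I can compare it with my data?

Dataset (Baidu Cloud link)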

liyao345496280 commented 1 year ago

Thank you!


------------------ Original message ------------------ From: @.>; Sent: Thursday, May 18, 2023, 8:31 PM; To: @.>; Cc: @.>; @.>; Subject: Re: [Tongjilibo/bert4torch] Problem in the MyLoss class of GPLinker (Issue #132)
