RuntimeError: "nll_loss_forward_reduce_cuda_kernel_2d_index" not implemented for 'Int'

yuliang-liang commented 1 year ago

Traceback (most recent call last):
  File "D:\Research\Code\CBLUE\baselines\run_classifier.py", line 183, in <module>
    main()
  File "D:\Research\Code\CBLUE\baselines\run_classifier.py", line 158, in main
    global_step, best_step = trainer.train()
  File "D:\Research\Code\CBLUE.\cblue\trainer\train.py", line 87, in train
    loss = self.training_step(model, item)
  File "D:\Research\Code\CBLUE.\cblue\trainer\train.py", line 207, in training_step
    outputs = model(labels=labels, input_ids=input_ids, token_type_ids=token_type_ids,
  File "C:\Users\lyl\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\lyl\anaconda3\envs\pytorch\lib\site-packages\transformers\models\bert\modeling_bert.py", line 1778, in forward
    loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
  File "C:\Users\lyl\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\lyl\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\loss.py", line 1174, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "C:\Users\lyl\anaconda3\envs\pytorch\lib\site-packages\torch\nn\functional.py", line 3026, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: "nll_loss_forward_reduce_cuda_kernel_2d_index" not implemented for 'Int'

After downloading the code, I cannot get it to run under either PyTorch 1 or PyTorch 2. What is going wrong, and how should I fix it? Thanks!
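The last frames of the traceback point at the dtype of the target tensor: 'Int' is torch.int32, whereas the cross-entropy loss expects class indices as torch.int64 ('Long'). The following is a minimal sketch that checks this outside of CBLUE. The shapes and values are made up for illustration, and the exact error message may differ between CPU and CUDA and between PyTorch versions.

```python
import torch
import torch.nn.functional as F

# Made-up logits for 4 samples and 3 classes (illustrative only).
logits = torch.randn(4, 3)

# Class-index targets stored as int32, which PyTorch reports as 'Int'.
labels_int32 = torch.tensor([0, 2, 1, 1], dtype=torch.int32)

try:
    F.cross_entropy(logits, labels_int32)
    int32_error = None
except RuntimeError as err:
    # Expected on the PyTorch versions discussed in this thread.
    int32_error = str(err)
print("int32 target error:", int32_error)

# Casting to int64 ('Long') gives the dtype the loss expects.
labels_int64 = labels_int32.long()
loss = F.cross_entropy(logits, labels_int64)
print("loss with int64 target:", loss.item())
```

If the first call raises and the second one succeeds, the problem is the label dtype rather than the model or the dataset contents.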

flow3rdown commented 1 year ago

Which task are you running, and which dataset and model are you using?

yuliang-liang commented 1 year ago

> Which task are you running, and which dataset and model are you using?

Task: examples/run_ee.sh
Dataset: CBLUEDatasets/CMeEE (downloaded from Tianchi CBLUE)
Model: bert-wwm-ext (the PyTorch version, downloaded from https://github.com/ymcui/Chinese-BERT-wwm)

The header of run_ee.sh is as follows:


DATA_DIR="CBLUEDatasets"

TASK_NAME="ee"
MODEL_TYPE="bert"
MODEL_DIR="data\model_data"
MODEL_NAME="bert-wwm-ext"
OUTPUT_DIR="data\output"
RESULT_OUTPUT_DIR="data\result_output"

flow3rdown commented 1 year ago

I just tested this on my side and the bug did not appear. Have you modified the dataset or any of the code files?

yuliang-liang commented 1 year ago

No, I have not modified anything. I will try again and check whether it is a PyTorch version issue.
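Besides the PyTorch version, the default integer width may be worth checking, since the traceback paths show a Windows machine and the error could not be reproduced elsewhere. With NumPy 1.x on Windows, integer arrays default to int32, and torch.from_numpy keeps that dtype. Whether CBLUE builds its label tensors this way is an assumption here, not something the thread confirms. A small diagnostic sketch:

```python
import platform

import numpy as np
import torch

print("OS:", platform.system())
print("PyTorch:", torch.__version__)
print("NumPy:", np.__version__)

# The default integer dtype NumPy picks for a list of Python ints.
default_int = np.array([0, 1, 2]).dtype
print("NumPy default integer dtype:", default_int)

# torch.from_numpy preserves the NumPy dtype, so an int32 array
# becomes a torch.int32 ('Int') tensor.
labels = torch.from_numpy(np.array([0, 1, 2]))
print("Resulting tensor dtype:", labels.dtype)
```

If the last line prints torch.int32, labels built from NumPy arrays would need an explicit cast to int64 before being passed to the loss.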

yuliang-liang commented 1 year ago

> I just tested this on my side and the bug did not appear. Have you modified the dataset or any of the code files?

My environment is Windows 11. I have found a fix: in cblue > trainer > train.py, inside def training_step(), change the statement

    labels = item[3].to(self.args.device)

to

    labels = item[3].type(torch.LongTensor).to(self.args.device)

The def evaluate() function also needs to be modified.
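The cast described above can be sketched in isolation. The helper name labels_to_device is hypothetical and not part of CBLUE, and the int32 tensor stands in for item[3]. Passing dtype=torch.long to Tensor.to() is one way to do the cast and the device move in a single call. It should give the same result as the .type(torch.LongTensor).to(...) form used in the thread.

```python
import torch


def labels_to_device(labels: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Hypothetical helper: move labels to `device` as int64 class indices."""
    return labels.to(device, dtype=torch.long)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for item[3] in the thread: a batch of int32 label ids (made up).
item_labels = torch.tensor([[0, 1, 2], [2, 1, 0]], dtype=torch.int32)

fixed = labels_to_device(item_labels, device)

# The form used in the thread, for comparison.
same_as_thread = item_labels.type(torch.LongTensor).to(device)

print(fixed.dtype, fixed.device)
print(torch.equal(fixed, same_as_thread))
```

Using one helper in both training_step() and evaluate() would keep the two code paths from drifting apart, though editing each line directly as described in the thread works as well.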
