OSError: (External) CUDA error(719), unspecified launch failure during training #3137

Open q465414859 opened 1 month ago

q465414859 commented 1 month ago

Error: C:\home\workspace\Paddle\paddle\phi\kernels\gpu\ Assertion false failed. The value of label expected >= 0 and < 7, or == -100, but got 29. Please check label value.
Error: C:\home\workspace\Paddle\paddle\phi\kernels\gpu\ Assertion false failed. The value of label expected >= 0 and < 7, or == -100, but got 29. Please check label value.
Error: C:\home\workspace\Paddle\paddle\phi\kernels\gpu\ Assertion false failed. The value of label expected >= 0 and < 7, or == -100, but got 29. Please check label value.
Traceback (most recent call last):
  File "tools/", line 32, in <module>
    engine.train()
  File "F:\code\PaddleClas\ppcls\engine\", line 339, in train
    self.train_epoch_func(self, epoch_id, print_batch_step)
  File "F:\code\PaddleClas\ppcls\engine\train\", line 54, in train_epoch
    loss_dict = engine.train_loss_func(out, batch[1])
  File "F:\code\PaddleClas\ppcls\", line 58, in __call__
    loss = self.loss_func[0](input, batch)
  File "D:\anaconda\envs\PaddleClas\lib\site-packages\paddle\nn\layer\", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "F:\code\PaddleClas\ppcls\loss\", line 57, in forward
    loss = F.cross_entropy(x, label=label, soft_label=soft_label)
  File "D:\anaconda\envs\PaddleClas\lib\site-packages\paddle\nn\functional\", line 2790, in cross_entropy
    if paddle.count_nonzero(is_ignore) > 0:  # ignore label
  File "D:\anaconda\envs\PaddleClas\lib\site-packages\paddle\fluid\dygraph\", line 673, in __bool__
    return self.__nonzero__()
  File "D:\anaconda\envs\PaddleClas\lib\site-packages\paddle\fluid\dygraph\", line 670, in __nonzero__
    return bool(np.array(self) > 0)
  File "D:\anaconda\envs\PaddleClas\lib\site-packages\paddle\fluid\dygraph\", line 696, in __array__
    array = self.numpy(False)
OSError: (External) CUDA error(719), unspecified launch failure.
  [Hint: 'cudaErrorLaunchFailure'. An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. Less common cases can be system specific - more information about these cases can be found in the system specific user guide. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.] (at ..\paddle\phi\backends\gpu\cuda\

The above is the error output.
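The assertion at the top of the log says the loss received label 29 while the model was set up for 7 classes (valid labels are 0 to 6, or -100 to ignore a sample). The CUDA error(719) most likely follows from that failed device-side assertion rather than from the CUDA installation, since the hint notes that the process is left in an inconsistent state. A quick scan of the label list before training can surface such lines. This is a minimal sketch that assumes one `<image_path> <label>` pair per line separated by whitespace; that format and the helper name `find_bad_labels` are assumptions for illustration, not part of PaddleClas.

```python
# Hypothetical helper (not a PaddleClas API): report labels that cross_entropy
# would reject, i.e. outside [0, num_classes) and not equal to the ignore index.
# Assumes each non-blank line is "<image_path> <label>".
from typing import Iterable, List, Tuple


def find_bad_labels(
    lines: Iterable[str], num_classes: int, ignore_index: int = -100
) -> List[Tuple[int, str, int]]:
    """Return (line_number, image_path, label) for every out-of-range label."""
    bad = []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        # Split on the last whitespace run so paths containing spaces still parse.
        path, label_str = raw.rsplit(maxsplit=1)
        label = int(label_str)
        if label != ignore_index and not (0 <= label < num_classes):
            bad.append((lineno, path, label))
    return bad


if __name__ == "__main__":
    # In-memory sample; num_classes=7 matches the "< 7" in the assertion above.
    sample = ["imgs/0001.jpg 3", "imgs/0002.jpg 29", "imgs/0003.jpg -100"]
    for lineno, path, label in find_bad_labels(sample, num_classes=7):
        print(f"line {lineno}: {path} has out-of-range label {label}")
```

To check a real file, pass an open file handle (for example `open(path, encoding="utf-8")`) instead of the sample list, and set `num_classes` to the class count used in the training config.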

My CUDA environment is fine; OCR training works. Below are the config file and the class file: class_gt.txt PPLCNet_x1_0_search.txt

q465414859 commented 1 month ago

Some of the data generated in class_gt.txt had problems, but I have already fixed them.

q465414859 commented 1 month ago

q465414859 commented 1 month ago

@Sunting78 Could you help with this?

cuicheng01 commented 1 month ago


q465414859 commented 1 month ago



cuicheng01 commented 1 month ago


q465414859 commented 1 month ago



cuicheng01 commented 3 weeks ago


chinesejunzai12 commented 5 days ago

Hi, has this been solved? I'm running into the same problem. I'm using the dataset recommended by Baidu, but I still get the error.
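When a dataset that should be correct still triggers this assertion, one thing worth comparing is the number of classes the labels actually require against the class count set in the training config (in PaddleClas YAML configs this is typically `class_num` under `Arch`; treat that key as an assumption and check your own config). This is a minimal sketch, again assuming `<image_path> <label>` lines, with `summarize_labels` as a hypothetical helper name. It reports the smallest class count that covers every label and any class ids that have no samples.

```python
# Hypothetical helper (not a PaddleClas API): summarize the labels in a
# "<image_path> <label>" list so the configured class count can be checked.
from collections import Counter
from typing import Dict, Iterable, List


def summarize_labels(lines: Iterable[str], ignore_index: int = -100) -> Dict[str, object]:
    counts: Counter = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        _, label_str = raw.rsplit(maxsplit=1)
        label = int(label_str)
        if label == ignore_index:
            continue  # ignored samples do not need an output class
        counts[label] += 1
    max_label = max(counts) if counts else -1
    # Class ids are expected to run from 0 to max_label; list ids with no samples.
    missing: List[int] = [i for i in range(max_label + 1) if i not in counts]
    return {
        "required_class_num": max_label + 1,
        "missing_ids": missing,
        "counts": dict(sorted(counts.items())),
    }


if __name__ == "__main__":
    sample = ["a.jpg 0", "b.jpg 2", "c.jpg 2", "d.jpg -100"]
    print(summarize_labels(sample))
```

If `required_class_num` is larger than the configured class count (7 in the log above), either the labels or the config need to change; a non-empty `missing_ids` can also hint that labels were generated with the wrong mapping.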