Wei-ucas / TPSNet

Apache License 2.0

TypeError: Caught TypeError in DataLoader worker process 0. #8

Closed 256-7421142 closed 10 months ago

256-7421142 commented 1 year ago

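Hello author, when I train on 1 GPU, I get the error `TypeError: Caught TypeError in DataLoader worker process 0.` right as the first epoch finishes. I checked the totaltext dataset: a) Train - It contains 1255 images. b) Test - It contains 300 images. That is 1555 images in total, so why does training show 1570 images per epoch? And why does the error only appear at the end of the epoch?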

2023-08-17 21:32:36,974 - mmocr - INFO - Epoch [1][1560/1570] lr: 1.000e-03, eta: 21:28:11, time: 0.490, data_time: 0.023, memory: 7799, loss_text: 1.3758, loss_center: 1.0994, loss_point: 2.7431, loss_ba: 5.6853, loss: 10.9036

Wei-ucas commented 1 year ago

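Hello. To reduce the time spent repeatedly re-creating the dataloader, the training set is wrapped in RepeatDataset, so one epoch repeats the training data 5 times. The corresponding number of iterations is therefore ceil(1255 / 4) x 5 = 1570. (1560/1570 in the log is the iteration progress, not an image count.) I have not run into the training error you mention. Please check your environment configuration, or post the full error message so I can look into the cause.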
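The iteration count above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the formula: the function name and parameter names are hypothetical, and reading the `4` as the per-GPU batch size is an assumption, since the thread does not say what it stands for.

```python
import math


def iterations_per_epoch(num_images: int, batch_size: int, repeat_times: int) -> int:
    """Iterations shown per epoch, following ceil(num_images / batch_size) x repeat_times."""
    # One pass over the training images, rounded up to whole batches.
    iters_per_pass = math.ceil(num_images / batch_size)
    # The training data is repeated `repeat_times` times within one epoch.
    return iters_per_pass * repeat_times


# 1255 training images and 5 repeats come from the thread;
# treating 4 as the batch size is an assumption.
print(iterations_per_epoch(1255, 4, 5))  # 1570
```

The 300 test images do not enter this count, which is why the total differs from 1555.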

Wei-ucas commented 1 year ago

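For the totaltext dataset, no separate validation set is split off. Evaluation is run directly on the test set, so the validation set and the test set are the same.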