Closed widgetxp closed 2 years ago
This issue is stale because it has been open for 30 days with no activity.
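I'd suggest generating a txt file offline ahead of time, then reading the index from that txt file during training. That should solve this problem.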
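A minimal sketch of what this suggestion could look like, in plain Python. The names `write_index` and `TxtIndexDataset`, the tab-separated line format, and the one-folder-per-class layout are all hypothetical and are not part of FastRetri. The idea is to scan the image tree once offline, then have the dataset keep only compact byte offsets into the txt file rather than millions of Python objects. Whether this removes the error depends on the root cause, which this thread does not establish.

```python
import os
from array import array


def write_index(image_root, index_path):
    """Offline step: write one 'relative_path<TAB>label' line per image.

    Assumption: each class lives in its own sub-directory of image_root.
    Returns the number of lines written.
    """
    classes = sorted(
        d for d in os.listdir(image_root)
        if os.path.isdir(os.path.join(image_root, d))
    )
    count = 0
    # newline="\n" keeps byte offsets identical across platforms
    with open(index_path, "w", encoding="utf-8", newline="\n") as f:
        for label, cls in enumerate(classes):
            for name in sorted(os.listdir(os.path.join(image_root, cls))):
                f.write(f"{cls}/{name}\t{label}\n")
                count += 1
    return count


class TxtIndexDataset:
    """Map-style dataset (__len__/__getitem__) backed by the txt index.

    Only one unsigned 64-bit offset per sample is held in memory;
    the path and label are read from the file on demand.
    """

    def __init__(self, index_path):
        self.index_path = index_path
        self.offsets = array("Q")
        with open(index_path, "rb") as f:
            pos = 0
            for line in f:
                self.offsets.append(pos)
                pos += len(line)
        # Opened lazily so that each worker process gets its own file handle
        self._file = None

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, i):
        if self._file is None:
            self._file = open(self.index_path, "rb")
        self._file.seek(self.offsets[i])
        line = self._file.readline().decode("utf-8").rstrip("\r\n")
        path, label = line.rsplit("\t", 1)
        return path, int(label)
```

Run `write_index` once before training. The training job then only constructs `TxtIndexDataset(index_path)`, and each item gives a relative path and an integer label that the real dataset class would use to load the image.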
This issue was closed because it has been inactive for 14 days since being marked as stale.
If you do not know the root cause of the problem, and wish someone to help you, please post according to this template:
Instructions To Reproduce the Issue:
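I am using FastRetri to train on a dataset of 18 million images. After the label file has been read, the console stops printing for about 5 minutes and then an error is raised. The message is as follows: I tried a few things to solve this:
Advice found online generally suggests reducing the number of dataloader workers. Training only starts after dropping from 8 workers to 4, but GPU utilization is then too low.
Using only half of the training data (11 million images), still with 8 workers, training also starts normally.
Reducing the data loading queue size from the default of 10 to 2 did not help.
Reducing batch_size from 2048 to 1024 did not help. What is the root cause of this kind of problem, and what is the solution? A screenshot of the config is below: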
full code you wrote or full changes you made (git diff)
Newly added dataset: