RuntimeError: DataLoader worker (pid(s) 3360242, 3361527, 3362405) exited unexpectedly

cvlab-columbia / zero123

Zero-1-to-3: Zero-shot One Image to 3D Object (ICCV 2023)

https://zero123.cs.columbia.edu/

MIT License


RuntimeError: DataLoader worker (pid(s) 3360242, 3361527, 3362405) exited unexpectedly #122

Closed: yisuanwang closed this issue 5 months ago

yisuanwang commented 5 months ago

Has anyone had this problem? It occurs when I run a training test on part of the original Objaverse dataset. I'm using 8 × A100 80 GB GPUs with the stock config, and a truncated version of valid_path.json that contains about 4000 objects. Traceback (excerpt):

  File "/data/xxx/.conda/zero123/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1133, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 3360242, 3361527, 3362405) exited unexpectedly
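For reference, a truncated path list like the one described above could be produced with something like the sketch below. It assumes the full list is a flat JSON array of object paths; the function name `truncate_path_list` and the file names in the usage line are hypothetical and not part of the zero123 repo.

```python
import json


def truncate_path_list(src: str, dst: str, n: int) -> int:
    """Write the first n entries of a JSON list of object paths to dst.

    ASSUMPTION: src holds a flat JSON list (e.g. relative object directories).
    Returns the number of entries actually written.
    """
    with open(src) as f:
        paths = json.load(f)
    if not isinstance(paths, list):
        raise ValueError(f"{src} does not contain a JSON list")
    subset = paths[:n]  # slicing never fails if the list is shorter than n
    with open(dst, "w") as f:
        json.dump(subset, f)
    return len(subset)
```

Usage (hypothetical file names): `truncate_path_list("valid_paths_full.json", "valid_path.json", 4000)`. Taking the first N entries keeps the subset reproducible across runs; a random sample would need a fixed seed to get the same property.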

kada0720 commented 5 months ago

same problem

xiaobiaodu commented 5 months ago

I guess you set num_workers too high. Try reducing it.
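A related debugging step: with `num_workers=0`, PyTorch's DataLoader loads samples in the main process, so a failing sample surfaces there instead of as an anonymous worker crash. The sketch below does the same outside training by walking a map-style dataset (anything with `__len__` and `__getitem__`) in the main process. `probe_dataset` is a hypothetical helper, not part of zero123 or PyTorch; it logs each index *before* loading it, so if the process is killed outright (segfault, OOM), the last logged index points at the likely culprit.

```python
import sys
import traceback
from typing import Any, Callable, List, Sequence


def _stderr_log(msg: str) -> None:
    # Flush immediately so the last index survives a hard crash.
    print(msg, file=sys.stderr, flush=True)


def probe_dataset(dataset: Sequence[Any],
                  log: Callable[[str], None] = _stderr_log,
                  stop_on_error: bool = False) -> List[int]:
    """Load every sample in the main process; return indices that raised.

    Hypothetical helper: works with any map-style dataset, e.g. a
    torch.utils.data.Dataset subclass that defines __len__.
    """
    failed: List[int] = []
    for idx in range(len(dataset)):
        log(f"loading index {idx}")
        try:
            dataset[idx]
        except Exception:
            failed.append(idx)
            log(f"index {idx} raised:\n{traceback.format_exc()}")
            if stop_on_error:
                break
    return failed
```

Note that only Python exceptions are caught here; a crash inside native code still kills the process, which is why the index is logged first.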

kada0720 commented 5 months ago

I trained on a single GPU with num_workers set to 1, and I still get this error. Another question: does the code automatically download the dataset when I run main.py? (When I run it, I get the error "No space left on device".)
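"No space left on device" means some filesystem filled up, and the thread doesn't say which one. Two candidates worth checking are the disk where data and checkpoints are written and, on Linux, `/dev/shm`: PyTorch's DataLoader workers pass batches through shared memory, and a small `/dev/shm` (common in Docker containers) is a known cause of worker crashes. The sketch below is a hypothetical check using only `shutil.disk_usage`; the paths passed in are placeholders to replace with your own data and log directories.

```python
import shutil
from typing import Dict, Iterable


def free_space_gib(paths: Iterable[str]) -> Dict[str, float]:
    """Return free space in GiB on the filesystem holding each path.

    Paths that don't exist or can't be accessed are skipped.
    """
    report: Dict[str, float] = {}
    for p in paths:
        try:
            usage = shutil.disk_usage(p)
        except OSError:  # missing path or permission problem
            continue
        report[p] = usage.free / 2**30
    return report


if __name__ == "__main__":
    # Placeholder locations: current directory and shared memory (Linux only).
    for path, gib in free_space_gib([".", "/dev/shm"]).items():
        print(f"{path}: {gib:.1f} GiB free")
```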

yisuanwang commented 5 months ago

The problem doesn't seem to be related to num_workers; it was caused by the training data I generated myself. I have now trained successfully. Thanks.
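Since the cause turned out to be self-generated training data, a pre-flight scan of the rendered objects can catch missing or corrupt files before a long run. The sketch below assumes a layout that is NOT confirmed in this thread: each object directory listed in the path JSON holds `num_views` renders named `000.png`, `001.png`, … with matching camera files `000.npy`, `001.npy`, …. The helper names `check_object_dir` and `find_bad_objects` are hypothetical; adjust the layout to match your own rendering script.

```python
import json
import os
from typing import Dict, List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every valid PNG file


def check_object_dir(obj_dir: str, num_views: int = 12) -> List[str]:
    """Return a list of problems found in one rendered-object directory.

    ASSUMED layout: views 000.png, 001.png, ... each with a matching
    camera file 000.npy, 001.npy, ...
    """
    if not os.path.isdir(obj_dir):
        return [f"missing directory {obj_dir}"]
    problems: List[str] = []
    for i in range(num_views):
        png = os.path.join(obj_dir, f"{i:03d}.png")
        npy = os.path.join(obj_dir, f"{i:03d}.npy")
        if not os.path.isfile(png):
            problems.append(f"missing {png}")
        else:
            with open(png, "rb") as f:
                if f.read(8) != PNG_SIGNATURE:  # empty file or bad header
                    problems.append(f"not a PNG: {png}")
        if not os.path.isfile(npy) or os.path.getsize(npy) == 0:
            problems.append(f"missing or empty {npy}")
    return problems


def find_bad_objects(root: str, paths_json: str,
                     num_views: int = 12) -> Dict[str, List[str]]:
    """Check every object in paths_json (a JSON list of dirs relative to root)."""
    with open(paths_json) as f:
        rel_paths = json.load(f)
    bad: Dict[str, List[str]] = {}
    for rel in rel_paths:
        problems = check_object_dir(os.path.join(root, rel), num_views)
        if problems:
            bad[rel] = problems
    return bad
```

The signature check only catches empty files and broken headers; a PNG truncated partway through would still pass, so decoding each image with an image library is the stricter (and slower) option.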