Project-MONAI / tutorials

MONAI Tutorials
https://monai.io/started.html
Apache License 2.0

can't run on large datasets #1759

Open Smarter1214 opened 1 month ago

Smarter1214 commented 1 month ago

When I follow the steps in the auto3dseg_hello_world.ipynb notebook, with the paths and parameters set accordingly, on a machine with 48 GB of GPU memory, training on a dataset of 300 .nii.gz images fails with the error "RuntimeError: Pin memory thread exited unexpectedly". With a dataset of 20 images, training proceeds smoothly under exactly the same conditions. While training on the 300-image dataset, I monitored GPU memory usage and it stayed below 70%, yet the error occurs every time. Could there be an issue with the get_data step?

ericspod commented 1 month ago

@mingxin-zheng @dongyang0122 @wyli would anyone have insights here? This may be related to multiprocessing issues, the number of open files, garbage collection, the PyTorch sharing strategy, or some other technical issue. Thanks!
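
For readers who want to rule out two of the candidates named above (the open-file limit and the PyTorch sharing strategy), here is a minimal sketch. It is not a confirmed fix for this issue. The helper names raise_open_file_limit and use_file_system_sharing and the target of 4096 descriptors are hypothetical choices for illustration; the sketch assumes a Unix-like system (the resource module is not available on Windows) and skips the PyTorch part if torch is not installed.

```python
import resource

try:
    import torch.multiprocessing as mp
except ImportError:  # torch not installed; skip the sharing-strategy part
    mp = None


def raise_open_file_limit(target=4096):
    """Raise the soft open-file limit toward `target` (hypothetical value).

    Returns the (soft, hard) limits in effect after the attempt.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    if hard == resource.RLIM_INFINITY:
        new_soft = max(soft, target)
    else:
        new_soft = max(soft, min(target, hard))  # never exceed the hard limit
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    except (ValueError, OSError):
        pass  # the OS refused the change; keep the current limits
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def use_file_system_sharing():
    """Switch PyTorch tensor sharing to the "file_system" strategy.

    Returns the active strategy name, or None if torch is unavailable.
    """
    if mp is None:
        return None
    mp.set_sharing_strategy("file_system")
    return mp.get_sharing_strategy()


if __name__ == "__main__":
    print("open-file limits (soft, hard):", raise_open_file_limit())
    print("sharing strategy:", use_file_system_sharing())
```

Both calls would need to run at the top of the notebook or script, before any DataLoader workers are started. If the error persists afterwards, these two candidates become less likely and the remaining ones (multiprocessing, garbage collection, host memory) are worth checking.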

mingxin-zheng commented 1 month ago

Thanks @Smarter1214 for reporting the issue. It would be helpful if you could share some logs or outputs so that we can pinpoint the problem.

In general, I am wondering at which step the error occurs: DataAnalyzing or Training?