HuguesTHOMAS / KPConv-PyTorch

Kernel Point Convolution implemented in PyTorch
MIT License

DataLoader for S3DIS causes train_S3DIS.py to crash when num_workers>0 #166

Closed: ylevental closed this issue 2 years ago

ylevental commented 2 years ago

I am running everything on Fedora 34.

By tracing the program, I found that it crashes at the line for batch_i, batch in enumerate(dataloader): in S3DIS.py. That loop is reached through the call training_sampler.calibration(training_loader, verbose=True) in train_S3DIS.py.

I am not sure why, but setting num_workers=0 prevents the crash.

ylevental commented 2 years ago

I should also mention that I was running the program in IDLE (the IDE) when it crashed. When I run the script from the command line, it works fine.

Filed a bug here: https://github.com/python/cpython/issues/92571

HuguesTHOMAS commented 2 years ago

Thanks for filing the bug. I am not sure what causes this, because I only run the code from the command line on Ubuntu. If setting num_workers=0 prevented the crash, it is probably due to a problem with how the PyTorch DataLoader handles multiprocessing.

If you want to use the code, I think the best course of action is to stick with command-line execution and avoid unnecessarily complex bugs.

Best, Hugues
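
For anyone who still wants to launch the script from IDLE, here is a minimal sketch of an automatic fallback to num_workers=0. It is not part of KPConv-PyTorch: the helper names running_under_idle and pick_num_workers are hypothetical, and the check for idlelib in sys.modules is a common heuristic rather than an official API, so treat it as an assumption.

```python
import sys


def running_under_idle() -> bool:
    """Heuristic (assumption): IDLE loads the idlelib package into the
    process that executes user code, so its presence suggests IDLE."""
    return "idlelib" in sys.modules


def pick_num_workers(requested: int) -> int:
    """Return 0 under IDLE so data is loaded in the main process;
    otherwise return the requested worker count unchanged."""
    if requested < 0:
        raise ValueError("requested must be >= 0")
    return 0 if running_under_idle() else requested


if __name__ == "__main__":
    # Hypothetical usage: pass the result as the num_workers argument
    # when building the PyTorch DataLoader in train_S3DIS.py.
    print("num_workers =", pick_num_workers(4))
```

With num_workers=0 the DataLoader loads batches in the main process, which is slower but avoids spawning worker processes, the only change reported above to prevent the crash.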

ylevental commented 2 years ago

I believe this is the reason the IDE does not work: https://github.com/python/cpython/issues/92633