DavidDiazGuerra / gpuRIR

Python library for Room Impulse Response (RIR) simulation with GPU acceleration
GNU Affero General Public License v3.0

Can't work with PyTorch online training when num_workers > 0 #37

Open · MartinMML opened this issue 2 years ago

MartinMML commented 2 years ago

Hello, I ran into a problem using gpuRIR with PyTorch online training when I set the dataloader's num_workers greater than 0. The error is: GPUassert: initialization error gpuRIR_cuda.cu 793. It works fine with num_workers=0. Is this a known problem, and do you have any suggestions? Thanks a lot.

DavidDiazGuerra commented 2 years ago

Hello Martin,

I know gpuRIR doesn't work when you try to run it in parallel worker processes using num_workers > 0 in PyTorch dataloaders. I have never worked on that, but I know that PyTorch generally recommends against doing CUDA work inside the parallel dataloader workers: https://pytorch.org/docs/stable/data.html

It seems like the recommendations in the PyTorch documentation are about returning GPU variables, so they shouldn't affect gpuRIR, which performs some CUDA work but moves the result to the CPU before returning it. Maybe there is an issue with how I'm initializing some CUDA state in gpuRIR that makes it crash when used in multiprocess programs. However, I don't know much about this topic and I don't have time to dig deeper into it right now, so I'm afraid I can't offer more help.
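A plausible explanation (an assumption on my part, not verified against gpuRIR's source) is that PyTorch dataloader workers are created with Python's default "fork" start method on Linux, so each worker inherits a copy of the parent's already-initialized CUDA context, which is not valid in the child process. A small pure-Python sketch of the general mechanism, using a plain dict as a stand-in for the CUDA context ("fork" is POSIX-only):

```python
import multiprocessing as mp

# Stand-in for expensive one-time initialization (e.g. a CUDA context).
STATE = {"ready": False}

def child_sees_state(queue):
    # Report whether this worker inherited the parent's mutated global.
    queue.put(STATE["ready"])

def probe(method):
    """Start one worker with the given start method; return what it saw."""
    ctx = mp.get_context(method)
    queue = ctx.Queue()
    proc = ctx.Process(target=child_sees_state, args=(queue,))
    proc.start()
    seen = queue.get()
    proc.join()
    return seen

if __name__ == "__main__":
    STATE["ready"] = True  # parent initializes its "context"
    # A forked worker inherits the parent's state (True); a spawned
    # worker re-imports the module and starts fresh (False).
    print("fork worker inherits state:", probe("fork"))
    print("spawn worker inherits state:", probe("spawn"))
```

A real CUDA context behaves like the forked case, except that the inherited copy cannot actually be used in the child, which would match the "initialization error" assert.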

Let me know if you find something more about the topic. I'll leave the issue open just in case someone else can see it and offer some help.

Best regards, David

MartinMML commented 2 years ago

Okay, I got it. Thanks a lot.

acappemin commented 12 months ago

Adding the following line at the beginning of your code will help: multiprocessing.set_start_method('forkserver')

DavidDiazGuerra commented 11 months ago

Thanks for the tip! I'll try to look into this when I have some time.