MSKCC-Computational-Pathology / MIL-nature-medicine-2019


RuntimeError: Dataloader worker is killed by signal: Killed #8

Closed Tato14 closed 4 years ago

Tato14 commented 4 years ago

Hi, I have just tried to start some training and the MIL_train.py script fails with the error in the title. After searching a bit, I found that it seems to be related to multiprocessing. If I set --workers to 0, the error no longer occurs. However, the training time increases a lot. I guess it is something related to openslide, but I cannot find anything about how it handles multiprocessing.

Do you have any hints on this? Thanks

gabricampanella commented 4 years ago

It is difficult to debug your issue without more information. In my experience, the most common cause is insufficient memory. Openslide by default keeps a cache for every opened slide. It is possible to modify openslide's source code so that it does not create this cache. Then again, your problem may be different.
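To make the memory argument concrete, here is a rough back-of-envelope sketch, not a measurement. It assumes that each data-loading worker process holds its own handle to every slide and that each handle's cache can fill up to a fixed size. The 32 MiB cache size, the slide count and the worker counts are illustrative assumptions, not values taken from this repository, and the function name is hypothetical.

```python
# Rough estimate of memory held by per-slide caches across data-loading workers.
# All numbers are illustrative assumptions, not measured values.

MIB = 1024 * 1024


def estimated_cache_bytes(num_workers, num_slides, cache_bytes_per_slide):
    """Estimate cache memory if every loading process fills every slide's cache."""
    # With num_workers=0 the data is loaded in the main process, so count one process.
    processes = max(num_workers, 1)
    return processes * num_slides * cache_bytes_per_slide


if __name__ == "__main__":
    for workers in (0, 4, 8):
        total = estimated_cache_bytes(workers, num_slides=500, cache_bytes_per_slide=32 * MIB)
        print(f"workers={workers}: roughly {total / 1024**3:.1f} GiB of slide cache")
```

Under these assumptions the estimate grows linearly with the number of workers, which would be consistent with the error disappearing when --workers is set to 0.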

Tato14 commented 4 years ago

So, this could be related to this previous issue, right?

https://github.com/MSKCC-Computational-Pathology/MIL-nature-medicine-2019/issues/5#issuecomment-527192685

Tato14 commented 4 years ago

Hi again, I am trying to install openslide from here in order to disable the cache. However, I am not able to use it from Python, and I am not sure how to force Python to use this specific build of openslide. I know this is a bit off-topic for this issue, but could you explain how you managed to use it? Thanks :-)

gabricampanella commented 4 years ago

What I did was download the source code for openslide and make a very minor change to it: in https://github.com/openslide/openslide/blob/master/src/openslide.c, comment out line 347 and uncomment line 348. Then I installed openslide manually and set the PATH, LD_LIBRARY_PATH and LIBRARY_PATH environment variables appropriately. After that, if you install openslide-python, it should work. You can even check from within Python which library is used when loading openslide.
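One possible way to do that check on Linux is sketched below. It lists the files mapped into the current process whose path contains a given substring, by reading /proc/self/maps. The helper names are hypothetical, the approach is Linux-only, and it assumes the shared library's file name contains "libopenslide".

```python
# List shared libraries mapped into the current process whose path contains a
# given substring. Linux-only: relies on /proc/self/maps.


def library_paths(maps_text, needle):
    """Return the sorted, de-duplicated mapped file paths that contain `needle`."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


def loaded_library_paths(needle):
    """Inspect this process; returns [] where /proc is unavailable."""
    try:
        with open("/proc/self/maps") as f:
            return library_paths(f.read(), needle)
    except OSError:
        return []


if __name__ == "__main__":
    try:
        import openslide  # noqa: F401  (importing loads the shared library)
    except (ImportError, OSError) as exc:
        print("could not import openslide:", exc)
    print("mapped:", loaded_library_paths("libopenslide"))
```

If the printed path points into the manual install location rather than a system or conda directory, the patched build is the one in use.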

Tato14 commented 4 years ago

Thanks! This solved the memory issue. Checking the %RAM used per worker, this change cuts the memory requirement in half. If anyone faces a similar problem, I should mention that I could not get it to run in a conda environment.
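For anyone who wants to repeat the per-worker memory comparison, the following is a minimal sketch of one way to read a process's resident memory on Linux. It is not necessarily how the figure above was obtained. The helper names are hypothetical, and the worker PIDs would have to come from a tool such as ps or top.

```python
# Read the resident set size (VmRSS) of a process from /proc/<pid>/status.
# Linux-only sketch; returns None when the value is unavailable.
import os


def parse_vm_rss_kib(status_text):
    """Extract the VmRSS value in KiB from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # e.g. "VmRSS:    123456 kB"
    return None


def rss_kib(pid):
    """Return the resident set size of `pid` in KiB, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return parse_vm_rss_kib(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    pid = os.getpid()
    print(f"pid {pid}: VmRSS = {rss_kib(pid)} KiB")
```

Note that resident memory includes pages shared between forked workers, so summing it across workers can overstate the total.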