computational-cell-analytics / dl-for-micro

Course and exercises on deep learning for microscopy image analysis
MIT License

Dataset class usage in jupyter lab #20

Status: open. manerotoni opened this issue 9 months ago

manerotoni commented 9 months ago

Hi, I spotted an issue in the notebooks when using Python 3.11 (and maybe other versions) on my Windows machine. When the Dataset class (CustomDataset) is defined in the notebook itself (e.g. torch_infection_classifier.ipynb), an error is raised at the line

x, y = next(iter(train_loader))

The error in the command window is:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\apoliti\Miniconda3\envs\dl-for-micro-2\Lib\multiprocessing\spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\apoliti\Miniconda3\envs\dl-for-micro-2\Lib\multiprocessing\spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'CustomDataset' on <module '__main__' (built-in)>

A few Google searches indicate that the class declaration in the notebook is somehow not picked up by the worker processes, and that there are incompatibilities between multiprocessing and interactive mode (Jupyter notebooks). See https://discuss.pytorch.org/t/issue-with-pretrained-resnet-fixed/109637/4 and https://stackoverflow.com/questions/73763151/multiprocessing-error-self-reduction-pickle-loadfrom-parent-attributeerror
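To see why a worker cannot find the class, it helps to know that pickle stores a class by reference (module name plus class name), not by its code. The stdlib-only sketch below simulates this; the module name `notebook_main` is a hypothetical stand-in for the notebook's `__main__`, `exec` plays the role of running a notebook cell, and replacing the module in `sys.modules` mimics a freshly spawned worker that never ran that cell.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the notebook's __main__ module.
notebook_main = types.ModuleType("notebook_main")
sys.modules["notebook_main"] = notebook_main

# Running a notebook cell defines the class in the main module's namespace.
CELL_SOURCE = """
class CustomDataset:
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n
"""
exec(CELL_SOURCE, notebook_main.__dict__)

# Parent process: the instance is pickled; its class is stored only as a
# reference ("notebook_main", "CustomDataset"), not as source code.
payload = pickle.dumps(notebook_main.CustomDataset(8))

# Simulated spawned worker: a fresh main module that never ran the cell.
sys.modules["notebook_main"] = types.ModuleType("notebook_main")
try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:  # same exception type as in the traceback
    error = exc
print(repr(error))
```

With `num_workers=0` the dataset is never pickled at all, which is why that setting sidesteps the error.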

manerotoni commented 9 months ago

Fortunately, it works with Python 3.8, the version used on BAND. I also tried putting the class definition in a separate file, and then it works nicely.

Maybe after the course we should make sure to fix this. I am not sure whether this is a bug in the multiprocessing module or a feature.

manerotoni commented 9 months ago

Similar problem with Python 3.8.10 (the same Python version as on BAND).

It is all a little strange; there must be some Windows or package issue. I have been using very similar code in other projects and never had this error.

I am moving my changes to BAND now to wrap up the course work.

constantinpape commented 9 months ago

Good to know that this issue exists on windows.

This is most likely because multiprocessing works differently on Windows. We probably need some workaround that imports the dataset from utils.py for that case.
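Concretely, the difference lies in the multiprocessing start method. Windows only supports `spawn`: each worker starts a fresh interpreter and must re-import every class it unpickles. With `fork` (the long-standing Linux default) a worker instead inherits a copy of the parent's memory, including classes defined in notebook cells. macOS has defaulted to `spawn` since Python 3.8, and Python 3.14 changed the Linux default to `forkserver`. A minimal check of what a given machine uses:

```python
import multiprocessing as mp
import sys

# Start methods supported on this platform (on Windows: only "spawn").
available = mp.get_all_start_methods()
# Method of the default context; DataLoader workers use it unless a
# different multiprocessing_context is passed to the loader.
method = mp.get_start_method()
print(f"{sys.platform}: default={method!r}, available={available}")
```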

constantinpape commented 9 months ago

P.S. I can't really fix this since I don't have access to a Windows machine. We can see how to address this after the course.

manerotoni commented 9 months ago

I'll just add this link: https://bobswinkels.com/posts/multiprocessing-python-windows-jupyter/ Basically, the best option would be to move the function into an extra Python file and import it from there. It is a little unfortunate that the error only appears in the command window and not within the Jupyter notebook.
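A stdlib-only sketch of that workaround: the class lives in an importable file, so unpickling can re-import it even in a process that has never seen it. The module name `dataset_utils` and its contents are hypothetical; in the course, utils.py would play this role. Deleting the module from `sys.modules` simulates a freshly spawned worker.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical module standing in for the course's utils.py.
MODULE_SOURCE = '''
class CustomDataset:
    """Toy dataset: only __len__ and __getitem__, like a torch Dataset."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return idx, idx % 2
'''

module_dir = tempfile.mkdtemp()
Path(module_dir, "dataset_utils.py").write_text(MODULE_SOURCE)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()  # make the freshly written file importable

# In the notebook: import the class instead of defining it in a cell.
from dataset_utils import CustomDataset

payload = pickle.dumps(CustomDataset(8))

# Simulated spawned worker: the module was never imported there...
del sys.modules["dataset_utils"]
# ...but pickle re-imports it from disk by name, so loading succeeds.
restored = pickle.loads(payload)
print(type(restored).__module__, len(restored), restored[3])  # dataset_utils 8 (3, 1)
```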

manerotoni commented 8 months ago

I found out why the loader did not cause problems in my own projects: if you set num_workers = 0 (load data in the main process only, which is the default), it does not complain that it cannot find the Dataset class.
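In PyTorch terms the fix is a one-argument change on the loader. The dataset below is a hypothetical toy stand-in for the notebook's CustomDataset (random tensors instead of microscopy images); with `num_workers=0` each batch is assembled in the main process, so nothing has to be pickled and sent to a worker.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class CustomDataset(Dataset):
    """Hypothetical toy dataset: random 1x32x32 'images' with binary labels."""

    def __init__(self, n_samples=16, image_shape=(1, 32, 32)):
        self.images = torch.rand(n_samples, *image_shape)
        self.labels = torch.randint(0, 2, (n_samples,))

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]


# num_workers=0 (the default): batches are loaded in the main process, so a
# class defined in a notebook cell works on every OS.
train_loader = DataLoader(CustomDataset(), batch_size=4, shuffle=True, num_workers=0)
x, y = next(iter(train_loader))
print(x.shape, y.shape)  # torch.Size([4, 1, 32, 32]) torch.Size([4])
```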

For the sake of cross-OS usability, I would remove this option. It is not crucial for the course.

For now, just displaying images with num_workers > 0 is really slow. I am not sure why; maybe it is because all the worker processes need to start first. In fact, the time increases with more workers. I am also not sure whether training is faster once the workers are all running in the background.

constantinpape commented 8 months ago

@manerotoni: great that you figured this out. Let's set the number of workers to 0. This is indeed not crucial at all here. (It can make a difference for more complex pipelines, but I don't expect a big difference here.)

Do you want to create a PR to fix this?

> for the moment just the display of images with num_workers >0 is really slow. Not sure why, may be because it needs to start all threads.

Yes, this is slower because all worker processes need to start first.
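If workers are wanted later (e.g. for heavier pipelines), two things help, sketched here assuming a reasonably recent PyTorch: the dataset class must be importable by the workers, which `TensorDataset` is because it lives in `torch.utils.data`, and `persistent_workers=True` keeps the workers alive between iterations, so the startup cost is paid once rather than on every `iter()` call. The random tensors are placeholder data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data; TensorDataset is defined inside torch, so a spawned worker
# can re-import it (a notebook-defined class would have to move to a .py file).
images = torch.rand(16, 1, 32, 32)
labels = torch.randint(0, 2, (16,))
dataset = TensorDataset(images, labels)

# persistent_workers=True keeps worker processes alive after an epoch ends, so
# later iter() calls reuse them. Constructing the loader starts no workers;
# they are started by the first iteration.
train_loader = DataLoader(dataset, batch_size=4, shuffle=True,
                          num_workers=2, persistent_workers=True)
```

In a script on Windows, the first `iter(train_loader)` should sit under an `if __name__ == "__main__":` guard, because spawned workers re-import the main script.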

manerotoni commented 8 months ago

I will open a PR.