AttributeError: Can't pickle local object 'points_to_surf_train.<locals>.seed_train_worker'

TalonX1 commented 2 years ago

When I try to run it, an error occurs.

pytorch 1.10.2 py3.8_cuda11.3_cudnn8_0 pytorch

Traceback (most recent call last):
  File "D:/repo/points2surf/full_run.py", line 80, in <module>
    points_to_surf_train.points_to_surf_train(train_opt)
  File "D:\repo\points2surf\source\points_to_surf_train.py", line 428, in points_to_surf_train
    train_enum = enumerate(train_dataloader, 0)
  File "D:\Programs\anaconda3\envs\p2s\lib\site-packages\torch\utils\data\dataloader.py", line 354, in __iter__
    self._iterator = self._get_iterator()
  File "D:\Programs\anaconda3\envs\p2s\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "D:\Programs\anaconda3\envs\p2s\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
    w.start()
  File "D:\Programs\anaconda3\envs\p2s\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "D:\Programs\anaconda3\envs\p2s\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\Programs\anaconda3\envs\p2s\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "D:\Programs\anaconda3\envs\p2s\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\Programs\anaconda3\envs\p2s\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'points_to_surf_train.<locals>.seed_train_worker'

TalonX1 commented 2 years ago

When I turn off multiprocessing in the dataloader, it works. But I would still like to know why this happened, and I want to use multiprocessing.

ErlerPhilipp commented 2 years ago

Hi @TalonX1!

This is most likely a Windows-specific problem (with Python). It worked when I tried it the last time. I sometimes had strange errors when the dataloader was shutting down its workers after the training. I can't reproduce this error.

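Are you using the code without any modifications? Windows had / has problems with pickling lambdas and local functions, because it spawns worker processes and has to pickle whatever is passed to them.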

Please try again with the package versions specified in the p2s.yml. You might need to replace the ">=" with "==". The Python version (3.7) is the most important thing here.

It might be a disguised out-of-memory error caused by Windows' inefficient process creation. Please try again with only 1 worker to check.
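
For context on the pickling problem mentioned above, here is a minimal sketch using only the standard library. It is not taken from the repository: `make_local_worker` and `can_pickle` are hypothetical names, and the nested function only borrows the name `seed_train_worker` from the traceback to mirror the pattern.

```python
import pickle


def make_local_worker():
    # Mirrors the pattern named in the traceback: a function defined
    # inside another function gets a "<locals>" qualified name.
    def seed_train_worker(worker_id):
        return worker_id

    return seed_train_worker


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


if __name__ == "__main__":
    local_fn = make_local_worker()
    print(local_fn.__qualname__)
    print("local function picklable:", can_pickle(local_fn))
    print("lambda picklable:", can_pickle(lambda worker_id: worker_id))
    print("built-in function picklable:", can_pickle(len))
```

Functions are pickled by reference to their importable name, so a nested function or a lambda cannot be serialized. The usual workaround is to define the worker init function at module level; I have not checked whether that is sufficient for this repository.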

ErlerPhilipp commented 2 years ago

You can read up on spawn vs fork processes on Windows. I'm now training in WSL 2 because it has a proper fork and avoids a lot of overhead, especially virtual memory of DLLs.
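
To see which start methods a given interpreter offers, a small sketch like the following can help. `describe_start_methods` is a hypothetical helper name; the `multiprocessing` calls are standard library.

```python
import multiprocessing as mp
import sys


def describe_start_methods():
    """Report which process start methods this interpreter supports."""
    return {
        "platform": sys.platform,
        "default": mp.get_start_method(),
        "available": mp.get_all_start_methods(),
    }


if __name__ == "__main__":
    info = describe_start_methods()
    print(info)
    if "fork" not in info["available"]:
        # Without fork, child processes start from a fresh interpreter,
        # so everything handed to them has to be pickled.
        print("No fork here: worker arguments must be picklable.")
```

On native Windows only spawn is available, whereas Linux (including WSL 2) also offers fork, which copies the parent process and does not need to pickle the worker function.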

TalonX1 commented 2 years ago

Thank you for your quick reply. I will try what you suggested and close the issue for now.

ErlerPhilipp / points2surf

AttributeError: Can't pickle local object 'points_to_surf_train.<locals>.seed_train_worker' #15