IsoNet-cryoET / IsoNet

Self-supervised learning for isotropic cryoET reconstruction
https://www.nature.com/articles/s41467-022-33957-8
MIT License

ERROR multiprocessing.pool.RemoteTraceback: #31

Closed ccgauvin94 closed 2 years ago

ccgauvin94 commented 2 years ago

Using the command from the tutorial, plus a batch size of 16 and 16 preprocessing CPUs, I get the following error when it reaches iteration 9:

"""
Traceback (most recent call last):
  File "/opt/miniconda3/envs/isonet/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/opt/miniconda3/envs/isonet/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/opt/IsoNet/preprocessing/prepare.py", line 157, in get_cubes
    get_cubes_one(data_X, data_Y, settings, start = start)
  File "/opt/IsoNet/preprocessing/prepare.py", line 95, in get_cubes_one
    noise_volume = read_vol(path_noise[path_index])
  File "/opt/IsoNet/preprocessing/prepare.py", line 92, in read_vol
    with mrcfile.open(f) as mf:
  File "/home/t93j956/.local/lib/python3.9/site-packages/mrcfile/load_functions.py", line 138, in open
    return NewMrc(name, mode=mode, permissive=permissive,
  File "/home/t93j956/.local/lib/python3.9/site-packages/mrcfile/mrcfile.py", line 108, in __init__
    self._open_file(name)
  File "/home/t93j956/.local/lib/python3.9/site-packages/mrcfile/mrcfile.py", line 125, in _open_file
    self._iostream = open(name, self._mode + 'b')
OSError: [Errno 9] Bad file descriptor: 'results/training_noise/n_00363.mrc'
"""

This has happened twice now, and I'm not sure why. GPU memory usage is at 33 GB of 45 GB, but does it go up when the noise model stuff starts?

procyontao commented 2 years ago

Hi,

With default noise settings, this problem should happen at the very beginning of iteration 10. IsoNet will generate 1000 noise volumes using the CPU.

Please check whether you already have a results/training_noise folder, whether IsoNet has already generated some noise mrc files in that folder, and whether you have enough disk space.
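For anyone running the same checks, here is a minimal sketch that reports all three in one go. The helper name `check_noise_dir` and the `expected` parameter are made up for illustration and are not part of IsoNet; the folder name and the count of 1000 come from the comment above.

```python
import shutil
from pathlib import Path


def check_noise_dir(noise_dir="results/training_noise", expected=1000):
    """Report whether the noise folder exists, how many .mrc files it
    holds, and how much free space is left on its filesystem.

    Hypothetical helper, not an IsoNet function.
    """
    path = Path(noise_dir)
    exists = path.is_dir()
    # Count the noise volumes written so far (e.g. n_00363.mrc).
    n_files = len(list(path.glob("*.mrc"))) if exists else 0
    # If the folder is missing, measure free space on the nearest existing parent.
    probe = path if exists else next(
        p for p in path.resolve().parents if p.exists()
    )
    free_gb = shutil.disk_usage(probe).free / 1e9
    return {
        "exists": exists,
        "mrc_files": n_files,
        "expected": expected,
        "free_gb": free_gb,
    }


if __name__ == "__main__":
    print(check_noise_dir())
```

If `mrc_files` is well below `expected` and `free_gb` is not close to zero, disk space is probably not the cause, and the filesystem itself is the next thing to look at.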

ccgauvin94 commented 2 years ago

I was trying to do this on a mounted SMB volume. Switching to local scratch seems to fix the issue. I wonder if the connection wasn't staying open or something.
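In case it helps someone else, the workaround can be sketched roughly as below: copy the working directory to local scratch, run IsoNet there, then copy the results back to the share. The function names, the `isonet_` prefix, and the `scratch_root` parameter are placeholders for illustration; nothing here is part of IsoNet.

```python
import shutil
import tempfile
from pathlib import Path


def stage_to_scratch(src_dir, scratch_root=None):
    """Copy src_dir into a fresh directory on local scratch.

    scratch_root is a hypothetical parameter: point it at a local disk;
    if None, the system temp directory is used.
    """
    src = Path(src_dir)
    work = Path(tempfile.mkdtemp(prefix="isonet_", dir=scratch_root))
    dst = work / src.name
    shutil.copytree(src, dst)
    return dst


def copy_back(scratch_dir, dest_dir):
    """Copy results from scratch back to the network share,
    merging into dest_dir if it already exists (Python 3.8+)."""
    shutil.copytree(scratch_dir, dest_dir, dirs_exist_ok=True)
```

The idea is that the many parallel reads and writes in results/training_noise then hit a local filesystem, and the SMB share only sees two bulk copies.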

Best, Colin

procyontao commented 2 years ago

I guess your SMB volume might not allow simultaneous I/O from multiple processes. I do not have a solution, but I will keep this in mind.