IsoNet-cryoET / IsoNet

Self-supervised learning for isotropic cryoET reconstruction
https://www.nature.com/articles/s41467-022-33957-8
MIT License

"Data array contains NaN values" error #33

Closed by rdrighetto 2 years ago

rdrighetto commented 2 years ago

Hi,

I'm getting the following error in prediction:

diogori@worker08:isonet$ time isonet.py predict MTEC_tomo055_72_d3.star ./MTEC_tomo055_72_d3_results/model_iter09.h5 --output_dir MTEC_tomo055_72_d3_corrected_iter09 --cube_size 80 --gpuID 0,1,2,3
09-09 21:06:46, INFO     

######Isonet starts predicting######

09-09 21:07:35, INFO     gpuID:0,1,2,3
09-09 21:08:37, INFO     Loaded model from disk
09-09 21:08:37, INFO     predicting:MTEC_tomo055_72_d3
09-09 21:08:51, INFO     total batches: 54
100%|                                                              | 54/54 [01:37<00:00,  1.80s/it]
/scicore/home/engel0006/GROUP/pool-engel/soft/isonet/venv_isonet/lib/python3.9/site-packages/mrcfile/mrcobject.py:545: RuntimeWarning: Data array contains NaN values
  warnings.warn("Data array contains NaN values", RuntimeWarning)
09-09 21:10:47, INFO     Done predicting
Exception ignored in: <function Pool.__del__ at 0x7fe7ff44f700>
Traceback (most recent call last):
  File "/scicore/soft/apps/Python/3.9.5-GCCcore-10.3.0/lib/python3.9/multiprocessing/pool.py", line 268, in __del__
    self._change_notifier.put(None)
  File "/scicore/soft/apps/Python/3.9.5-GCCcore-10.3.0/lib/python3.9/multiprocessing/queues.py", line 378, in put
    self._writer.send_bytes(obj)
  File "/scicore/soft/apps/Python/3.9.5-GCCcore-10.3.0/lib/python3.9/multiprocessing/connection.py", line 205, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/scicore/soft/apps/Python/3.9.5-GCCcore-10.3.0/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
    self._send(header + buf)
  File "/scicore/soft/apps/Python/3.9.5-GCCcore-10.3.0/lib/python3.9/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
OSError: [Errno 9] Bad file descriptor

I know the "Bad file descriptor" error is harmless, but mrcfile is complaining that the resulting tomogram has NaN values. This is the part that concerns me:

/scicore/home/engel0006/GROUP/pool-engel/soft/isonet/venv_isonet/lib/python3.9/site-packages/mrcfile/mrcobject.py:545: RuntimeWarning: Data array contains NaN values
  warnings.warn("Data array contains NaN values", RuntimeWarning)

It does write out an MRC volume, but the volume cannot be displayed because of the NaN-valued voxels. Has anyone encountered this error before and knows how to fix it?

Thank you!
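
For anyone hitting the same warning, a quick way to see how bad the output is would be to count the non-finite voxels and write a copy that viewers can open. The following is only a minimal sketch, not part of IsoNet: the helper names (nan_report, fill_non_finite, clean_mrc) are made up for illustration, and replacing bad voxels with the mean of the finite ones is an assumption that hides the symptom without addressing its cause.

```python
import numpy as np


def nan_report(volume):
    """Return (number of non-finite voxels, total voxels) for an array."""
    data = np.asarray(volume)
    bad = int(np.count_nonzero(~np.isfinite(data)))
    return bad, data.size


def fill_non_finite(volume):
    """Return a float32 copy with NaN/Inf voxels set to the mean of the finite voxels."""
    data = np.array(volume, dtype=np.float32)  # copy, so the input is left untouched
    finite = np.isfinite(data)
    # Fall back to 0.0 if the whole volume is non-finite.
    fill = float(data[finite].mean()) if finite.any() else 0.0
    data[~finite] = fill
    return data


def clean_mrc(in_path, out_path):
    """Report non-finite voxels in an MRC file and write a displayable copy."""
    import mrcfile  # imported here so the helpers above work without it

    # permissive=True lets mrcfile open files it would otherwise reject.
    with mrcfile.open(in_path, permissive=True) as mrc:
        data = mrc.data.copy()
        vs = mrc.voxel_size
        voxel_size = (float(vs.x), float(vs.y), float(vs.z))

    bad, total = nan_report(data)
    print(f"{bad} of {total} voxels are NaN or Inf")

    with mrcfile.new(out_path, overwrite=True) as out:
        out.set_data(fill_non_finite(data))
        out.voxel_size = voxel_size
```

Calling clean_mrc on the predicted tomogram (input and output paths are up to you) prints the count and writes the patched copy. If most of the volume is NaN, the patched copy will not be meaningful and the prediction should be rerun instead.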

rdrighetto commented 2 years ago

Just a short update: there were no issues in prediction when using models from later iterations (e.g. iter30). The error above occurred when using models from iterations 5-10. I agree it may not be a good idea to look at such early models, but I would still like to know why this happens and how to avoid it.
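
One way to narrow this down would be to check whether the saved model itself already contains NaN or Inf weights, which would separate a broken checkpoint from a problem at prediction time. This is a sketch under assumptions: it treats the .h5 model as a plain HDF5 file whose datasets hold the weights, and the function names are hypothetical, not part of IsoNet.

```python
import numpy as np


def non_finite_weights(weights):
    """Map each array name to its count of NaN/Inf entries; clean arrays are omitted."""
    report = {}
    for name, values in weights.items():
        arr = np.asarray(values)
        if arr.dtype.kind != "f":
            continue  # only floating-point arrays can hold NaN/Inf
        bad = int(np.count_nonzero(~np.isfinite(arr)))
        if bad:
            report[name] = bad
    return report


def load_h5_arrays(path):
    """Collect every dataset in an HDF5 file as {dataset path: numpy array}."""
    import h5py  # imported here so non_finite_weights works without it

    arrays = {}
    with h5py.File(path, "r") as f:
        def collect(name, obj):
            if isinstance(obj, h5py.Dataset):
                arrays[name] = obj[()]  # read the full dataset into memory

        f.visititems(collect)
    return arrays
```

Running non_finite_weights(load_h5_arrays("model_iter09.h5")) and getting an empty dict would suggest the checkpoint is fine and the NaNs arise during prediction.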

procyontao commented 2 years ago

Hi,

Thank you for reporting this. I know another user also encountered this NaN problem recently, but I cannot reproduce it on my computers.

Can this model be used successfully to predict other tomograms?

rdrighetto commented 2 years ago

Thanks for the reply! Good idea, I will try to predict a different tomogram with these models that were giving the error. I'll report back next week, I'm currently traveling.

rdrighetto commented 2 years ago

Strangely, I can now not only predict other tomograms using those same problematic models, but also predict the tomogram that had this problem before, without issues. Maybe it was an issue with the specific computing node/GPU where I was running it on our cluster. I will close the issue now, since I cannot reproduce the problem.
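
If the problem comes back, the node/GPU suspicion could be tested by repeating the same prediction on one GPU at a time and checking which run emits the mrcfile warning. The sketch below reuses only the command-line flags from the original command in this thread. The per-GPU output directory names and the function names are hypothetical, and it assumes the warning is printed to stderr.

```python
import subprocess


def predict_command(star_file, model, output_dir, gpu_id, cube_size=80):
    """Build the isonet.py predict command for a single GPU."""
    return [
        "isonet.py", "predict", star_file, model,
        "--output_dir", output_dir,
        "--cube_size", str(cube_size),
        "--gpuID", str(gpu_id),
    ]


def run_on_each_gpu(star_file, model, gpu_ids=(0, 1, 2, 3)):
    """Run one prediction per GPU and record whether the NaN warning appeared."""
    results = {}
    for gpu_id in gpu_ids:
        out_dir = f"corrected_gpu{gpu_id}"  # hypothetical naming scheme
        cmd = predict_command(star_file, model, out_dir, gpu_id)
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[gpu_id] = "Data array contains NaN values" in proc.stderr
        print(f"GPU {gpu_id}: NaN warning = {results[gpu_id]}")
    return results
```

A GPU that consistently produces the warning while the others do not would point to hardware or driver trouble on that device rather than to the model.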