IsoNet-cryoET / IsoNet

Self-supervised learning for isotropic cryoET reconstruction
https://www.nature.com/articles/s41467-022-33957-8
MIT License

Isonet support NVIDIA A100 #26

Open cron-weasley opened 2 years ago

cron-weasley commented 2 years ago

Dear Author, first, thanks a lot for this powerful software! I have a question: does IsoNet support CUDA 11.2 and the NVIDIA A100 with tensorflow-gpu 2.7? I installed IsoNet with conda (Python 3.9) plus tensorflow-gpu 2.7 and CUDA 11.2 on an NVIDIA A100, and got this error:

(py39) [root@Isonet]$ isonet.py refine subtomo.star --gpuID 0,1,2,3 --iterations 30 --noise_start_iter 10,15,20,25 --noise_level 0.05,0.1,0.15,0.2
04-16 22:14:09, INFO Isonet starts refining
04-16 22:14:27, INFO Note: detected 128 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
04-16 22:14:27, INFO Note: NumExpr detected 128 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
04-16 22:14:27, INFO NumExpr defaulting to 8 threads.
04-16 22:14:30, WARNING The results folder already exists before the 1st iteration The old results folder will be renamed (to results~)
04-16 22:14:50, INFO Done preperation for the first iteration!
04-16 22:14:50, INFO Start Iteration1!
/data1/apps/miniconda3/envs/py39/lib/python3.9/site-packages/keras/optimizer_v2/adam.py:105: UserWarning: The lr argument is deprecated, use learning_rate instead.
  super(Adam, self).__init__(name, **kwargs)
/data1/apps/miniconda3/envs/py39/lib/python3.9/site-packages/keras/engine/functional.py:1410: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
  layer_config = serialize_layer_fn(layer)
04-16 22:14:54, INFO Noise Level:0.0
2022-04-16 22:15:06.826516: F tensorflow/stream_executor/cuda/cuda_driver.cc:153] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error

Thanks a lot!

procyontao commented 2 years ago

Hi,

We have only tested TensorFlow 2.5 and Python 3.6 on A100 GPUs, and IsoNet works fine with the A100.

Could you rerun the command with the parameter "--log_level debug" and check what the error message is?

cron-weasley commented 2 years ago


Dear procyontao, thanks a lot! Could you also tell me which CUDA version and which NVIDIA A100 driver version you used?

cron-weasley commented 2 years ago

I solved the problem. Thanks, procyontao! I used Python 3.8 with Miniconda and installed the packages in this order:

pip install imageio==2.10.5 numpy==1.19.2
pip install tensorflow-gpu==2.5.0
pip install -r requirements.txt

After that, IsoNet ran successfully.
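The fix amounts to pinning three packages before installing the rest of requirements.txt. As a minimal sketch (the helper name `check_pins` is hypothetical and not part of IsoNet), the standard-library `importlib.metadata` module can confirm that an environment actually has those versions. It only checks the three pins reported in this comment, not the full contents of requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pins reported as working on A100 (Python 3.8) in this thread.
# requirements.txt is not reproduced here, so only these three are checked.
PINS = {
    "imageio": "2.10.5",
    "numpy": "1.19.2",
    "tensorflow-gpu": "2.5.0",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINS)
    if not problems:
        print("All pinned packages match.")
    for package, (expected, installed) in problems.items():
        print(f"{package}: expected {expected}, found {installed}")
```

Running it inside the conda environment before launching `isonet.py refine` lists any package whose installed version differs from the pin (or `None` if it is missing).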

proteincommandr commented 2 years ago

Hi all, I also needed TensorFlow 2.6.0 to run IsoNet properly on A100 GPUs. I figured it would be useful to share some GPU benchmarks for the tutorial dataset.

I used TF 2.6.0, CUDA 11.6, and IsoNet 0.1 on the three tutorial tomograms:

| System | Time/step | Speedup |
|---|---|---|
| 2x 2070S | 950 ms | 1 |
| 4x 2080 Ti | 700 ms | ~1.4 |
| 4x 1080 Ti | 900 ms | ~1 |
| 1x RTX 8000 | 1000 ms | ~1 |
| 2x RTX 8000 | 700 ms | ~1.3 |
| 4x A100 | 270 ms | ~3.5 |
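The speedup column appears to be the 2x 2070S step time divided by each system's step time, rounded loosely (hence the "~"). A small sketch of that arithmetic, using only the step times from the benchmark; the function name `speedup` is illustrative:

```python
# Step times (ms/step) from the benchmark above. Speedup is taken relative
# to the 2x 2070S baseline as baseline_time / time, so larger is faster.
STEP_MS = {
    "2x 2070S": 950,
    "4x 2080 Ti": 700,
    "4x 1080 Ti": 900,
    "1x RTX 8000": 1000,
    "2x RTX 8000": 700,
    "4x A100": 270,
}


def speedup(step_ms, baseline_ms):
    """Ratio of the baseline step time to this system's step time."""
    return baseline_ms / step_ms


baseline = STEP_MS["2x 2070S"]
for system, ms in STEP_MS.items():
    print(f"{system:12s} {speedup(ms, baseline):.2f}")
```

This measures time per step only; it does not account for how many subtomograms each step processes.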

Cheers

procyontao commented 2 years ago

Hi proteincommandr,

Thank you for providing the GPU benchmarks. We cannot even afford that many types of GPU for a speed test.

One question: did you account for the differences in batch_size (i.e., the number of subtomograms processed in one step)? The default relationship between the number of GPUs and the default batch_size is listed below:

| nGPUs | Default batch_size |
|---|---|
| 1 | 4 |
| 2 | 4 |
| 3 | 6 |
| 4 | 8 |
| 5 | 10 |
| 6 | 12 |
| 7 | 14 |
| 8 | 16 |
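The listed defaults follow a simple pattern: two subtomograms per GPU, with a floor of 4. A minimal sketch of that rule, inferred from the listed values rather than taken from IsoNet's source (`default_batch_size` is a hypothetical name):

```python
def default_batch_size(n_gpus):
    """Default batch_size for a given GPU count.

    Inferred from the nGPUs/batch_size values listed in this thread
    (2 subtomograms per GPU, never fewer than 4); not IsoNet's own code.
    """
    if n_gpus < 1:
        raise ValueError("need at least one GPU")
    return max(4, 2 * n_gpus)


for n in range(1, 9):
    print(n, default_batch_size(n))
```

The floor explains why 1 and 2 GPUs share the same default of 4, which matters when comparing per-step times across systems with different GPU counts.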

If you did not specify the batch size in your commands, your table should be:

| System | Time/step | Speedup | batch_size | Speedup per subtomogram |
|---|---|---|---|---|
| 2x 2070S | 950 ms | 1 | 4 | 0.5 |
| 4x 2080 Ti | 700 ms | ~1.4 | 8 | 1.4 |
| 4x 1080 Ti | 900 ms | ~1 | 8 | 1 |
| 1x RTX 8000 | 1000 ms | ~1 | 4 | 0.5 |
| 2x RTX 8000 | 700 ms | ~1.3 | 4 | 0.65 |
| 4x A100 | 270 ms | ~3.5 | 8 | 3.5 |
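The per-subtomogram speedup column is consistent with rescaling each per-step speedup by batch_size / 8, i.e., normalizing every system to 8 subtomograms per step. The reference batch of 8 is an assumption read off the numbers, and `speedup_per_subtomo` is an illustrative name:

```python
# (speedup, default batch_size) per system, copied from the table above.
ROWS = {
    "2x 2070S": (1.0, 4),
    "4x 2080 Ti": (1.4, 8),
    "4x 1080 Ti": (1.0, 8),
    "1x RTX 8000": (1.0, 4),
    "2x RTX 8000": (1.3, 4),
    "4x A100": (3.5, 8),
}

REFERENCE_BATCH = 8  # assumption: normalize to 8 subtomograms per step


def speedup_per_subtomo(step_speedup, batch_size, reference_batch=REFERENCE_BATCH):
    """Rescale a per-step speedup by the number of subtomograms each step processes."""
    return step_speedup * batch_size / reference_batch


for system, (s, b) in ROWS.items():
    print(f"{system:12s} {speedup_per_subtomo(s, b):.2f}")
```

A system that finishes a step quickly but only processes 4 subtomograms per step therefore gets half the credit of one processing 8 in the same time.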