AIM-Harvard / DeepCAC

Fully automatic coronary calcium risk assessment using Deep Learning.
GNU General Public License v3.0

Support for single GPU #2

Closed · jpcenteno80 closed this issue 3 years ago

jpcenteno80 commented 3 years ago

Hi, thanks for the great code! I changed the hard-coded mgpu = 4 to mgpu = 1 in the step1_heartloc/run_inference.py file, but when I run python run_step1_heart_localization.py I encounter the following error:

Deep Learning model inference using 4xGPUs:
Loading saved model from "../data/step1_heartloc/model_weights/step1_heartloc_model_weights.hdf5"
Compiling single GPU model...
Traceback (most recent call last):
  File "run_step1_heart_localization.py", line 153, in <module>
    weights_file_name = weights_file_name)
  File "/home/jpcenteno/development/DeepCAC/src/step1_heartloc/run_inference.py", line 150, in run_inference
    pkl_file, test_file, weights_file, mgpu, has_manual_seg, export_png)
  File "/home/jpcenteno/development/DeepCAC/src/step1_heartloc/run_inference.py", line 66, in test
    model.load_weights(weights_file)
  File "/home/jpcenteno/development/venv/lib/python2.7/site-packages/tensorflow/python/keras/engine/training.py", line 162, in load_weights
    return super(Model, self).load_weights(filepath, by_name)
  File "/home/jpcenteno/development/venv/lib/python2.7/site-packages/tensorflow/python/keras/engine/network.py", line 1424, in load_weights
    saving.load_weights_from_hdf5_group(f, self.layers)
  File "/home/jpcenteno/development/venv/lib/python2.7/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 737, in load_weights_from_hdf5_group
    ' layers.')
ValueError: You are trying to load a weight file containing 1 layers into a model with 19 layers.

Wondering if this is related to trying to use single GPU (my server only has a single 16 GB GPU) or something else. Thank you for your help!

jpcenteno80 commented 3 years ago

Inspection of data/step1_heartloc/model_weights/step1_heartloc_model_weights.hdf5 yields the following keys:

import h5py

f = h5py.File('step1_heartloc_model_weights.hdf5', 'r')  # open the pre-trained weights file

f.keys()
## <KeysViewHDF5 ['concatenate_1', 'lambda_1', 'lambda_2', 'lambda_3', 'lambda_4', 'model_1', 'model_input']>

f['model_1'].keys()
## <KeysViewHDF5 ['conv_10', 'conv_1_1', 'conv_1_2', 'conv_2_1', 'conv_2_2', 'conv_3_1', 'conv_3_2', 'conv_4_1', 'conv_4_2', 'conv_5_1', 'conv_5_2', 'conv_6_1', 'conv_6_2', 'conv_7_1', 'conv_7_2', 'conv_8_1', 'conv_8_2', 'conv_9_1', 'conv_9_2']>

len(f['model_1'].keys())
## 19

The model's layers are under the model_1 key. I tried to make a new .hdf5 containing just the model_1 contents, but I am still running into the same error (above).

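For reference, a rough sketch of the kind of copy described above (illustrative only, assuming h5py group copying; the output file name is hypothetical):

import h5py

src = h5py.File('step1_heartloc_model_weights.hdf5', 'r')
dst = h5py.File('step1_heartloc_weights_model1_only.hdf5', 'w')

# copy every inner layer group of 'model_1' up to the root of the new file
for name in src['model_1'].keys():
    src['model_1'].copy(name, dst)

src.close()
dst.close()

Presumably the Keras loader also relies on metadata that this copy does not recreate (root attributes such as 'layer_names' and per-group 'weight_names'), so a plain group copy alone is probably not enough.
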
9zelle9 commented 3 years ago

Unfortunately, as of now there is no support for running the pre-trained models on a system with anything other than 4 GPUs, at least to our knowledge. This is due to the way the model was trained, using parallel_model = multi_gpu_model(model, gpus=4).

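To illustrate why the loader reports "a weight file containing 1 layers": the weights were saved from the multi_gpu_model wrapper, whose outer layers match the model_input, lambda_*, model_1 and concatenate_1 keys listed above, and only model_1 actually carries weights. A small, illustrative h5py check (untested here) of the layer list Keras recorded at save time:

import h5py

f = h5py.File('step1_heartloc_model_weights.hdf5', 'r')
# Keras stores the saving model's layer order in a root attribute; for the
# 4-GPU wrapper this lists the outer wrapper layers, not the 19 conv layers
print([n.decode('utf8') for n in f.attrs['layer_names']])
f.close()
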
We are already training new models on a single GPU; these should then also run on a CPU (although more slowly), and we will publish them as soon as possible.

nanboxian commented 3 years ago

Still waiting for the new weight file for a single GPU.