TaoZhong11 / nBEST

nBEST (non-human primates Brain Extraction and Segmentation Toolbox): an accurate, adaptable, and automated AI pipeline for processing NHP brain images across multiple species, sites, and developmental stages.

Docker not working properly #1

Open junyuchen245 opened 1 month ago

junyuchen245 commented 1 month ago

Hi @TaoZhong11 ,

Thanks for this amazing work!

I encountered an error while running the Docker image with data I obtained from https://fcon_1000.projects.nitrc.org/indi/PRIMEdownloads.html. To troubleshoot, I tested the demo dataset by keeping only the file macaque_sub-032144_ses-001_run-1_T1w.nii.gz in the data/ directory and removing the rest. However, I received the same error message.

The only output I got was brain masks, but they were empty (all zero values).

Any suggestions on how to resolve this issue would be much appreciated! Thank you!
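For reference, here is a minimal sketch of how an all-zero mask can be confirmed (the array would in practice come from nibabel's `get_fdata()`, which is an assumption here; the check itself is plain numpy):

```python
import numpy as np

def is_empty_mask(mask: np.ndarray) -> bool:
    """Return True if a brain mask contains no foreground voxels."""
    return not np.any(mask)

# In practice the array would be loaded with e.g.:
#   mask = nib.load("macaque_sub-..._brainmask.nii.gz").get_fdata()
empty = np.zeros((4, 4, 4))
valid = empty.copy()
valid[1:3, 1:3, 1:3] = 1.0
print(is_empty_mask(empty))  # True  (what the pipeline produced here)
print(is_empty_mask(valid))  # False (what a usable mask should give)
```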

Predicting brain mask
using model stored in  /workspace/nnUNet_trained_models/nnUNet/3d_fullres/Task509_tissue_infant/nnUNetTrainerV2_DA3_BN_UNeXt_axial_attn__nnUNetPlans_pretrained_IDENTIFIER
This model expects 1 input modalities for each image
Found 1 unique case ids, here are some examples: ['macaque_sub-032144_ses-001_run-1_T1w']
number of cases: 1
number of cases that still need to be predicted: 1
emptying cuda cache
loading parameters for folds, [1]
EVALUATION_FOLDER is not defined and nnU-Net extension cannot be used for evaluation. If this is not intended behavior, please read documentation/setting_up_paths.md for information on how to set this up.

/usr/local/lib/python3.7/site-packages/torch/cuda/__init__.py:155: UserWarning:
NVIDIA H100 PCIe with CUDA capability sm_90 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86.
If you want to use the NVIDIA H100 PCIe GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Updating the Loss based on the provided previous trainer
using the following model files:  ['/workspace/nnUNet_trained_models/nnUNet/3d_fullres/Task509_tissue_infant/nnUNetTrainerV2_DA3_BN_UNeXt_axial_attn__nnUNetPlans_pretrained_IDENTIFIER/fold_1/model_be.model']
starting preprocessing generator
starting prediction...
preprocessing brain_mask/macaque_sub-032144_ses-001_run-1_T1w.nii.gz
using preprocessor GenericPreprocessor
before crop: (1, 256, 256, 240) after crop: (1, 256, 256, 240) spacing: [0.5 0.5 0.5]

no resampling necessary
no resampling necessary
before: {'spacing': array([0.5, 0.5, 0.5]), 'spacing_transposed': array([0.5, 0.5, 0.5]), 'data.shape (data is transposed)': (1, 256, 256, 240)}
after:  {'spacing': [0.5, 0.5, 0.5], 'data.shape (data is resampled)': (1, 256, 256, 240)}

(1, 256, 256, 240)
This worker has ended successfully, no errors to report
predicting brain_mask/macaque_sub-032144_ses-001_run-1_T1w.nii.gz
debug: mirroring True mirror_axes (0, 1, 2)
step_size: 0.5
do mirror: True
data shape: (1, 256, 256, 240)
patch size: [128 128 128]
steps (x, y, and z): [[0, 64, 128], [0, 64, 128], [0, 56, 112]]
number of tiles: 27
computing Gaussian
prediction done
inference done. Now waiting for the segmentation export to finish...
force_separate_z: None interpolation order: 1
no resampling necessary
WARNING! Cannot run postprocessing because the postprocessing file is missing. Make sure to run consolidate_folds in the output folder of the model first!
The folder you need to run this in is /workspace/nnUNet_trained_models/nnUNet/3d_fullres/Task509_tissue_infant/nnUNetTrainerV2_DA3_BN_UNeXt_axial_attn__nnUNetPlans_pretrained_IDENTIFIER
Obtaining brain img
macaque_sub-032144_ses-001_run-1_T1w.nii.gz macaque_sub-032144_ses-001_run-1_T1w.nii.gz
Failed to extract brain for scan  macaque_sub-032144_ses-001_run-1_T1w.nii.gz
Bias field correction by bfc
Predicting brain cerebellum and brainstem mask
using model stored in  /workspace/nnUNet_trained_models/nnUNet/3d_fullres/Task509_tissue_infant/nnUNetTrainerV2_DA3_BN_UNeXt_axial_attn__nnUNetPlans_pretrained_IDENTIFIER
This model expects 1 input modalities for each image
Traceback (most recent call last):
  File "/usr/local/bin/nnUNet_predict", line 33, in <module>
    sys.exit(load_entry_point('nBEST', 'console_scripts', 'nnUNet_predict')())
  File "/workspace/nnunet/inference/predict_simple.py", line 221, in main
    step_size=step_size, checkpoint_name=args.chk)
  File "/workspace/nnunet/inference/predict.py", line 636, in predict_from_folder
    case_ids = check_input_folder_and_return_caseIDs(input_folder, expected_num_modalities)
  File "/workspace/nnunet/inference/predict.py", line 577, in check_input_folder_and_return_caseIDs
    assert len(files) > 0, "input folder did not contain any images (expected to find .nii.gz file endings)"
AssertionError: input folder did not contain any images (expected to find .nii.gz file endings)
Obtaining brain cerebrum img
Predicting brain tissue segmentation
using model stored in  /workspace/nnUNet_trained_models/nnUNet/3d_fullres/Task509_tissue_infant/nnUNetTrainerV2_DA3_BN_UNeXt_axial_attn__nnUNetPlans_pretrained_IDENTIFIER
This model expects 1 input modalities for each image
Traceback (most recent call last):
  File "/usr/local/bin/nnUNet_predict", line 33, in <module>
    sys.exit(load_entry_point('nBEST', 'console_scripts', 'nnUNet_predict')())
  File "/workspace/nnunet/inference/predict_simple.py", line 221, in main
    step_size=step_size, checkpoint_name=args.chk)
  File "/workspace/nnunet/inference/predict.py", line 636, in predict_from_folder
    case_ids = check_input_folder_and_return_caseIDs(input_folder, expected_num_modalities)
  File "/workspace/nnunet/inference/predict.py", line 577, in check_input_folder_and_return_caseIDs
    assert len(files) > 0, "input folder did not contain any images (expected to find .nii.gz file endings)"
AssertionError: input folder did not contain any images (expected to find .nii.gz file endings)
Predicting subcortical
using model stored in  /workspace/nnUNet_trained_models/nnUNet/3d_fullres/Task509_tissue_infant/nnUNetTrainerV2_DA3_BN_UNeXt_axial_attn__nnUNetPlans_pretrained_IDENTIFIER
This model expects 1 input modalities for each image
Traceback (most recent call last):
  File "/usr/local/bin/nnUNet_predict", line 33, in <module>
    sys.exit(load_entry_point('nBEST', 'console_scripts', 'nnUNet_predict')())
  File "/workspace/nnunet/inference/predict_simple.py", line 221, in main
    step_size=step_size, checkpoint_name=args.chk)
  File "/workspace/nnunet/inference/predict.py", line 636, in predict_from_folder
    case_ids = check_input_folder_and_return_caseIDs(input_folder, expected_num_modalities)
  File "/workspace/nnunet/inference/predict.py", line 577, in check_input_folder_and_return_caseIDs
    assert len(files) > 0, "input folder did not contain any images (expected to find .nii.gz file endings)"
AssertionError: input folder did not contain any images (expected to find .nii.gz file endings)
Unable to extract brain for scan ['macaque_sub-032144_ses-001_run-1_T1w.nii.gz'] and the rest have been processed.
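The repeated tracebacks above look like a cascade of the first failure: brain extraction produced no usable output, so the intermediate folders that the later nnUNet_predict calls read from contain no .nii.gz files, and each call trips the same input-folder guard. A rough sketch of that check (folder handling simplified; the message matches the assertion in the traceback):

```python
import tempfile
from pathlib import Path

def check_input_folder(input_folder: str) -> list:
    """nnU-Net-style guard: refuse to predict if no .nii.gz images exist."""
    files = sorted(p.name for p in Path(input_folder).glob("*.nii.gz"))
    assert len(files) > 0, (
        "input folder did not contain any images "
        "(expected to find .nii.gz file endings)"
    )
    return files

with tempfile.TemporaryDirectory() as d:
    try:
        check_input_folder(d)  # empty folder -> AssertionError, as in the log
    except AssertionError as e:
        print("reproduced:", e)
```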
TaoZhong11 commented 1 month ago

Hi @junyuchen245

Thank you for reaching out. Based on the details you provided, the issue appears to stem from a mismatch between the H100 architecture and the PyTorch version (1.13.1+cu117) inside the Docker image, which can cause computations to fall back to the CPU.

I just tested the current version of nBEST on an NVIDIA 4090 GPU, and it successfully completed the demo. This suggests that the current setup functions properly with the 4090 (or older) architecture.

As a potential solution, you might try a different GPU that is compatible with the existing Docker setup. Alternatively, I can update the PyTorch version (cu118 or later) in the Docker image to support newer GPUs.
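For anyone hitting the same warning, here is a hedged sketch of the compatibility check, using the two values PyTorch exposes: `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple and `torch.cuda.get_arch_list()` returns entries like `'sm_86'`. Cubins are forward-compatible within a compute-capability major version, which is why an sm_86 build runs on a 4090 (sm_89) but not on an H100 (sm_90):

```python
def capability_supported(device_cap, arch_list):
    """Rough check: a kernel built for sm_XY runs on any GPU with the same
    major version X and minor >= Y, but never across major versions."""
    major, minor = device_cap
    for arch in arch_list:
        digits = arch.split("_")[1]
        a_major, a_minor = int(digits[:-1]), int(digits[-1])
        if a_major == major and a_minor <= minor:
            return True
    return False

# Arch list reported by the cu117 wheel in the log above:
cu117_archs = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86"]
print(capability_supported((9, 0), cu117_archs))  # False -> H100 unsupported
print(capability_supported((8, 9), cu117_archs))  # True  -> 4090 works
```

Inside the container, the real inputs would be `torch.cuda.get_device_capability(0)` and `torch.cuda.get_arch_list()`.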

Thanks.

junyuchen245 commented 1 month ago

Understood. I missed the H100 warning somehow, so I thought the error was caused by something else. It would be great if you could update the PyTorch version to support newer architectures. I'll try this on other GPUs as well. Thank you!

Junyu

junyuchen245 commented 1 month ago

Hi @TaoZhong11,

Just a quick update that I was able to run nBEST on a different GPU. Thanks so much for your help!

Junyu