Num13er-XIII opened 2 months ago
Also, I can fit a batch size of 5 on the GPU, but that increases epoch time to 1800 seconds without improving GPU utilization.
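A generic way to confirm that the input pipeline (rather than the GPU) is the bottleneck is to time how long each iteration spends waiting for the next batch versus computing on it. This is a plain-Python sketch with a simulated loader and training step, not nnU-Net's API; `wait_fraction`, `slow_loader`, and the timings are all made up for illustration:

```python
import time

def wait_fraction(batches, train_step):
    """Return the fraction of wall time spent waiting for data
    instead of computing, to diagnose an input-pipeline bottleneck."""
    wait = compute = 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)          # time spent blocked on the loader
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)             # time spent in the "GPU" step
        t2 = time.perf_counter()
        wait += t1 - t0
        compute += t2 - t1
    total = wait + compute
    return wait / total if total else 0.0

# Simulated: a loader that takes 40 ms per batch and a 10 ms train step.
def slow_loader(n):
    for i in range(n):
        time.sleep(0.04)
        yield i

frac = wait_fraction(slow_loader(10), lambda b: time.sleep(0.01))
print(f"waiting on data ~{frac:.0%} of the time")
```

A wait fraction near 1 matches the symptom in this thread: the GPU sits at 0% and only spikes when a batch finally arrives.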
Exactly the same problem... but it worked well and fast on Linux; maybe the source code has some problem on win32.
Update:
Same situation on a Linux system with an A6000 Ada, but I should mention that the dataset is stored on an NTFS Windows partition.
> Exactly the same problem...but it worked well and fast on linux, maybe source code has some problem working on win32.
I faced the same issue on a Linux system too, but my data is stored on a Windows partition on a local server.
I also found it works well on a small dataset (about 8 GB unpacked) on Windows, but extremely slow on my whole dataset (about 340 GB unpacked). I have not tried WSL2 yet; on a paid Ubuntu platform it works normally, and I don't know why...
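Since several of these reports involve data sitting on an NTFS/Windows mount, a quick microbenchmark can show the filesystem penalty directly: run it once on the mounted path (e.g. something under `/mnt/c` in WSL2, as an example path) and once on the native ext4 home directory, and compare. This is a rough sketch; the helper name and sizes are arbitrary, and the read may be served partly from the page cache, so treat the absolute number as an upper bound:

```python
import os
import tempfile
import time

def read_throughput(path, size_mb=64, block=1 << 20):
    """Write a scratch file under `path`, then time a sequential read
    of it. Returns approximate MB/s (page cache may inflate this)."""
    fname = os.path.join(path, "io_probe.bin")
    data = os.urandom(block)                  # one 1 MiB block, reused
    with open(fname, "wb") as f:
        for _ in range(size_mb):
            f.write(data)
        f.flush()
        os.fsync(f.fileno())                  # force the data to disk
    t0 = time.perf_counter()
    with open(fname, "rb") as f:
        while f.read(block):                  # sequential read to EOF
            pass
    dt = time.perf_counter() - t0
    os.remove(fname)
    return size_mb / dt

# Example: probe the temp directory; repeat with the NTFS mount path.
print(f"{read_throughput(tempfile.gettempdir()):.0f} MB/s")
```

If the mounted partition is an order of magnitude slower than the native filesystem, copying the preprocessed data onto the native filesystem is the obvious first fix.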
Hi Fabian,
I’m currently training a public dataset of coronary artery CTAs using nnU-Net, but I'm encountering an issue with GPU utilization. My system includes an RTX 4090 GPU, a 12700K CPU, 64 GB of RAM, and a Gen4 SSD, running under WSL2.
Previously, I successfully trained nnU-Net on smaller datasets with full GPU utilization. However, with this larger dataset, my GPU shows 0% usage most of the time, spiking only to 100% briefly. Each epoch is taking about 800 seconds to complete, which seems excessively long.
Could this be related to WSL2 or some configuration within nnU-Net? I've attached screenshots of my GPU utilization graph and the training plan file. What might be causing these utilization issues, and how can I improve the training performance?
Thank you for your insights!
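Two things that commonly help in this kind of setup, sketched below with hedges: the paths are examples only, while `nnUNet_preprocessed` and `nnUNet_n_proc_DA` are nnU-Net's documented environment variables for the preprocessed-data location and the number of data-augmentation workers.

```shell
# Sketch, not verified on this exact machine. Paths are examples.
# 1) Keep preprocessed data on WSL2's native ext4 filesystem rather
#    than the Windows mount (/mnt/c), which is much slower to read:
#    cp -r /mnt/c/nnUNet_preprocessed "$HOME/nnUNet_preprocessed"
#    export nnUNet_preprocessed="$HOME/nnUNet_preprocessed"
# 2) Raise the number of data-augmentation worker processes if CPU
#    cores sit idle while the GPU waits for batches:
export nnUNet_n_proc_DA=12
```

With a 12700K, more augmentation workers only help once the underlying filesystem can actually feed them, so step 1 matters more than step 2.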
Configuration name: 3d_fullres {'data_identifier': 'nnUNetPlans_3d_fullres', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 2, 'patch_size': [96, 160, 160], 'median_image_size_in_voxels': [275.0, 509.0, 512.0], 'spacing': [0.5, 0.349609375, 0.349609375], 'normalization_schemes': ['CTNormalization'], 'use_mask_for_norm': [False], 'UNet_class_name': 'PlainConvUNet', 'UNet_base_num_features': 32, 'n_conv_per_stage_encoder': [2, 2, 2, 2, 2, 2], 'n_conv_per_stage_decoder': [2, 2, 2, 2, 2], 'num_pool_per_axis': [4, 5, 5], 'pool_op_kernel_sizes': [[1, 1, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [1, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]], 'unet_max_num_features': 320, 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape', 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'batch_dice': True}
These are the global plan.json settings: {'dataset_name': 'Dataset003_ImageCAS', 'plans_name': 'nnUNetPlans', 'original_median_spacing_after_transp': [0.5, 0.349609375, 0.349609375], 'original_median_shape_after_transp': [275, 512, 512], 'image_reader_writer': 'SimpleITKIO', 'transpose_forward': [0, 1, 2], 'transpose_backward': [0, 1, 2], 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager', 'foreground_intensity_properties_per_channel': {'0': {'max': 1.0, 'mean': 0.39980047941207886, 'median': 0.3513513505458832, 'min': 0.0, 'percentile_00_5': 0.0, 'percentile_99_5': 1.0, 'std': 0.22766844928264618}}}
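As a rough sanity check on why a slow filesystem dominates here, the plan's median shape implies a lot of data touched per epoch. This is back-of-the-envelope only: 250 iterations per epoch is nnU-Net's default, and the assumption that a full median-sized case is read per sample is pessimistic, since nnU-Net memory-maps preprocessed arrays and crops patches from them:

```python
# Numbers taken from the plan above: median shape [275, 512, 512],
# one channel, batch size 2, float32 storage.
voxels_per_case = 275 * 512 * 512
bytes_per_case = voxels_per_case * 4          # float32, one channel
batch_size = 2
iters_per_epoch = 250                         # nnU-Net default
gb_per_epoch = bytes_per_case * batch_size * iters_per_epoch / 1e9
print(f"{bytes_per_case / 1e6:.0f} MB per case, "
      f"up to {gb_per_epoch:.0f} GB touched per epoch")
# → 288 MB per case, up to 144 GB touched per epoch
```

At that volume, even a few hundred MB/s of difference between an NTFS mount and native ext4 translates into hundreds of extra seconds per epoch, consistent with the 800 s epochs reported above.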