IMSY-DKFZ / htc

Semantic organ segmentation for hyperspectral images.

[Bug] Torch max error in dice calc metric #19

Closed alfieroddan closed 8 months ago

alfieroddan commented 9 months ago

:bug: Bug

Hi HTC team,

Apologies for asking for help again so soon!

I think there may be a bug in calc_dice_metric from evaluate_images.py.

Description

(env) tay@tay:~/Code/ms-seg$ PATH_Tivita_HeiPorSPECTRAL=/media/tay/4TB/Datasets/HeiPorSPECTRAL PATH_HTC_RESULTS=results/ htc training --model image --config hs_tools/htc-config.json
[INFO][htc] Starting training of the fold fold_0 [1/1]                                                                        run_training.py:298
[INFO][htc] The following config will be used for training:                                                                    run_training.py:78
[INFO][htc] {'config_name': 'htc-config',                                                                                      run_training.py:79
 'dataloader_kwargs': {'batch_size': 5, 'num_workers': 8},                                                                                       
 'input': {'annotation_name': ['polygon#annotator1',                                                                                             
                               'polygon#annotator2',                                                                                             
                               'polygon#annotator3'],                                                                                            
           'data_spec': 'hs_tools/2fold-dataspec.json',                                                                                          
           'epoch_size': 500,                                                                                                                    
           'merge_annotations': 'union',                                                                                                         
           'n_channels': 100,                                                                                                                    
           'preprocessing': 'L1',                                                                                                                
           'transforms_gpu': [{'class': 'KorniaTransform',                                                                                       
                               'degrees': 45,                                                                                                    
                               'p': 0.5,                                                                                                         
                               'padding_mode': 'reflection',                                                                                     
                               'scale': [0.9, 1.1],                                                                                              
                               'transformation_name': 'RandomAffine',                                                                            
                               'translate': [0.0625, 0.0625]},                                                                                   
                              {'class': 'KorniaTransform',                                                                                       
                               'p': 0.25,                                                                                                        
                               'transformation_name': 'RandomHorizontalFlip'},                                                                   
                              {'class': 'KorniaTransform',                                                                                       
                               'p': 0.25,                                                                                                        
                               'transformation_name': 'RandomVerticalFlip'}]},                                                                   
 'label_mapping': 'htc.settings_seg>label_mapping',                                                                                              
 'lightning_class': 'htc.models.image.LightningImage>LightningImage',                                                                            
 'model': {'architecture_kwargs': {'encoder_name': 'efficientnet-b5',                                                                            
                                   'encoder_weights': 'imagenet'},                                                                               
           'architecture_name': 'Unet',                                                                                                          
           'model_name': 'ModelImage',                                                                                                           
           'pretrained_model': {'model': 'image',                                                                                                
                                'run_folder': '2023-02-08_14-48-02_organ_transplantation_0.8'}},                                                 
 'optimization': {'lr_scheduler': {'gamma': 0.99, 'name': 'ExponentialLR'},                                                                      
                  'optimizer': {'lr': 0.001,                                                                                                     
                                'name': 'Adam',                                                                                                  
                                'weight_decay': 0}},                                                                                             
 'swa_kwargs': {'annealing_epochs': 0},                                                                                                          
 'trainer_kwargs': {'accelerator': 'gpu',                                                                                                        
                    'devices': 1,                                                                                                                
                    'max_epochs': 100,                                                                                                           
                    'precision': '16-mixed'},                                                                                                    
 'validation': {'checkpoint_metric': 'dice_metric', 'dataset_index': 0}}                                                                         
Seed set to 1337
[DEBUG][htc] Used transformations:                                                                                              transforms.py:124
[ToType(dtype=torch.float16)]                                                                                                                    
[DEBUG][htc] Used transformations:                                                                                              transforms.py:124
[ToType(dtype=torch.float16)]                                                                                                                    
[INFO][htc.no_duplicates] Found pretrained run in the local hub dir at                                                            HTCModel.py:518
/home/tay/.cache/torch/hub/htc_checkpoints/image/2023-02-08_14-48-02_organ_transplantation_0.8                                                   
[INFO][htc] Successfully loaded the pretrained model (2 keys were skipped: ['model.architecture.segmentation_head.0.weight',      HTCModel.py:346
'model.architecture.segmentation_head.0.bias']).                                                                                                 
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┏━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃   ┃ Name             ┃ Type             ┃ Params ┃
┡━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ 0 │ model            │ ModelImage       │ 31.3 M │
│ 1 │ ce_loss_weighted │ CrossEntropyLoss │      0 │
│ 2 │ dice_loss        │ DiceLoss         │      0 │
└───┴──────────────────┴──────────────────┴────────┘
Trainable params: 31.3 M                                                                                                                         
Non-trainable params: 0                                                                                                                          
Total params: 31.3 M                                                                                                                             
Total estimated model params size (MB): 125                                                                                                      
Epoch 0/100 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/100 0:00:00 • -:--:-- 0.00it/s  [DEBUG][htc] Used transformations:                                                                                              transforms.py:124
[ToType(dtype=torch.float32), KorniaTransform]                                                                                                   
[WARNING][py.warnings] /home/tay/Code/ms-seg/env/lib/python3.10/site-packages/torch/nn/functional.py:4358: UserWarning: Default   warnings.py:109
grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old                
behavior is desired. See the documentation of grid_sample for details.                                                                           
  warnings.warn(                                                                                                                                 

Epoch 0/100 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100/100 0:00:22 • 0:00:00 4.78it/s  [DEBUG][htc] Used transformations:                                                                                              transforms.py:124
[ToType(dtype=torch.float32)]                                                                                                                    
Epoch 0/100 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100/100 0:00:22 • 0:00:00 4.78it/s  
Validation  ━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━ 87/252  0:00:12 • 0:00:25 6.87it/s  
[CRITICAL][htc] Uncaught exception:                                                                                           run_training.py:381
Traceback (most recent call last):                                                                                                               
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/htc/models/run_training.py", line 393, in <module>                                
    fold_trainer.train_fold(args.run_folder, args.fold_name, args.test, file_log_handler)                                                        
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/htc/models/run_training.py", line 64, in train_fold                               
    self._train_fold(model_dir_tmp, fold_name, *args)                                                                                            
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/htc/models/run_training.py", line 231, in _train_fold                             
    trainer.fit(module)                                                                                                                          
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit                           
    call._call_and_handle_interrupt(                                                                                                             
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 44, in                                   
_call_and_handle_interrupt                                                                                                                       
    return trainer_fn(*args, **kwargs)                                                                                                           
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl                     
    self._run(model, ckpt_path=ckpt_path)                                                                                                        
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run                          
    results = self._run_stage()                                                                                                                  
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in                              
_run_stage                                                                                                                                       
    self.fit_loop.run()                                                                                                                          
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run                            
    self.advance()                                                                                                                               
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance                        
    self.epoch_loop.run(self._data_fetcher)                                                                                                      
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 137, in                     
run                                                                                                                                              
    self.on_advance_end(data_fetcher)                                                                                                            
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 285, in                     
on_advance_end                                                                                                                                   
    self.val_loop.run()                                                                                                                          
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/loops/utilities.py", line 182, in _decorator                    
    return loop_run(self, *args, **kwargs)                                                                                                       
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 134, in run                     
    self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)                                                                     
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 391, in                         
_evaluation_step                                                                                                                                 
    output = call._call_strategy_hook(trainer, hook_name, *step_args)                                                                            
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in                                  
_call_strategy_hook                                                                                                                              
    output = fn(*args, **kwargs)                                                                                                                 
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 403, in                           
validation_step                                                                                                                                  
    return self.lightning_module.validation_step(*args, **kwargs)                                                                                
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/htc/models/common/EvaluationMixin.py", line 29, in                                
validation_step                                                                                                                                  
    self.validation_results_epoch.append(self._validate_batch(batch, dataloader_idx))                                                            
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/htc/models/common/EvaluationMixin.py", line 63, in                                
_validate_batch                                                                                                                                  
    batch_results_class = evaluate_images(                                                                                                       
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/htc/evaluation/evaluate_images.py", line 332, in                                  
evaluate_images                                                                                                                                  
    dice = calc_dice_metric(predictions_labels, labels, mask)                                                                                    
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/htc/evaluation/evaluate_images.py", line 123, in                                  
calc_dice_metric                                                                                                                                 
    invalid_label_index = max(predictions_labels.max(), labels.max()) + 1                                                                        
RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim'                     
argument.                                                                                                                                        
Traceback (most recent call last):
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/htc/models/run_training.py", line 393, in <module>
    fold_trainer.train_fold(args.run_folder, args.fold_name, args.test, file_log_handler)
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/htc/models/run_training.py", line 64, in train_fold
    self._train_fold(model_dir_tmp, fold_name, *args)
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/htc/models/run_training.py", line 231, in _train_fold
    trainer.fit(module)
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage
    self.fit_loop.run()
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run
    self.advance()
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 137, in run
    self.on_advance_end(data_fetcher)
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 285, in on_advance_end
    self.val_loop.run()
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/loops/utilities.py", line 182, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 134, in run
    self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 391, in _evaluation_step
    output = call._call_strategy_hook(trainer, hook_name, *step_args)
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 403, in validation_step
    return self.lightning_module.validation_step(*args, **kwargs)
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/htc/models/common/EvaluationMixin.py", line 29, in validation_step
    self.validation_results_epoch.append(self._validate_batch(batch, dataloader_idx))
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/htc/models/common/EvaluationMixin.py", line 63, in _validate_batch
    batch_results_class = evaluate_images(
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/htc/evaluation/evaluate_images.py", line 332, in evaluate_images
    dice = calc_dice_metric(predictions_labels, labels, mask)
  File "/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/htc/evaluation/evaluate_images.py", line 123, in calc_dice_metric
    invalid_label_index = max(predictions_labels[mask].max(), labels[mask].max()) + 1
RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
[ERROR][htc] Training of the fold fold_0 was not successful (returncode=1                                                     run_training.py:301
[ERROR][htc] Some folds were not successful (see error messages above)                                                        run_training.py:305
[INFO][htc] Training time for the all folds: 0 minutes and 47.35 seconds                                                      run_training.py:307

I did a little investigating and added some print statements around the failing line in calc_dice_metric:

print(predictions_labels.shape)
print(labels.shape)

print(predictions_labels[mask])
print(labels[mask])

# Add mask class (will be removed later)
invalid_label_index = max(predictions_labels[mask].max(), labels[mask].max()) + 1
predictions_labels[~mask] = invalid_label_index
labels[~mask] = invalid_label_index
n_labels = invalid_label_index + 1

The output just before the crash is as follows:

torch.Size([5, 480, 640])
torch.Size([5, 480, 640])
tensor([18, 18, 18,  ...,  6,  6,  6], device='cuda:0')
tensor([14, 14, 14,  ..., 14, 14, 14], device='cuda:0')
torch.Size([5, 480, 640])
torch.Size([5, 480, 640])
tensor([], device='cuda:0', dtype=torch.int64)
tensor([], device='cuda:0', dtype=torch.int64)
Epoch 0/100 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100/100 0:00:22 • 0:00:00 4.89it/s   
Validation  ━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━ 87/252  0:00:08 • 0:00:17 10.24it/s  
[CRITICAL][htc] Uncaught exception:                                                        

So I'm guessing there are cases where the mask selects no pixels at all (the masked tensors are empty)? I assume this is due to the union of the 3 annotators?
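
For what it's worth, an all-False mask reproduces exactly this error in isolation. Below is a minimal standalone sketch (the shapes match the batch above, all values are made up, and it is not the htc code itself):

import torch

# Hypothetical batch tensors with the same shapes as printed above.
predictions_labels = torch.randint(0, 19, (5, 480, 640))
labels = torch.randint(0, 19, (5, 480, 640))

# If no pixel in the batch is valid, the boolean mask selects nothing ...
mask = torch.zeros((5, 480, 640), dtype=torch.bool)
print(predictions_labels[mask])  # tensor([], dtype=torch.int64)

# ... and .max() on the empty selection raises:
# RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0.
invalid_label_index = max(predictions_labels[mask].max(), labels[mask].max()) + 1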

Dataset

HeiPorSPECTRAL

"""
Small script to generate a config for training.
"""

import htc
from pathlib import Path

def main():
    "Simple main function"
    # load default config
    config = htc.Config.from_model_name("default", "image")
    # data and image
    config["input/data_spec"] = "./hs_tools/2fold-dataspec.json"
    # inherits
    config["inherits"] = "models/image/configs/default"
    # ensure all three annotations
    config["input/annotation_name"] = [
        "polygon#annotator1",
        "polygon#annotator2",
        "polygon#annotator3"
    ]
    # merge annotations from all annotators into one mask
    config["input/merge_annotations"] = "union"
    config["dataloader_kwargs/num_workers"] = 8
    config["dataloader_kwargs/batch_size"] = 5
    # model
    config["model/pretrained_model"] = {
        "model": "image",
        # "2022-02-03_22-58-44_generated_default_model_comparison"
        "run_folder": "2023-02-08_14-48-02_organ_transplantation_0.8",
    }
    # devices
    config["trainer_kwargs/devices"] = 1
    # save config
    save_path = Path("hs_tools/htc-config.json")
    config.save_config(save_path)

if __name__ == "__main__":
    main()
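
For reference, the config saved by this script (hs_tools/htc-config.json) is the one passed to training via htc training --model image --config hs_tools/htc-config.json in the command at the top of this issue.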

Environment

htc framework
- version: 0.0.13
- url: https://github.com/imsy-dkfz/htc
- git commit: 490ec78d5b6261f809617fd4288b8c8fa877a399

User settings:
No user settings found. If you want to use your user settings to specify environment variables, please create the file 
/home/tay/.config/htc/variables.env and add your environment variables, for example:
export PATH_HTC_NETWORK="/path/to/your/network/dir"
export PATH_Tivita_my_dataset="~/htc/Tivita_my_dataset:shortcut=my_shortcut"

.env settings:
No .env file found. If you cloned the repository and installed the htc framework in editable mode, you can create a .env file in the repository 
root (more precisely, at /home/tay/Code/ms-seg/env/lib/python3.10/site-packages/htc/.env) and fill it with variables, for example:
export PATH_HTC_NETWORK="/path/to/your/network/dir"
export PATH_Tivita_my_dataset="~/htc/Tivita_my_dataset:shortcut=my_shortcut"

Environment variables:

Datasets:
<htc.utils.Datasets.DatasetAccessor object at 0x7f5947812dd0>

Other directories:
[WARNING][htc] Could not find the environment variable PATH_HTC_RESULTS so that a results directory will not be available         settings.py:503
(scripts which use settings.results_dir will crash)                                                                                              
None
[WARNING][htc] Could not find an intermediates directory, probably because no data directory was found                            settings.py:460
None
src_dir=/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/htc
htc_package_dir=/home/tay/Code/ms-seg/env/lib/python3.10/site-packages/htc

System:
Collecting environment information...
PyTorch version: 2.1.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.27.2
Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.2.0-36-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.5.119
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 535.129.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      46 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             24
On-line CPU(s) list:                0-23
Vendor ID:                          GenuineIntel
Model name:                         12th Gen Intel(R) Core(TM) i9-12900K
CPU family:                         6
Model:                              151
Thread(s) per core:                 2
Core(s) per socket:                 16
Socket(s):                          1
Stepping:                           2
CPU max MHz:                        5200.0000
CPU min MHz:                        800.0000
BogoMIPS:                           6374.40
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdt_a rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr ibt flush_l1d arch_capabilities
Virtualisation:                     VT-x
L1d cache:                          640 KiB (16 instances)
L1i cache:                          768 KiB (16 instances)
L2 cache:                           14 MiB (10 instances)
L3 cache:                           30 MiB (1 instance)
NUMA node(s):                       1
NUMA node0 CPU(s):                  0-23
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] efficientnet-pytorch==0.7.1
[pip3] numpy==1.26.2
[pip3] pytorch-lightning==2.1.2
[pip3] segmentation-models-pytorch==0.3.3
[pip3] torch==2.1.1
[pip3] torchmetrics==1.2.1
[pip3] torchvision==0.16.1
[pip3] triton==2.1.0
[conda] blas                      1.0                         mkl  
[conda] mkl                       2023.1.0         h6d00ec8_46342  
[conda] mkl-service               2.4.0           py311h5eee18b_1  
[conda] mkl_fft                   1.3.6           py311ha02d727_1  
[conda] mkl_random                1.2.2           py311ha02d727_1  
[conda] numpy                     1.24.3          py311h08b1b3b_1  
[conda] numpy-base                1.24.3          py311hf175353_1  
[conda] numpydoc                  1.5.0           py311h06a4308_0  
[conda] pytorch                   2.0.1           cpu_py311h6d93b4c_0  
JanSellner commented 8 months ago
tensor([], device='cuda:0', dtype=torch.int64)
tensor([], device='cuda:0', dtype=torch.int64)

It looks like you have a batch of images with only invalid pixels, i.e. no annotated region is left. If there is no label left, it is not possible to calculate the loss.

You are using the full HeiPorSPECTRAL dataset but the labels from our semantic segmentation task:

'label_mapping': 'htc.settings_seg>label_mapping',  

This label mapping does not include bone, cartilage and bile_fluid, and if a batch contains only images whose annotations are limited to those labels, no valid label remains (because the annotated regions of those labels are not used).

You have two options depending on what you want to do:

  1. If you want to keep the label mapping, remove the images from your data specification which contain only unused classes, i.e. compute something like set(path.annotated_labels()) - set(["bone", "cartilage", "bile_fluid"]) and, if that set is empty, remove the image from the specs (see the sketch after this list).
  2. If you want to train with all the labels from the HeiPorSPECTRAL dataset, use a different label mapping. You can define your own with the labels you want, or use 'htc.tissue_atlas.settings_atlas>label_mapping', which uses all 20 classes (see the one-line config change after this list).
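
As a rough sketch of option 1 (image_paths below is just a placeholder for however you iterate the images while building hs_tools/2fold-dataspec.json; it is not an htc API):

unused = {"bone", "cartilage", "bile_fluid"}

kept = []
for path in image_paths:  # placeholder: your iteration over the candidate images
    if set(path.annotated_labels()) - unused:
        # At least one label of the semantic segmentation mapping remains.
        kept.append(path)
    # Otherwise, drop the image from the data specification.

Option 2 would presumably just be a one-line change in your config script, for example:

config["label_mapping"] = "htc.tissue_atlas.settings_atlas>label_mapping"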
alfieroddan commented 8 months ago

Thank you Jan!