Pretrained models require dataset?

alfieroddan commented 9 months ago

Bug

Calling the ModelImage.pretrained_model static method invokes MetricAggregation, which requires a dataset containing specific images.

Description

Running the following with only the exemplary dataset:

from htc import ModelImage

run_folder = "2022-02-03_22-58-44_generated_default_model_comparison"  # HSI model

model = ModelImage.pretrained_model(
    model="image", run_folder=run_folder, n_channels=100, n_classes=19
)

This returns the following error message:

Traceback (most recent call last):
  File "/Users/alfie/Documents/ms-seg/snippets/example_prediction.py", line 18, in <module>
    model = ModelImage.pretrained_model(model="image", run_folder=run_folder, n_channels=100, n_classes=19)
  File "/Users/alfie/Documents/ms-seg/env/lib/python3.10/site-packages/htc/models/common/HTCModel.py", line 453, in pretrained_model
    return cls(config, **model_kwargs)
  File "/Users/alfie/Documents/ms-seg/env/lib/python3.10/site-packages/htc/models/common/HTCModel.py", line 25, in __call__
    obj.__post__init__()
  File "/Users/alfie/Documents/ms-seg/env/lib/python3.10/site-packages/htc/models/common/HTCModel.py", line 142, in __post__init__
    self._load_pretrained_model()
  File "/Users/alfie/Documents/ms-seg/env/lib/python3.10/site-packages/htc/models/common/HTCModel.py", line 275, in _load_pretrained_model
    agg = MetricAggregation(df_val, config=config)
  File "/Users/alfie/Documents/ms-seg/env/lib/python3.10/site-packages/htc/models/common/MetricAggregation.py", line 45, in __init__
    [DataPath.from_image_name(name).image_name_typed() for name in self.df["image_name"]]
  File "/Users/alfie/Documents/ms-seg/env/lib/python3.10/site-packages/htc/models/common/MetricAggregation.py", line 45, in <listcomp>
    [DataPath.from_image_name(name).image_name_typed() for name in self.df["image_name"]]
  File "/Users/alfie/Documents/ms-seg/env/lib/python3.10/site-packages/htc/tivita/DataPath.py", line 1182, in from_image_name
    assert match is not None, (
AssertionError: Could not find the path for the image P041#2019_12_14_12_00_16 (len(DataPath._local_cache()) = 5756, len(DataPath._network_cache()) = 0)

Is there a way to load a pretrained model without running the metric checks?

Dataset

Reduced example dataset with only P086 and P093.

Environment

htc framework

version: 0.0.13
url: https://github.com/imsy-dkfz/htc
git commit: f5595beda515800b5273e42f9b42e42545447a22

User settings: No user settings found. If you want to use your user settings to specify environment variables, please create the file /Users/alfie/Library/Application Support/htc/variables.env and add your environment variables, for example:
export PATH_HTC_NETWORK="/path/to/your/network/dir"
export PATH_Tivita_my_dataset="~/htc/Tivita_my_dataset:shortcut=my_shortcut"

.env settings: No .env file found. If you cloned the repository and installed the htc framework in editable mode, you can create a .env file in the repository root (more precisely, at /Users/alfie/Documents/ms-seg/env/lib/python3.10/site-packages/htc/.env) and fill it with variables, for example:
export PATH_HTC_NETWORK="/path/to/your/network/dir"
export PATH_Tivita_my_dataset="~/htc/Tivita_my_dataset:shortcut=my_shortcut"

Environment variables:

Datasets: <htc.utils.Datasets.DatasetAccessor object at 0x14fb11450>

Other directories:
[WARNING][htc] Could not find the environment variable PATH_HTC_RESULTS so that a results directory will not be available (scripts which use settings.results_dir will crash) (settings.py:503)
None
[WARNING][htc] Could not find an intermediates directory, probably because no data directory was found (settings.py:460)
None
src_dir=env/lib/python3.10/site-packages/htc
htc_package_dir=env/lib/python3.10/site-packages/htc

System:
Collecting environment information...
PyTorch version: 2.1.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.4.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.3 (clang-1403.0.22.14.1)
CMake version: version 3.27.2
Libc version: N/A

Python version: 3.10.13 (main, Aug 24 2023, 22:36:46) [Clang 14.0.3 (clang-1403.0.22.14.1)] (64-bit runtime)
Python platform: macOS-13.4.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU: Apple M1

Versions of relevant libraries:
[pip3] efficientnet-pytorch==0.7.1
[pip3] numpy==1.26.2
[pip3] pytorch-ignite==0.4.11
[pip3] pytorch-lightning==2.1.0
[pip3] segmentation-models-pytorch==0.3.3
[pip3] torch==2.1.1
[pip3] torchmetrics==1.2.1
[pip3] torchvision==0.16.1
[conda] Could not collect

JanSellner commented 9 months ago

Oh, you are right. That should not happen.

The code that uses the MetricAggregation class is there to find the best fold (the one with the highest DSC, i.e. Dice similarity coefficient). I have to make that work even if the dataset is not available.

In the meantime, you can specify the fold name explicitly as a workaround to avoid this part of the code:

run_folder = "2022-02-03_22-58-44_generated_default_model_comparison"  # HSI model
fold_name = "fold_P041,P060,P069"  # For example

model = ModelImage.pretrained_model(
    model="image", run_folder=run_folder, fold_name=fold_name, n_channels=100, n_classes=19
)
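
For readers wondering what "best fold in terms of highest DSC" amounts to, here is a minimal sketch of the idea. It is not the htc implementation: the function `best_fold`, the score values, and the second fold name are all made up for illustration, and the actual aggregation in MetricAggregation may work differently (for example, by aggregating per subject or per class first).

```python
from statistics import mean


def best_fold(dsc_per_fold: dict[str, list[float]]) -> str:
    """Return the name of the fold with the highest mean validation DSC.

    Hypothetical helper for illustration only, not part of htc.
    """
    if not dsc_per_fold:
        raise ValueError("No folds given")
    # Compare folds by their mean DSC over all validation images
    return max(dsc_per_fold, key=lambda name: mean(dsc_per_fold[name]))


# Made-up validation scores; only the first fold name appears in this thread
scores = {
    "fold_P041,P060,P069": [0.91, 0.88, 0.90],
    "fold_hypothetical": [0.85, 0.87, 0.86],
}

fold_name = best_fold(scores)
print(fold_name)
```

Passing `fold_name` explicitly, as in the workaround above, skips this selection step, which is why the validation images no longer need to be resolvable on disk.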

Thank you for letting us know!

alfieroddan commented 9 months ago

Not a problem! Thank you for the temporary solution.

JanSellner commented 9 months ago

This is now fixed in the latest master by commit e6c4e92f4a7f089e71e260abdfa848c3626ea2db.
