[Question] HTC dataset generation

alfieroddan commented 9 months ago

:question: Question

I'm a little confused about how to create a simple one-fold dataset.

Could I ask for some help?

Description

I'm looking to create a simple dataset with a single fold.

Using a slightly modified version of the tutorial network training script, I run:

from htc import (
    DataPath,
    SpecsGeneration,
    settings,
    settings_seg,
    LabelMapping
)
from htc.utils.paths import filter_semantic_labels_only
from pathlib import Path
import itertools
import numpy as np

class SpecsGenerationThoracic(SpecsGeneration):
    def __init__(self):
        # Unique name of the resulting specs file
        super().__init__(name="Test-dataspec")

        # We will divide the data into an untouched test set, a validation set and a training set (a single fold)
        self.subjects_train = ["P087", "P090", "PO92", "PO94", "PO95"]
        self.subjects_val = ["PO88", "PO93"]
        self.subjects_test = ["P086", "PO89", "P091", "P096"]

    def generate_folds(self) -> list[dict]:
        # We only need a subset of the available images: those from our selected subjects (filter_subjects_*)
        data_dir = settings.data_dirs["HeiPorSPECTRAL"]
        filter_labels = lambda p: set(p.annotated_labels(annotation_name="all"))
        filter_subjects_test = lambda p: p.subject_name in self.subjects_test
        filter_subjects_val = lambda p: p.subject_name in self.subjects_val
        filter_subjects_train = lambda p: p.subject_name in self.subjects_train

        # Untouched test set
        paths_test = list(DataPath.iterate(data_dir, filters=[filter_subjects_test, filter_labels]))
        imgs_test = [p.image_name() for p in paths_test]

        # Validation set
        paths_val = list(DataPath.iterate(data_dir, filters=[filter_subjects_val, filter_labels]))
        imgs_val = [p.image_name() for p in paths_val]

        # Training set
        paths_train = list(DataPath.iterate(data_dir, filters=[filter_subjects_train, filter_labels]))
        imgs_train = [p.image_name() for p in paths_train]

        data_specs = []

        fold_specs = {
            "fold_name": f"fold_{0}",
            "train": {
                "image_names": imgs_train,
            },
            "val": {
                "image_names": imgs_val,
            },
            "test": {
                "image_names": imgs_test,
            },
        }

        data_specs.append(fold_specs)

        return data_specs

SpecsGenerationThoracic().generate_dataset(Path("hs_tools"))

The problem is that the validation set is empty:

"val": {
            "image_names": []
        },

Is there a simple way to create a one-fold dataset with train, val and test splits using this code base?
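
A check along the following lines would flag an empty split before the specs are used for training. This is only a minimal sketch: it assumes the fold dictionaries have the same layout as the ones returned by generate_folds() above, the helper name find_empty_splits is hypothetical and not part of htc, and the image names are made up for illustration.

```python
def find_empty_splits(folds: list[dict]) -> list[tuple[str, str]]:
    """Return (fold_name, split_name) pairs whose image list is empty."""
    empty = []
    for fold in folds:
        for split_name, split in fold.items():
            # "fold_name" holds the name of the fold, not a split
            if split_name == "fold_name":
                continue
            if not split["image_names"]:
                empty.append((fold["fold_name"], split_name))
    return empty


# Hypothetical image names, for illustration only
example_folds = [
    {
        "fold_name": "fold_0",
        "train": {"image_names": ["img_a", "img_b"]},
        "val": {"image_names": []},
        "test": {"image_names": ["img_c"]},
    }
]

print(find_empty_splits(example_folds))
```

An empty result means every split of every fold contains at least one image.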

JanSellner commented 9 months ago

Yes, you need to change the subject name for your validation pigs:

-      self.subjects_val = ["PO88", "PO93"]
+      self.subjects_val = ["P088", "P093"]

It is 0 (zero), not O (the letter) ;-)
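
A typo like this is easy to miss by eye. As a minimal sketch, assuming the subject names follow the pattern of a capital P followed by three digits (inferred from the names in this thread, not confirmed by the htc documentation), a small check could flag suspicious names before the specs are generated. The helper find_invalid_subjects is hypothetical and not part of htc.

```python
import re

# Assumption: subject names look like "P086" (capital P followed by three digits)
SUBJECT_PATTERN = re.compile(r"P\d{3}")


def find_invalid_subjects(subjects: list[str]) -> list[str]:
    """Return all names that do not match the assumed subject pattern."""
    return [s for s in subjects if SUBJECT_PATTERN.fullmatch(s) is None]


# Subject lists exactly as posted in the question
subjects_train = ["P087", "P090", "PO92", "PO94", "PO95"]
subjects_val = ["PO88", "PO93"]
subjects_test = ["P086", "PO89", "P091", "P096"]

invalid_train = find_invalid_subjects(subjects_train)
invalid_val = find_invalid_subjects(subjects_val)
invalid_test = find_invalid_subjects(subjects_test)

for split, invalid in [("train", invalid_train), ("val", invalid_val), ("test", invalid_test)]:
    print(f"{split}: {invalid}")
```

Applied to the lists from the question, this also flags entries in the training and test lists, so those may need the same correction.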

alfieroddan commented 9 months ago

Oh, how silly of me! Thank you so much.

IMSY-DKFZ / htc

[Question] HTC dataset generation #18
