fepegar / torchio

Medical imaging toolkit for deep learning
https://torchio.org
Apache License 2.0
2.08k stars 239 forks source link

get_subjects_from_batch has a hick-up with int metadata #1130

Closed KonoMaxi closed 10 months ago

KonoMaxi commented 11 months ago

Is there an existing issue for this?

Bug summary

Hello Fernando, I discovered a little bug last week.

The documentation suggests adding metadata to my subjects, like a name and an age https://torchio.readthedocs.io/data/subject.html And it works nicely!

But when I reworked my transformations, I noticed, that get_subjects_from_batch has problems with metadata. It seems to discard it entirely. In addition int-values (like age) even cause get_subjects_from_batch to crash!

I don't know if I'm using SubjectsDataset.from_batch wrong, but at first sight this looks like a bug to me. I'll add a PR with two failing test-cases.

As I'm unsure what the intended behavior is, I'll wait for your feedback before I take a deeper look at the code myself.

Regards, Max

Code for reproduction

def test_subjects_from_batch_with_string_metadata(self):
        subject_c_with_string_metadata = tio.Subject(
            name="John Doe",
            label=tio.LabelMap(self.get_image_path('label_c', binary=True)),
        )

        dataset = tio.SubjectsDataset(4 * [subject_c_with_string_metadata])
        loader = torch.utils.data.DataLoader(dataset, batch_size=4)
        batch = tio.utils.get_first_item(loader)
        subjects = tio.utils.get_subjects_from_batch(batch)
        assert isinstance(subjects[0], tio.Subject)
        assert "label" in subjects[0]
        assert "name" in subjects[0]

def test_subjects_from_batch_with_int_metadata(self):
        subject_c_with_int_metadata = tio.Subject(
            age=45,
            label=tio.LabelMap(self.get_image_path('label_c', binary=True)),
        )
        dataset = tio.SubjectsDataset(4 * [subject_c_with_int_metadata])
        loader = torch.utils.data.DataLoader(dataset, batch_size=4)
        batch = tio.utils.get_first_item(loader)
        subjects = tio.utils.get_subjects_from_batch(batch)
        assert isinstance(subjects[0], tio.Subject)
        assert "label" in subjects[0]
        assert "age" in subjects[0]

Actual outcome

AssertionError on test_subjects_from_batch_with_string_metadata RuntimeError on test_subjects_from_batch_with_int_metadata

Error messages

RuntimeError: Tensor.__contains__ only supports Tensor or scalar, but you passed in a <class 'str'>.
AssertionError: assert 'name' in Subject(Keys: ('label',); images: 1)

Expected outcome

The original subject from the first line of each test

System info

Platform:   Linux-5.15.0-88-generic-x86_64-with-glibc2.35
TorchIO:    0.19.3
PyTorch:    2.1.1+cu121
SimpleITK:  2.3.1 (ITK 5.3)
NumPy:      1.26.2
Python:     3.9.18 (main, Sep 11 2023, 13:41:44) 
[GCC 11.2.0]
KonoMaxi commented 11 months ago

https://github.com/fepegar/torchio/pull/1131

fepegar commented 10 months ago

Thanks for reporting and adding the tests, @KonoMaxi. I've added a fix and merged your PR. These errors shouldn't happen in v0.19.4.

fepegar commented 10 months ago

@all-contributors please add @KonoMaxi for bug

allcontributors[bot] commented 10 months ago

@fepegar

I've put up a pull request to add @KonoMaxi! :tada:

KonoMaxi commented 10 months ago

Glad I could help and thank you for fixing this bug. Looking forward to using metadata in my SubjectsDataset again :-)