DIAGNijmegen / pathology-whole-slide-data

A package for working with whole-slide data including a fast batch iterator that can be used to train deep learning models.
https://diagnijmegen.github.io/pathology-whole-slide-data/
Apache License 2.0
86 stars 24 forks

create_batch_iterator that associates files with exact matching #19

Closed michelbotros closed 2 years ago

michelbotros commented 2 years ago

For some file keys I am running into problems with associations, for example when the following files are in my data.yml:

DATASET5-12345_HE-I.tiff
DATASET5-12345_HE-II.tiff
DATASET5-12345_HE-III.tiff

The key DATASET5-12345_HE-I is a prefix of the other two keys, so in this case associate_files() with exact matching is required. Is it possible to add the functionality of requiring exact matching to associate files to the create_batch_iterator function?

At some point it calls associate_files() in https://github.com/DIAGNijmegen/pathology-whole-slide-data/blob/c4ea347c1b25cc212e26a4f8f8b5bdcc27ae5688/wholeslidedata/source/associations.py#L55

Its default setting for requiring an exact match is False. I think it'd be useful to be able to call create_batch_iterator with exact matching required.
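
To illustrate the kind of ambiguity involved, here is a minimal sketch. It is not the library's code: `stem_key` and `find_matches` are hypothetical helpers, and the sketch assumes that non-exact matching behaves like a substring check on the file stem, which may differ from what associate_files() actually does.

```python
from pathlib import Path


def stem_key(filename: str) -> str:
    """Hypothetical helper: use the file name without extension as the key."""
    return Path(filename).stem


def find_matches(key: str, candidates: list[str], exact_match: bool = False) -> list[str]:
    """Hypothetical helper: return the candidates whose key matches `key`.

    With exact_match=False a candidate matches when `key` is contained in its
    stem (an assumption made for illustration); with exact_match=True the stem
    must be identical to `key`.
    """
    if exact_match:
        return [c for c in candidates if stem_key(c) == key]
    return [c for c in candidates if key in stem_key(c)]


images = [
    "DATASET5-12345_HE-I.tiff",
    "DATASET5-12345_HE-II.tiff",
    "DATASET5-12345_HE-III.tiff",
]

# Substring matching: the key for slide I also hits slides II and III.
loose = find_matches("DATASET5-12345_HE-I", images)

# Exact matching: only slide I is returned.
strict = find_matches("DATASET5-12345_HE-I", images, exact_match=True)

print(len(loose), len(strict))
```

Under that assumption the loose lookup returns all three files for the first key, while the exact lookup returns only the intended one.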

Let me know what you think about adding this or if you know a workaround solution!

martvanrijthoven commented 2 years ago

Dear Michel Botros,

Yes, this feature is already implemented. You can enable it in your user config, for example:

wholeslidedata:
    default:
        associations:
            exact_match: True

Please let me know if you still encounter problems or have any other questions.

Best wishes, Mart

michelbotros commented 2 years ago

Dear Mart,

Oh that's great. Thank you! I'll let you know if I have more questions. So far I like working with this framework :)

Best,

Michel

martvanrijthoven commented 2 years ago

Dear Michel,

Really great to hear that you are liking the framework, thank you!

Best wishes, Mart