SBU-BMI / wsinfer

🔥 🚀 Blazingly fast pipeline for patch-based classification in whole slide images
https://wsinfer.readthedocs.io
Apache License 2.0
57 stars 10 forks source link

MIRAX (.mrxs) support #189

Closed xalexalex closed 1 year ago

xalexalex commented 1 year ago

Hello, I successfully used WSInfer on some projects using .tif WSIs, but now I am trying to use it on .mrxs files and it doesn't seem to work.

The issue is that mrxs files are shipped in a peculiar format (a slidename.mrxs file plus a slidename folder) and WSInfer errors out in wsi.py/_validate_wsi_directory() raising DuplicateFilePrefixesFound.

I tried fixing it in a naive way (changing the glob from "" to "." to avoid picking up the folders) and it seems to work (will send PR ASAP). However it's not elegant and it opens up some space for discussion: why do we have to input a wsi_dir? Wouldn't it be cleaner to use something like `-i wsi_dir/that can be used more flexibly, e.g.-i wsi_dir/*.mrxs` and can handle directories holding other files in addition to wsis?

kaczmarj commented 1 year ago

thanks very much @xalexalex . i had not used .mrxs files before so thanks for bringing them up. we're very happy to support them of course, and that's great that it seemed to work after globbing only files.

why do we have to input a wsi_dir? Wouldn't it be cleaner to use something like -i wsi_dir/* that can be used more flexibly, e.g. -i wsi_dir/*.mrxs and can handle directories holding other files in addition to wsis?

it's a good point and we should definitely discuss this. allowing the user to specify the slide files would be more flexible, and in most cases, it would simply be -i wsi_dir/*.ext. one concern is that some users might not know about globbing and this API might confuse those users. another concern is that if one has many files that they are globbing (a lot of files), then the command line will throw an error. however, a major advantage of specifying the slide files directly is if one wants to batch process files. this is particularly useful if one has multiple gpus / nodes and wants to send off multiple wsinfer runs of files that live in the same directory -- one can specify the batch on the command line. and as you write, this also has the advantage of handling different file types, like .mrxs.

a potential implementation that preserves the current simple API but adds flexibility for those who want it is having an option to use a directory only (and use all files within it) or an option to pass slide file paths like with a glob on the command line. these two options would be mutually exclusive. however, this might cause confusion for the user because there would now be two ways of running the wsinfer run command.

what do you think @xalexalex ? and anyone else reading along, please do offer your feedback!

xalexalex commented 1 year ago

one concern is that some users might not know about globbing and this API might confuse those users.

True, but I think this is a remote possibility. I suppose anyone using a command-line interface should be familiar with basic globbing (i.e. dir/*.ext, nothing fancier).

another concern is that if one has many files that they are globbing (a lot of files), then the command line will throw an error.

This is also true but in my opinion very remote. And if anyone had tens of thousands of WSIs to process at once, I think this would fall in the use case below (batch/distributed processing) so it's actually an argument in favor of accepting files as arguments.

I would mostly be concerned about breaking the current interface. However, since WSInfer is in very active development, I would prefer breaking the interface sooner rather than later.

a potential implementation that preserves the current simple API but adds flexibility for those who want it is having an option to use a directory only (and use all files within it) or an option to pass slide file paths like with a glob on the command line. these two options would be mutually exclusive. however, this might cause confusion for the user because there would now be two ways of running the wsinfer run command.

For simplicity's sake (both in code maintenance and user-friendliness) I would be against supporting both options, but if you decide to go that way, then two mutually exclusive flags are definitely the way to go (e.g. --input-dir and --input-files).

Thanks! Alessandro

kaczmarj commented 1 year ago

Thanks Alessandro. Let me discuss this change with our group. I agree that both of my concerns are remote possibilities, and this would help in batch processing.Best,JakubOn Sep 23, 2023, at 04:48, Alessandro Caputo @.***> wrote:

one concern is that some users might not know about globbing and this API might confuse those users.

True, but I think this is a remote possibility. I suppose anyone using a command-line interface should be familiar with basic globbing (i.e. dir/*.ext, nothing fancier).

another concern is that if one has many files that they are globbing (a lot of files), then the command line will throw an error.

This is also true but in my opinion very remote. And if anyone had tens of thousands of WSIs to process at once, I think this would fall in the use case below (batch/distributed processing) so it's actually an argument in favor of accepting files as arguments. I would mostly be concerned about breaking the current interface. However, since WSInfer is in very active development, I would prefer breaking the interface sooner rather than later.

a potential implementation that preserves the current simple API but adds flexibility for those who want it is having an option to use a directory only (and use all files within it) or an option to pass slide file paths like with a glob on the command line. these two options would be mutually exclusive. however, this might cause confusion for the user because there would now be two ways of running the wsinfer run command.

For simplicity's sake (both in code maintenance and user-friendliness) I would be against supporting both options, but if you decide to go that way, then two mutually exclusive flags are definitely the way to go (e.g. --input-dir and --input-files). Thanks! Alessandro

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you modified the open/close state.Message ID: @.***>