bgruening / galaxytools

:microscope::books: Galaxy Tool wrappers
MIT License

[WIP] Add tool for analysing images using Bioimage AI models #1391

Open anuprulez opened 4 months ago

anuprulez commented 4 months ago


Test files: https://bioimage.io/#/?id=10.5281%2Fzenodo.5764892 https://zenodo.org/api/records/6647674

Remaining tasks:

bgruening commented 2 weeks ago

@anuprulez is this still WIP, or can we merge it? @beatrizserrano needs it :)

kostrykin commented 1 week ago

Guys, my impression of the comments above is that there has been some confusion. To clear things up, I'll try to summarize the main concerns:

Concern 1: Support for TIFF/PNG

@bgruening was addressing the input file format, which is currently NPZ only. For the target audience of this tool, NPZ is not a well-established standard format the way TIFF or PNG are. @anuprulez, however, replied about converting the output files to TIFF or PNG, pointing out that this is not straightforward.

Let's consider the questions of the input file formats and the output file formats separately.

1.1) Inputs: The NPY/NPZ formats are more general than TIFF and PNG, since they can store arbitrary NumPy arrays with any data type (even mixed, as in structured arrays) and any number of dimensions. Thus, converting from TIFF/PNG to NPY/NPZ should be straightforward. Given the concerns above, I think the tool wrapper really should also accept TIFF and PNG input files and perform this conversion automatically. This could be as simple as:

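```python
import numpy as np
from skimage.io import imread  # assumption: scikit-image's imread; imageio would also work

im = imread('tiff or png image file path')
np.save('input.npy', im)  # to produce an NPY (a single array)
np.savez('input.npz', im)  # to produce an NPZ (the array is stored under the key 'arr_0')
```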
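To back up the generality argument, here is a minimal sketch using only NumPy that round-trips a 5-D `float16` array and a structured array with mixed field types through NPY, checking that shape, dtype and values survive. The helper name `roundtrip_npy`, the shapes and the field names are made up for illustration.

```python
import os
import tempfile

import numpy as np


def roundtrip_npy(arr: np.ndarray, path: str) -> np.ndarray:
    """Save `arr` as NPY and load it back (hypothetical helper)."""
    np.save(path, arr)
    # allow_pickle=False is the default, which is fine for non-object dtypes
    return np.load(path)


with tempfile.TemporaryDirectory() as tmp:
    # 5-D half-precision array, e.g. (batch, channel, z, y, x); shape is hypothetical
    volume = np.random.default_rng(0).random((1, 2, 3, 4, 5)).astype(np.float16)
    # Structured ("mixed") array: an integer label and a float score per element
    mixed = np.array(
        [(1, 0.5), (2, 0.25)], dtype=[("label", np.int32), ("score", np.float64)]
    )

    for original in (volume, mixed):
        restored = roundtrip_npy(original, os.path.join(tmp, "array.npy"))
        assert restored.shape == original.shape
        assert restored.dtype == original.dtype
        assert np.array_equal(restored, original)
    print("shape, dtype and values preserved")
```

TIFF and PNG, by contrast, are image formats with restrictions on which dtypes and axis layouts they accept, which is why converting in this direction (TIFF/PNG to NPY) is the easy one.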

1.2) Outputs: It was pointed out that the originally predicted arrays are output as NPZ so that no information is lost. However, I wonder whether converting the data to TIFF can lose any information at all. My gut feeling is that using TIFF as the output file format should be safe, at least as long as we restrict the wrapper to NPY instead of NPZ (see below).
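One way to check this gut feeling is a round trip through TIFF. The sketch below assumes the third-party `tifffile` package (not mentioned in the thread) and checks that a multi-dimensional `float32` array, standing in for a model prediction, comes back with identical shape, dtype and values. It only covers this one case, not every dtype or axis layout a model might emit.

```python
import os
import tempfile

import numpy as np
import tifffile  # assumption: tifffile is used to write and read TIFF


def tiff_roundtrip(arr: np.ndarray, path: str) -> np.ndarray:
    """Write `arr` to a TIFF file and read it back (hypothetical helper)."""
    # photometric='minisblack' keeps tifffile from treating a trailing
    # axis of length 3 or 4 as RGB(A) color channels
    tifffile.imwrite(path, arr, photometric="minisblack")
    return tifffile.imread(path)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical prediction: 4-D float32 array (channel, z, y, x)
    prediction = np.random.default_rng(0).random((2, 3, 32, 32)).astype(np.float32)
    restored = tiff_roundtrip(prediction, os.path.join(tmp, "prediction.tif"))
    print(restored.shape, restored.dtype, np.array_equal(restored, prediction))
```

By default, `tifffile` records the array shape in the image description, which is what lets the extra axes be restored on reading; other TIFF readers may only see a stack of 2-D pages.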

Concern 2: NPY or NPZ?

Please keep in mind that NPY is "the standard binary file format in NumPy for persisting a single arbitrary NumPy array on disk", whereas NPZ is "the standard format for persisting multiple NumPy arrays on disk", which may optionally be compressed (docs). The key difference here is that NPY holds a single array, while NPZ holds multiple arrays.
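The difference is easy to see in NumPy itself. This minimal sketch (file and array names are made up) saves one array as NPY and two named arrays as NPZ, then inspects what `np.load` returns in each case.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
raw = rng.random((64, 64)).astype(np.float32)
mask = raw > 0.5

with tempfile.TemporaryDirectory() as tmp:
    # NPY: exactly one array per file; np.load returns that ndarray directly
    np.save(os.path.join(tmp, "raw.npy"), raw)
    single = np.load(os.path.join(tmp, "raw.npy"))
    print(type(single).__name__, single.shape)

    # NPZ: a zip archive holding any number of named arrays (uncompressed here)
    np.savez(os.path.join(tmp, "bundle.npz"), raw=raw, mask=mask)
    # savez_compressed writes the same layout, but deflate-compressed
    np.savez_compressed(os.path.join(tmp, "bundle_compressed.npz"), raw=raw, mask=mask)

    # For NPZ, np.load returns a lazy, dict-like NpzFile rather than an array
    with np.load(os.path.join(tmp, "bundle.npz")) as bundle:
        print(sorted(bundle.files))  # names of the stored arrays
        print(bundle["mask"].dtype)
```

A consumer of an NPZ therefore has to decide which of the stored arrays to use, which leads directly to the questions below.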

At this point I'm somewhat confused. I hope @FynnBe can add something regarding the following two concerns:

2.1) What happens if the NPZ contains more than one array? Do the bioimage.io models process them independently and yield another NPZ, which contains predictions for each array in the NPZ? Or do the models bluntly fail to process an NPZ with more than one array?
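The thread leaves open how bioimage.io models actually treat multi-array NPZ files, so the following is only a sketch of the first possibility raised in 2.1: process each array independently and collect the predictions in a new NPZ under the same keys. `predict` is a hypothetical stand-in for model inference (here a simple threshold), not a bioimage.io API, and `predict_each_array` is likewise a made-up name.

```python
import numpy as np


def predict(arr: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for model inference: threshold at the mean."""
    return (arr > arr.mean()).astype(np.uint8)


def predict_each_array(in_path: str, out_path: str) -> list[str]:
    """Run `predict` on every array in an NPZ; save results under the same keys."""
    with np.load(in_path) as inputs:
        outputs = {key: predict(inputs[key]) for key in inputs.files}
    np.savez(out_path, **outputs)
    return sorted(outputs)
```

The alternative raised in 2.1, failing on any NPZ with more than one array, would replace the dictionary comprehension with a check on `len(inputs.files)`.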

2.2) As far as I understand the comment made by @FynnBe (link), the bioimage.io models are only guaranteed to work with NPY inputs. Doesn't that mean that the wrapper should actually take and produce NPY instead of NPZ? If this is right, then we should forget about NPZ and restrict our discussion here to PNG, TIFF, and NPY.
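If the wrapper were restricted to NPY, existing NPZ files could still be accepted by unpacking them, as long as they hold a single array. A minimal sketch under that assumption: `npz_to_npy` is a hypothetical helper, and rejecting multi-array archives is just one possible policy, not something decided in this thread.

```python
import numpy as np


def npz_to_npy(npz_path: str, npy_path: str) -> None:
    """Convert an NPZ holding exactly one array into an NPY file (hypothetical)."""
    with np.load(npz_path) as archive:
        if len(archive.files) != 1:
            raise ValueError(
                f"expected exactly one array, found {len(archive.files)}: {archive.files}"
            )
        # The single array may be stored under any key (e.g. 'arr_0' from np.savez)
        np.save(npy_path, archive[archive.files[0]])
```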


cc @beatrizserrano