NEUBIAS / training-resources

Resources for teaching/preparing to teach bioimage analysis
https://neubias.github.io/training-resources

Image data formats: refactor into one activity per format #720

Open tischi opened 1 week ago

tischi commented 1 week ago

Recent teaching experiences showed that handling such "monster activities" (https://neubias.github.io/training-resources/image_file_formats/index.html#open) can be challenging for trainers, students and maintainers, because

  1. one can get lost
  2. especially if one does not want to show everything
  3. not all sub-activities can be implemented in all platforms and not all sub-activities may be interesting for all audiences
  4. some sub-activities would benefit from additional explanations in the preface, which would explode the preface
  5. specifically here, I would, e.g., like to add opening a "movie" sequence of JPEG files, which not everyone may find relevant or interesting; I would also like to add some other commercial formats that occur frequently at EMBL (again, not everyone may find this relevant).

I would therefore like to ask how you feel about splitting this up into one activity per image data format. A disadvantage could be that, if we find new general tricks for opening such image data, we would need to change this in several activities.

ping @manerotoni @k-dominik @AnniekStok

tischi commented 1 week ago

In fact, another reason to refactor this is that in Python saving in different formats may be less easy and may require various dependencies. In Fiji everything is conveniently bundled under File > Save as...; this will be less simple on other platforms.

manerotoni commented 1 week ago

Hello @tischi, I have never taught the module, but from a quick look the activities are overwhelming. It would not hurt to separate them. It could be that in the end we have separate modules for 'simpler' data sets and more complex data sets. This should not affect the activities.

AnniekStok commented 1 week ago

Hi @tischi,

I agree the activity looks a bit complicated, and splitting it up would make it easier to pick out the specific parts that you want to teach. Including different microscope formats and how to directly concatenate a movie from a folder would be very useful. As you said, splitting the activities by file format will have the benefit that we can pick the specific reader/writer method needed for that type of data, so it should be easier. We did something similar for a napari workshop recently. Just wondering: right now the module appears to serve multiple purposes, which in combination with all the different file formats can make it a bit overwhelming (and possibly very large). Would it make sense to split it further into separate consecutive modules, or would that fragment the flow too much?

tischi commented 1 week ago

@AnniekStok

So you are proposing to separate opening and inspecting image pixel data and opening and inspecting image metadata into two different modules? I am not sure about this, could you please elaborate a bit more why you think this would be good?

Regarding the saving being a different module: I think we all agree (see https://github.com/NEUBIAS/training-resources/issues/719).

AnniekStok commented 1 week ago

I was just wondering, in part to simplify the module and in part because the concept of metadata feels like it can be a topic of its own, depending on how deep you want to go into it. We recently tested a microscope that saved the metadata separately from the tif output (I actually did not like that at all, what if you lose the metadata file or accidentally move it somewhere else?). But occasionally one might want to read only the metadata. However, I also understand it is convenient to keep it together with opening different image formats since each format has its own metadata structure.

tischi commented 1 week ago

Good point... I thought that, didactically, it could be good to make the point that mapping the binary pixel data into an XYZCT space needs some metadata, e.g. which TIFF plane corresponds to which Z, C and T position. If that information is missing or wrong, it will be hard to read the data in a meaningful way at all.
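To make the plane-mapping point concrete, here is a minimal sketch in plain Python. It assumes a dimension-order string in the style of OME metadata (e.g. "XYCZT", where the axes after "XY" are listed from fastest-varying to slowest-varying); the function name `plane_index` and its arguments are hypothetical and only serve as illustration, they are not part of any existing reader.

```python
def plane_index(order, sizes, pos):
    """Return the linear plane index for a given (Z, C, T) position.

    order: dimension order string such as "XYCZT"; the three axes after
           "XY" are assumed to be listed from fastest- to slowest-varying.
    sizes: dict with the number of planes along "Z", "C" and "T".
    pos:   dict with the zero-based position along "Z", "C" and "T".
    """
    plane_axes = order[2:]
    if not order.startswith("XY") or sorted(plane_axes) != ["C", "T", "Z"]:
        raise ValueError(f"unsupported dimension order: {order!r}")

    index = 0
    stride = 1  # number of planes spanned by one step along the current axis
    for axis in plane_axes:
        if not 0 <= pos[axis] < sizes[axis]:
            raise IndexError(f"{axis} position {pos[axis]} is out of range")
        index += pos[axis] * stride
        stride *= sizes[axis]
    return index


# The same (Z, C, T) position lands on a different plane depending on the order.
sizes = {"Z": 3, "C": 2, "T": 4}
pos = {"Z": 1, "C": 1, "T": 2}
print(plane_index("XYCZT", sizes, pos))  # 15
print(plane_index("XYZCT", sizes, pos))  # 16
```

If the order recorded in the metadata is wrong, every plane is still readable, but channels, slices and time points end up interleaved incorrectly, which is the failure mode described above.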

But the microscopy settings metadata is a different matter and indeed something that could be looked at separately.

Maybe we could restrict this module to "essential metadata" such as the XYZCT mapping and pixel spatial calibration?

And then refer to a future module for microscope related metadata?

AnniekStok commented 1 week ago

Yes I see your point. Depending on the reader the image may be displayed correctly automatically, but if not, it is helpful to know where to find this information and how to apply it (for which we can refer in part to the spatial calibration module). I think restricting the module to the essential metadata and referring to a separate module that dives into microscopy metadata could work!

tischi commented 1 week ago

I am still not sure. At least in Fiji, reading the metadata is as easy as checking one box in the Bio-Formats Importer. Thus, teaching how to open it does not add a lot of overhead. Of course, really digging through all the metadata could take a lot of time, but I don't think we need to do this here. We can just tell the students: "That's how you can access metadata easily, good luck finding what you need in there".

tischi commented 1 week ago

@manerotoni @AnniekStok

Which image formats should we teach to open?

I think it would be important to cover some of the typical complex cases, e.g.

What do you think?

Please note that my current plan is to teach "big image data formats" like OME-Zarr and XML/HDF5 in another module, because they need the additional concept of a resolution pyramid.

Please also note that my current plan is to have a whole module about OME-TIFF, which can also be multi-resolution and can contain several images, and as such is quite complex and too much for an overview module, imho.

manerotoni commented 1 week ago

I would keep the metadata to a minimum (axis notation, pixel size, Dt). The rest is mostly about how the image was acquired (laser power, filters etc.) and could be addressed in another module if necessary. I also think there is nothing wrong with checking the metadata using Bio-Formats/Fiji and then using this info in Python.
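As a rough illustration of what "minimal metadata" could look like once carried over into Python, here is a small sketch. The class name `EssentialMetadata`, its fields and the example values are all hypothetical; in practice the numbers would be read off the image metadata, for instance in the Bio-Formats Importer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EssentialMetadata:
    """Minimal calibration info: axis notation, pixel size and time interval."""
    axes: str                # axis notation of the array, e.g. "TCZYX"
    pixel_size_x_um: float   # pixel width in micrometers
    pixel_size_y_um: float   # pixel height in micrometers
    pixel_size_z_um: float   # distance between z-slices in micrometers
    time_interval_s: float   # Dt, time between frames in seconds

    def to_physical(self, x, y, z, t):
        """Convert pixel indices (x, y, z) and frame index t to physical units."""
        return (
            x * self.pixel_size_x_um,
            y * self.pixel_size_y_um,
            z * self.pixel_size_z_um,
            t * self.time_interval_s,
        )


# Hypothetical values, as they might be noted down from the metadata.
meta = EssentialMetadata("TCZYX", 0.25, 0.25, 0.5, 2.0)
print(meta.to_physical(x=8, y=4, z=4, t=3))  # (2.0, 1.0, 2.0, 6.0)
```

Acquisition settings such as laser power or filters are deliberately left out, in line with the idea of moving them to a separate module.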

Didactically, the examples you picked are fine. I would also include something with tiles, just the opening. Stitching does not fit here and is a much-needed separate module. It is good that we now plan to separate the activities per data format (do CZI and LIF separately). The data format is somewhat facility/institute specific and depends on what instruments you have.

I was not aware that OME-TIFF can do pyramids :-)

AnniekStok commented 1 week ago

I agree those examples are good! With czi / lif / (ndi) it could be nice to explore the different options for splitting channels and/or timepoints using the Bio-Formats Importer. For concatenating a movie from a folder containing z-stacks, it might be nice to show the virtual stack option and how to set the hyperstack dimensions correctly.

tischi commented 1 week ago

I think I would also like to add one activity for the ilastik HDF5 format, because HDF5 in general is important to know.

tischi commented 1 week ago

Hi,

I started the refactoring by implementing an activity to open a CZI image (see above commit).

May I ask for your help with the other file formats? It is relatively straightforward, just copy what I did for the CZI format.

I don't think we need PRs for this, simply push to master with commit messages referring to this issue (#720), e.g.:

git commit -m "Add activity to open CZI image, #720"

@felixS27 could you please modify this as mentioned in the TODO?

@AnniekStok could you be motivated to add activities to open some of the other formats that we agree upon?


Regarding the TIFF series: I actually want to change this and not do a movie but EM volume slices, because this is very relevant for EMBL here.

tischi commented 1 week ago

I now also added a TIFF series activity.

@felixS27 could you please look into implementing the corresponding Python activity? I do not know whether this could also be done with bioio. If not, maybe tifffile offers something? You could ask on the forum if you do not find anything; I am positive that there should be something...
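Independent of which reader ends up being used, one step that any Python version of a TIFF series activity will probably need is putting the slice files into the right order. Below is a minimal stdlib-only sketch of that step. The helper names `natural_key` and `list_series` are hypothetical, and the actual pixel reading is deliberately left out; passing the sorted list to something like `tifffile.imread` is an assumption that would need to be checked against the tifffile documentation.

```python
import re
from pathlib import Path


def natural_key(name):
    """Sort key that orders embedded numbers numerically.

    re.split with a capturing group alternates text and digit chunks,
    so chunks at odd positions are always runs of digits.
    """
    chunks = re.split(r"(\d+)", name)
    return [int(c) if i % 2 else c.lower() for i, c in enumerate(chunks)]


def list_series(folder, suffixes=(".tif", ".tiff")):
    """Return the TIFF files in a folder, ordered so slice_2 precedes slice_10."""
    files = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    ]
    return sorted(files, key=lambda p: natural_key(p.name))


# Plain alphabetical sorting would put slice_10 before slice_2.
names = ["slice_10.tif", "slice_2.tif", "slice_1.tif"]
print(sorted(names, key=natural_key))
# ['slice_1.tif', 'slice_2.tif', 'slice_10.tif']
```

For EM volume slices this matters because file names without zero padding are common, and a wrong file order silently scrambles the z-axis.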

AnniekStok commented 1 week ago

Hi, yes I could try to work on that (maybe only after I am back from traveling though). I can do lif and tif, but I do not have any vsi files, is there sample data for that? Do we actually want to include activities opening different formats in napari as well?

felixS27 commented 10 hours ago

Hi,

> I started the refactoring by implementing an activity to open a CZI image (see above commit).

> May I ask for your help with the other file formats? It is relatively straightforward, just copy what I did for the CZI format.

> I don't think we need PRs for this, simply push to master with commit messages referring to this issue (#720), e.g.:

> git commit -m "Add activity to open CZI image, #720"

> @felixS27 could you please modify this as mentioned in the TODO?

> @AnniekStok could you be motivated to add activities to open some of the other formats that we agree upon?

> Regarding the TIFF series: I actually want to change this and not do a movie but EM volume slices, because this is very relevant for EMBL here.

@tischi sure, I will do this. Should I then also do this for the other file formats in parallel?

felixS27 commented 10 hours ago

> I now also added a TIFF series activity.

> @felixS27 could you please look into implementing the corresponding python activity? I do not know whether this could also be done with bioio. If not, maybe tifffile offers something? You could ask on the forum if you do not find anything; I am positive that there should be something...

@tischi I will look into this.

tischi commented 10 hours ago

> @tischi sure, I will do this. Should I then also do this for the other file formats in parallel?

Yes please, you could also start refactoring and then @AnniekStok could fill in the ImageJ implementations.

Thanks!!!