DiamondLightSource / httomo

High-throughput tomography pipeline
https://diamondlightsource.github.io/httomo/
BSD 3-Clause "New" or "Revised" License
4 stars 3 forks source link

Autosave of the last method result into the file #283

Closed dkazanc closed 4 weeks ago

dkazanc commented 2 months ago

Having some doubts that we should save the result of the last method automatically into the hdf5 file.

  1. The user might not need the hdf5 file and rather prefer tiffs or/and a binary file (TODO) instead
  2. The reconstruction (being the last method) result is always saved as floating point array into hdf5 while it can be uint16 after rescale_to_int performed on it. Float32 is usually a redundant precision for further analysis anyway (e.g. segmentation). The users rescale themselves to run some Otsu stuff on it.

More on the second point. I think it would be more sensible if we auto-generate a template for reconstruction methods with the flag save_result: Falseadded to it and let the user decide if saving into hdf5 is needed.

With respect to the precision it worth to think about it. Basically we don't need the result of the reconstruction to be saved but rescale_to_int instead. This makes me think if we need an additional HTTomo parameter, e.g. saved_data_type: uint16 which will trigger global stats collections and then rescaling on the given array.

The logic behind it is the following: if we're saving the rescaled data into tiffs and the resulting hdf5 should be in uint16 as well, then we can save some time and space on saving floating point arrays (if its needed at all). Secondly the reconstructed hdf5 files in uint16 can be well compressed I believe. Blosc should work quite well on them as they possibly sparser than the projection data, especially after denoising!

Ideas? @yousefmoazzam

yousefmoazzam commented 2 months ago

Thanks for @'ting me :slightly_smiling_face:, here's some of my initial thoughts:

dkazanc commented 2 months ago

Thanks Yousef, I think it needs more thinking/discussion. I just realised that we actually got the parameter save_result_default in the library file which can control the result of reconstruction saved, for instance. This is in addition to saving the last result of the pipeline, which make things pretty dubious. I think we need to look into that before the release. Just do this:

dkazanc commented 4 weeks ago

So I looked into this:

This all mean that nothing should be done here actually, as one can cancel the saving of the reconstructed volume, if needs to. I'm also OK if the result of the recon is saved by default for now. If there will be a request to turn it off it can be done easily using save_result_default: False.