Closed: dkazanc closed this 4 weeks ago
Thanks for @'ing me :slightly_smiling_face:, here are some of my initial thoughts:

`saved_data_type` would work, but one thing that springs to mind is that we might need to be mindful of how this would be implemented. Currently, I think global stats calculation + wrapper creation is triggered by something *after* the intermediate dataset saving wrapper (ie, `save_to_images` has a `glob_stats: True` parameter, and this comes after the intermediate dataset saving wrapper that is automatically inserted on the user's behalf by httomo).

From its description, `saved_data_type` sounds like it would go on the method *before* the intermediate dataset saving wrapper, is that right? But this parameter can also trigger global stats calculation, which is something that is currently done after the intermediate dataset saving wrapper, in `save_to_images`.

So: `saved_data_type` goes on the method before the intermediate dataset saving wrapper, but `glob_stats` can come on methods after it. At first glance that sounds confusing (maybe it's the right design or maybe it's not, I'm just pointing out that it seems a bit odd at first glance).

…the `improve-intermediate-file-performance` branch when compressing the chunks when saving the recon data for parallel runs.

Thanks Yousef, I think it needs more thinking/discussion.
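To make the ordering concern concrete, here is a hypothetical pipeline sketch. The module paths and the exact spelling of the proposed `saved_data_type` key are assumptions for illustration, not httomo's actual schema:

```yaml
# Hypothetical pipeline fragment illustrating where the two parameters sit.
- method: FBP
  module_path: httomolibgpu.recon.algorithm
  parameters:
    saved_data_type: uint16   # proposed: lives on the method that comes
                              # BEFORE the auto-inserted saving wrapper
# --> httomo auto-inserts the intermediate dataset saving wrapper here <--
- method: save_to_images
  module_path: httomolib.misc.images
  parameters:
    glob_stats: True          # current: global stats are triggered by a
                              # method AFTER the saving wrapper
```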
I just realised that we actually have the parameter `save_result_default` in the library file, which can control whether the result of the reconstruction is saved, for instance. This is in addition to saving the last result of the pipeline, which makes things pretty dubious. I think we need to look into that before the release, i.e. check how `save_result_default` controls the saving.

So I looked into this: `save_result_default: True` is set in the library file. `save_result_default` works as expected, and `save_result: False` cancels it when used in the pipeline. This all means that nothing needs to be done here, actually, as one can cancel the saving of the reconstructed volume if needed. I'm also OK with the result of the recon being saved by default for now; if there is a request to turn it off, it can be done easily using `save_result_default: False`.
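The override behaviour described above can be sketched as follows. This is a hedged sketch: I'm assuming `save_result` sits at the top level of a method entry in the pipeline file, while `save_result_default` lives in the library file, and the parameter values are placeholders:

```yaml
# Library file (per-method defaults):
#   save_result_default: True   # recon results are written to hdf5 by default
# Pipeline file: a user opts out for one method entry
- method: FBP
  module_path: httomolibgpu.recon.algorithm
  parameters:
    center: 79.5
  save_result: False   # cancels the library-level save_result_default: True
```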
Having some doubts that we should save the result of the last method automatically into the hdf5 file.
More on the second point: I think it would be more sensible to auto-generate a template for reconstruction methods with the flag `save_result: False` added to it, and let the user decide if saving into hdf5 is needed.

With respect to the precision, it is worth thinking about. Basically we don't need the result of the reconstruction to be saved, but the `rescale_to_int` result instead. This makes me wonder if we need an additional HTTomo parameter, e.g. `saved_data_type: uint16`, which would trigger global stats collection and then rescaling of the given array.

The logic behind it is the following: if we're saving the rescaled data into tiffs and the resulting hdf5 should be in uint16 as well, then we can save some time and space by not saving floating-point arrays (if they are needed at all). Secondly, I believe the reconstructed hdf5 files in uint16 can be compressed well. Blosc should work quite well on them, as they are possibly sparser than the projection data, especially after denoising!
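The `saved_data_type: uint16` idea essentially amounts to a global-stats pass followed by a rescale. A minimal NumPy sketch of that (not httomo's actual `rescale_to_int`; the function name and defaults here are made up for illustration):

```python
import numpy as np


def rescale_to_uint16(volume, global_min=None, global_max=None):
    """Rescale a floating-point reconstruction to uint16 using global stats.

    `global_min`/`global_max` would normally come from a global-stats pass
    over the whole dataset; here they default to the stats of `volume`.
    """
    if global_min is None:
        global_min = float(volume.min())
    if global_max is None:
        global_max = float(volume.max())
    span = global_max - global_min
    if span == 0:
        return np.zeros(volume.shape, dtype=np.uint16)
    vmax = np.iinfo(np.uint16).max  # 65535
    scaled = (volume - global_min) / span * vmax
    return np.clip(scaled, 0, vmax).astype(np.uint16)


recon = np.linspace(-0.5, 0.5, 16, dtype=np.float32).reshape(4, 4)
out = rescale_to_uint16(recon)
# uint16 output occupies half the bytes of the float32 input, before any
# compression (Blosc etc.) is even applied
print(out.dtype, out.nbytes, recon.nbytes)
```

On top of the halved storage, the integer array should also be friendlier to byte-shuffle compressors like Blosc than raw float32 values.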
Ideas? @yousefmoazzam