An option to pass preview to the method

For distortion correction method, we also need to apply previewing in order to compensate in coefficients for the cropped data.

At this point I don't know what exactly we need to do in order to pass preview to the method. One possible choice is to use a side output to the preview dictionary in the loader, e.g. ${{loader.side_outputs.preview}} as a part of distortion correction template. Would that be possible? We probably then need to attach id to the loader. Another option is to "secretly" pass the preview value into the method behind the scenes and hide preview parameter in the template. This would need some modifications in the wrappers. But being more explicit here is probably a better way?

One possible choice is to use a side output to the preview dictionary in the loader, e.g. ${{loader.side_outputs.preview}} as a part of distortion correction template. Would that be possible? We probably then need to attach id to the loader.

Unfortunately, I think this particular option can't be done without some changes to the loader. Side outputs are a thing that method wrappers can handle, but neither the loader StandardTomoLoader nor the loader wrapper StandardLoaderWrapper is a method wrapper*, so loader's in general currently don't have functionality to produce side outputs.

This doesn't mean it's not an option, just wanted to point out that it's probably more complicated than it initially seems.

*FYI, when I say the loader isn't a method wrapper, I mean that neither StandardTomoLoader nor StandardLoaderWrapper implement methods defined on the MethodWrapper protocol, like get_side_outputs(): https://github.com/DiamondLightSource/httomo/blob/351e028a5298bc0b619eab7280c6b07d2dccd52c/httomo/runner/method_wrapper.py#L175-L179

what do you think is preferable in this case explicit or implicit? In principle, preview argument might need to go into other methods at some point. For instance, the Vo centering method relies on the single index given to the method itself. I think if mid is given or a number, we calculate such index with respect to the global data, not the previewed subset. I'm not 100% sure that this is the right way as may be it should be relative to the preview mid instead. If that is the case, then we need preview parsed into the method. Other methods might also needed at some point, e.g., I was thinking about the autocropping.

So, do you think it would be nicer to implement it more explicit, i.e., via side output?

I'm still thinking about what would be the best way to provide the preview information to method wrappers, I'll get back to you on that at some point.

For now though, I wanted to address this:

For instance, the Vo centering method relies on the single index given to the method itself. I think if mid is given or a number, we calculate such index with respect to the global data, not the previewed subset.

I don't think this is correct; off the top of my head, it doesn't make much sense for the rotation wrapper to calculate mid relative to the global data, because if the data has been cropped/previewed, then there is no access to the global data -> if there's no access to the global data, then why would mid be relative to global data indices?

I checked the code, it uses the shape of the block to calculate the middle index in the detector_y dimension: https://github.com/DiamondLightSource/httomo/blob/351e028a5298bc0b619eab7280c6b07d2dccd52c/httomo/method_wrappers/rotation.py#L73-L76

where dataset.shape is the shape of the block's numpy array, not of the global data: https://github.com/DiamondLightSource/httomo/blob/351e028a5298bc0b619eab7280c6b07d2dccd52c/httomo/base_block.py#L103-L106

I checked this by running a pipeline with find_center_vo with two different preview values, and the middle index changes relative to the number of previewed sinogram slices.

With 200 slices previewing:

preview:
  detector_y:
    start: 10
    stop: 210

the middle index is found to be 99 (the middle between index 0 and index 199 in the previewed subset):

(/dls/science/users/twi18192/conda-envs/httomo) [twi18192@cs05r-sc-gpu05-16 httomo (improve-paganin-memory-estimation)]$ mpirun -n 4 python -m httomo run /dls/i12/data/2022/nt33730-1/rawdata/119647.nxs tests/samples/pipeline_template_examples/pipeline_gpu1.yaml /dls/tmp/twi18192/rotation-wrapper-ind-handling/
Pipeline has been separated into 2 sections
See the full log file at: /dls/tmp/twi18192/rotation-wrapper-ind-handling/24-09-2024_10_13_32_output/user.log
Running loader (pattern=projection): standard_tomo...
    Finished loader: standard_tomo (httomo) Took 341.22ms
Section 0 (pattern=projection) with the following methods:
    data_reducer (httomolib)
    find_center_vo (httomolibgpu)
    remove_outlier (httomolibgpu)
    normalize (httomolibgpu)
     0%|          | 0/2 [00:04<?, ?block/s]
RANK 0: updated_params is {'ind': 99, 'smin': -50, 'smax': 50, 'srad': 6.0, 'step': 0.25, 'ratio': 0.5, 'drop': 20}
RANK 0: slice_for_cor is 99
RANK 2: updated_params is {'ind': 99, 'smin': -50, 'smax': 50, 'srad': 6.0, 'step': 0.25, 'ratio': 0.5, 'drop': 20}
RANK 2: slice_for_cor is 99
RANK 1: updated_params is {'ind': 99, 'smin': -50, 'smax': 50, 'srad': 6.0, 'step': 0.25, 'ratio': 0.5, 'drop': 20}
RANK 1: slice_for_cor is 99
RANK 3: updated_params is {'ind': 99, 'smin': -50, 'smax': 50, 'srad': 6.0, 'step': 0.25, 'ratio': 0.5, 'drop': 20}
RANK 3: slice_for_cor is 99
    50%|#####     | 1/2 [00:06<00:06,  6.48s/block]
RANK 0: updated_params is {'ind': 99, 'smin': -50, 'smax': 50, 'srad': 6.0, 'step': 0.25, 'ratio': 0.5, 'drop': 20}
RANK 0: slice_for_cor is 99
RANK 1: updated_params is {'ind': 99, 'smin': -50, 'smax': 50, 'srad': 6.0, 'step': 0.25, 'ratio': 0.5, 'drop': 20}
RANK 1: slice_for_cor is 99
RANK 3: updated_params is {'ind': 99, 'smin': -50, 'smax': 50, 'srad': 6.0, 'step': 0.25, 'ratio': 0.5, 'drop': 20}
RANK 3: slice_for_cor is 99
RANK 2: updated_params is {'ind': 99, 'smin': -50, 'smax': 50, 'srad': 6.0, 'step': 0.25, 'ratio': 0.5, 'drop': 20}
RANK 2: slice_for_cor is 99
    --->The center of rotation is 1246.75
    Finished processing last block

and with 250 slices previewing:

preview:
  detector_y:
    start: 10
    stop: 260

the middle index is found to be 124 (the middle between index 0 and index 249 in the previewed subset):

(/dls/science/users/twi18192/conda-envs/httomo) [twi18192@cs05r-sc-gpu05-16 httomo (improve-paganin-memory-estimation)]$ mpirun -n 4 python -m httomo run /dls/i12/data/2022/nt33730-1/rawdata/119647.nxs tests/samples/pipeline_template_examples/pipeline_gpu1.yaml /dls/tmp/twi18192/rotation-wrapper-ind-handling/
Pipeline has been separated into 2 sections
See the full log file at: /dls/tmp/twi18192/rotation-wrapper-ind-handling/24-09-2024_10_17_13_output/user.log
Running loader (pattern=projection): standard_tomo...
    Finished loader: standard_tomo (httomo) Took 196.23ms
Section 0 (pattern=projection) with the following methods:
    data_reducer (httomolib)
    find_center_vo (httomolibgpu)
    remove_outlier (httomolibgpu)
    normalize (httomolibgpu)
     0%|          | 0/2 [00:03<?, ?block/s]
RANK 1: updated_params is {'ind': 124, 'smin': -50, 'smax': 50, 'srad': 6.0, 'step': 0.25, 'ratio': 0.5, 'drop': 20}
RANK 1: slice_for_cor is 124
RANK 2: updated_params is {'ind': 124, 'smin': -50, 'smax': 50, 'srad': 6.0, 'step': 0.25, 'ratio': 0.5, 'drop': 20}
RANK 2: slice_for_cor is 124
RANK 3: updated_params is {'ind': 124, 'smin': -50, 'smax': 50, 'srad': 6.0, 'step': 0.25, 'ratio': 0.5, 'drop': 20}
RANK 3: slice_for_cor is 124
RANK 0: updated_params is {'ind': 124, 'smin': -50, 'smax': 50, 'srad': 6.0, 'step': 0.25, 'ratio': 0.5, 'drop': 20}
RANK 0: slice_for_cor is 124
RANK 1: updated_params is {'ind': 124, 'smin': -50, 'smax': 50, 'srad': 6.0, 'step': 0.25, 'ratio': 0.5, 'drop': 20}
RANK 1: slice_for_cor is 124
RANK 2: updated_params is {'ind': 124, 'smin': -50, 'smax': 50, 'srad': 6.0, 'step': 0.25, 'ratio': 0.5, 'drop': 20}
RANK 2: slice_for_cor is 124
    50%|#####     | 1/2 [00:05<00:05,  5.53s/block]
RANK 3: updated_params is {'ind': 124, 'smin': -50, 'smax': 50, 'srad': 6.0, 'step': 0.25, 'ratio': 0.5, 'drop': 20}
RANK 3: slice_for_cor is 124
RANK 0: updated_params is {'ind': 124, 'smin': -50, 'smax': 50, 'srad': 6.0, 'step': 0.25, 'ratio': 0.5, 'drop': 20}
RANK 0: slice_for_cor is 124
    --->The center of rotation is 1246.75
    Finished processing last block

After thinking about this a bit more, I become more skeptical that users should be able to configure a method to get the preview value.

Bear with me, it takes a bit of explanation... :sweat_smile:

If we take the distortion correction example, with or without the changes in https://github.com/DiamondLightSource/httomolibgpu/pull/161, the value that the method needs is not the "raw preview" - it's the preview value that has then been manipulated/rearranged into some other format for the method's convenience.

Ie, even without the changes in the PR, we're not passing the preview in the format that it parsed from the YAML config, like:

{
  "angles": {"start":, "stop": , "step"},
  "detector_y": {"start":, "stop": , "step"},
  "detector_x": {"start":, "stop": , "step"},
}

Instead, the method requires the preview info in the format:

{
  "starts": [],
  "stops": [],
  "steps": []
}

What this means is that, in httomo, we cannot simply:

mark the distortion correction method as needing the preview value
and then pass in the preview value into the method

because the distortion correction method needs the preview information in a different format.

So, if a user was required to configure the distortion correction method to be give the preview information in the format that the method wanted, they would be required to have some way of syntactically saying:

take the start value of every dimension's preview, put it in a list
and the same for the stop/step values of every preview dimension

This sounds incredibly tricky, I don't think we want to be inventing some kind of syntax in the pipeline file for users to be able to:

take the preview value
and manipulate the values in that data structure into a new format that a method wants

A similar problem exists even with the changes in that httomolibgpu PR; the problem becomes the question:

how do we allow users to specify in the pipeline YAMl file that the value for the shift needs to be extracted from the loader's preview, by taking only the det y and det x start values?
similarly for the distortion correction's step parameter

which again, sounds horribly complicated.

From this reasoning, I currently think that the provision of the preview value to methods that need it should be taken care of by httomo, and the user should not have to deal with it at all. The previewing information that any method needs can be inferred from the previewing config of the loader (so it feels like the user shouldn't need to explicitly configure what can already be inferred), and the manipulation/rearrangement of the preview info to fit what a specific method needs is much more easily done in python code than by inventing some syntax in YAML to accomplish the same thing.

thanks. I also dropped the idea of the special syntax in YAML. It gets indeed a bit complicated to organise that. We can simply deal with that in wrappers by watching some methods that require information from preview ? There are not too many of those now but might be more in future. Hence I'm thinking may be we should have name preview in the arguments of those methods nevertheless? In the case of the distortion correction to can be:

shift_preview
step_preview

then it will be easier for us to catch that in the wrapper and populate it with the modified preview information?

I think we can close this one as the similar issue exists #38 and we probably would need a specific wrapper to extract preview values from the loader and pass it to the method in a specific format.

DiamondLightSource / httomo

An option to pass preview to the method #455