SyneRBI / SIRF

Main repository for the CCP SynerBI software
http://www.ccpsynerbi.ac.uk

Computing Hessian products with image data objects and related stuff #1253

Closed evgueni-ovtchinnikov closed 1 month ago

evgueni-ovtchinnikov commented 1 month ago

Changes in this pull request

Testing performed

Related issues

Fixes #1244, #1249

Checklist before requesting a review

Contribution Notes

Please read and adhere to the contribution guidelines.


KrisThielemans commented 1 month ago

Thanks, this already looks usable (with a caveat for the "accumulation"). Is this in a state that @gschramm can already try?

evgueni-ovtchinnikov commented 1 month ago

We still have #1251, which is due to the fact that PoissonLogLikelihoodWithLinearModelForMeanAndListModeDataWithProjMatrixByBin does not override the empty ObjectiveFunction.get_subset_sensitivity(). I am afraid I cannot be of much help here.

gschramm commented 1 month ago

@KrisThielemans @evgueni-ovtchinnikov Just implemented a short test script for PoissonLogL.accumulate_Hessian_times_input() here:

import numpy as np
import matplotlib.pyplot as plt

current_estimate = ref_recon.copy()
input_img = acq_data.create_uniform_image(value=1, xy=nxny)

# random non-negative input image, restricted to voxels with non-zero sensitivity
np.random.seed(0)
input_img.fill(np.random.rand(*input_img.shape)
               * (obj_fun.get_subset_sensitivity(0).as_array() > 0)
               * current_estimate.max())

hess_out_img = acq_data.create_uniform_image(value=1.0, xy=nxny)

# once with and once without the out kwarg
obj_fun.accumulate_Hessian_times_input(current_estimate, input_img, subset=0, out=hess_out_img)
hess_out_img2 = obj_fun.accumulate_Hessian_times_input(current_estimate, input_img, subset=0)

fig, ax = plt.subplots(1, 4, figsize=(16, 4), tight_layout=True)
ax[0].imshow(current_estimate.as_array()[71, :, :], cmap='Greys')
ax[1].imshow(input_img.as_array()[71, :, :], cmap='Greys')
ax[2].imshow(hess_out_img.as_array()[71, :, :], cmap='Greys')
ax[3].imshow(hess_out_img2.as_array()[71, :, :], cmap='Greys')
ax[0].set_title('current estimate', fontsize='medium')
ax[1].set_title('input', fontsize='medium')
ax[2].set_title('hess_out_img', fontsize='medium')
ax[3].set_title('hess_out_img2', fontsize='medium')
fig.show()

Unfortunately, neither call (with or without the out kwarg) seems to work: hess_out_img is still an image full of ones, and hess_out_img2 seems to be the same as input_img.


These results used:

STIR commit feb6d85eadb392f5b8278d3b97ae2ee67ca439d9 (HEAD, origin/master, origin/HEAD, master), Author: Kris Thielemans k.thielemans@ucl.ac.uk, Date: Thu May 9 23:39:17 2024 +0100
SIRF commit 66c35c766694fe7258406b8e85177bfc0e7c9b24 (HEAD, origin/acc-hess), Author: Evgueni Ovtchinnikov evgueni.ovtchinnikov@stfc.ac.uk, Date: Sat May 11 07:35:35 2024 +0000

KrisThielemans commented 1 month ago

This is a test with projection data, correct?

gschramm commented 1 month ago

Indeed. The obj_fun is

obj_fun = sirf.STIR.make_Poisson_loglikelihood(acq_data)

KrisThielemans commented 1 month ago

@gschramm could you try again?

gschramm commented 1 month ago

Looks much better now. Tomorrow morning I will test whether the images are what we expect from the formulas.

[screenshot, 2024-05-12 22:51]

KrisThielemans commented 1 month ago

Could work for listmode as well

gschramm commented 1 month ago

@KrisThielemans I just compared against computing the "Hessian multiply" manually, calling the fwd/back projections in SIRF. The result is very close in the foreground (where the current estimate is >> 0), but not super close in the background (where the current estimate is close to 0). Any ideas why that is?

hess_out_img = obj_fun.accumulate_Hessian_times_input(current_estimate, input_img, subset=0)

# %%
# calculate the Hessian multiply "manually"

acq_model.set_up(acq_data, initial_image)
acq_model.num_subsets = num_subsets
acq_model.subset_num = 0

# get the linear (Ax) part of the Ax + b affine acq. model
lin_acq_model = acq_model.get_linear_acquisition_model()
lin_acq_model.num_subsets = num_subsets
lin_acq_model.subset_num = 0

# for the Hessian "multiply" we need the linear part of the acquisition model applied to the input image
input_img_fwd = lin_acq_model.forward(input_img)
current_estimate_fwd = acq_model.forward(current_estimate)
h = -acq_model.backward(acq_data*input_img_fwd / (current_estimate_fwd*current_estimate_fwd))
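
For reference, a sketch of the math being computed here (my notation, assuming the usual affine Poisson model): with acquisition model $\bar{y}(x) = Ax + b$ and measured data $y$,

```latex
L(x) = \sum_i \bigl[ y_i \log \bar{y}_i(x) - \bar{y}_i(x) \bigr], \qquad
\nabla L(x) = A^T\!\left( \frac{y}{Ax + b} - 1 \right), \qquad
H(x)\,v = -A^T\!\left( \frac{y \odot (A v)}{(Ax + b)^2} \right),
```

where the divisions and $\odot$ are elementwise; the last expression is what the backward call above computes.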

The Hessian outputs are shown using a "narrow" color window (top) and a wide color window (bottom) to show the differences in foreground and background.

[screenshot, 2024-05-13 09:22]

KrisThielemans commented 1 month ago

hmmm. We try to cancel singularities in the division https://github.com/UCL/STIR/blob/feb6d85eadb392f5b8278d3b97ae2ee67ca439d9/src/recon_buildblock/PoissonLogLikelihoodWithLinearModelForMeanAndProjData.cxx#L1128 with STIR's divide_and_truncate function. However, possibly that goes a bit wrong, as it was designed for (measured / estimated), while here it is used for (fwd * measured / estimated^2), so maybe thresholds are a bit off.

Of course, ideally there wouldn't be any singularities. Is your background strictly positive?
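
For illustration only, a minimal numpy sketch of this kind of guarded division (this is not STIR's actual divide_and_truncate, whose truncation thresholds are more involved): zero the quotient wherever the denominator is non-positive, so no inf/NaN gets back projected.

```python
import numpy as np

def guarded_divide(numerator, denominator):
    """Elementwise division that returns 0 where the denominator is <= 0,
    instead of producing inf/NaN (toy stand-in for a divide-and-truncate)."""
    out = np.zeros_like(numerator, dtype=float)
    mask = denominator > 0
    out[mask] = numerator[mask] / denominator[mask]
    return out

num = np.array([1.0, 2.0, 0.0])
den = np.array([2.0, 0.0, 0.0])
print(guarded_divide(num, den))  # 0.5 where defined, 0 elsewhere
```

The point Kris raises is that thresholds tuned for (measured / estimated) need not be right for (fwd * measured / estimated^2).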

gschramm commented 1 month ago
  1. The background term indeed has a few 0s (Siemens mMR data, which I guess has virtual zero-sensitivity LORs).
  2. In the meantime I also tested the listmode objective Hessian. (i) Currently it returns the negative of what we want (the output is positive, but should be negative). (ii) The "listmode Hessian" is very close to my "manual sinogram" Hessian (see below - I plotted minus the LM Hessian output).
[screenshot, 2024-05-13 11:47]

KrisThielemans commented 1 month ago

that's encouraging.

can you try your manual computation with some handling of 0/0 or so? A bit surprising that it didn't generate NaNs, I guess. Easiest is to add an eps to the denominator.
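
A tiny numpy illustration of the eps suggestion (toy arrays, not the real projection data): adding eps keeps the ratio finite, although bins where the estimate is ~0 but the numerator isn't can still produce huge values, which may be why it changes little in practice.

```python
import numpy as np

num = np.array([0.0, 1.0, 2.0])   # y * (A dx), may contain zeros
den = np.array([0.0, 0.0, 4.0])   # (Ax + b)^2, may contain zeros

eps = 1e-8
ratio = num / (den + eps)         # 0/0 -> 0, but 1/0 -> 1e8: finite yet huge
print(np.isfinite(ratio).all())   # True
```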

I'll flip the sign of the LM Hessian. 😊

gschramm commented 1 month ago

adding an eps = 1e-8 to the denominator of the ratio to be back projected in the Hessian does not change much - the result is still much closer to the LM Hessian.

[screenshot, 2024-05-13 12:44]

KrisThielemans commented 1 month ago

ah well, this looks like a STIR issue then for the Hessian with projection data. Best to open an issue there.

KrisThielemans commented 1 month ago

@evgueni-ovtchinnikov can you then add multiply_with_Hessian? (Can just as well leave the accumulate version in I guess)
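
To sketch the naming distinction (toy code: hypothetical lowercase names mirroring the SIRF ones, and a made-up diagonal Hessian), multiply_with_Hessian would just wrap the accumulate version with a zero-initialized output:

```python
import numpy as np

def accumulate_hessian_times_input(x, dx, out):
    """Toy accumulate version: adds H(x)*dx into out.
    Here H = diag(-1/x**2), the Hessian of sum(log(x)) -- purely illustrative."""
    out += -dx / x**2
    return out

def multiply_with_hessian(x, dx):
    """Non-accumulating wrapper: start from a zero image."""
    return accumulate_hessian_times_input(x, dx, np.zeros_like(x))

x = np.array([1.0, 2.0])
dx = np.array([1.0, 1.0])
print(multiply_with_hessian(x, dx))  # -1 and -0.25
```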

KrisThielemans commented 1 month ago

@gschramm what computation times are you getting?

evgueni-ovtchinnikov commented 1 month ago

multiply_with_Hessian: @KrisThielemans which version/branch of STIR should I use? I have rel_6.0.0 on my latest VMs.

gschramm commented 1 month ago

@gschramm what computation times are you getting?

Using 21 subsets, mMR LM file with 4,332,816 counts and acq_model = sirf.STIR.AcquisitionModelUsingRayTracingMatrix() and 1 tangential LOR:

In [5]: %timeit hess_out_img = obj_fun.accumulate_Hessian_times_input(current_estimate, input_img, subset=0)
2.7 s ± 45.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [6]: %timeit hess_out_img_lm = lm_obj_fun.accumulate_Hessian_times_input(current_estimate, input_img, subset=0)
9.43 s ± 249 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

KrisThielemans commented 1 month ago

multiply_with_Hessian: @KrisThielemans which version/branch of STIR should I use? I have rel_6.0.0 on my latest VMs.

you can use 6.0.0. Just add an extra member (that calls accumulate) and remove the fill in `accumulate`.

KrisThielemans commented 1 month ago

Thanks @gschramm, would you mind adding the timings in the STIR PR? Ideally also add gradient timings. I'd be interested as well in seeing what happens as num_subsets decreases (i.e. are the listmode calculations slowing us down, or the reading?). Of course, you could also enable the lm_obj_fun caching, but this is going beyond what you care about. Thanks a lot for all your help.

gschramm commented 1 month ago

(1) Will add all timings to the PR in a second. Does it make sense to test with sirf.STIR.AcquisitionModelUsingParallelproj()? I remember you mentioned that this is not supported yet for LM. (2) Is it possible to save the computed sensitivity images to disk, to speed up the run time if users want to execute the script multiple times?

KrisThielemans commented 1 month ago

1) No parallelproj yet. There's a difficulty that with the proj-matrix, I re-use the matrix row for all calculations. That won't be possible with parallelproj. So we'll need new functions. Not sure if I can still do that.

2) In STIR, we can, but it looks like this isn't enabled yet in SIRF. @evgueni-ovtchinnikov do we expose PoissonLogLikelihoodWithLinearModelForMean::set_subset_sensitivity_filenames? It should call

  //! set filename pattern to read (or write) the subset sensitivities
  /*! set to a zero-length string to avoid reading/writing a file
  Could be e.g. "subsens_%d.hv"
  boost::format is used with the pattern (which means you can use it like sprintf)

  Calls error() if the pattern is invalid.
 */
  void set_subsensitivity_filenames(const std::string&);

I see https://github.com/SyneRBI/SIRF/blob/1d53570cff96acb33734ac1da15b94bc43373d92/src/xSTIR/cSTIR/cstir_p.cpp#L933-L934 but https://github.com/SyneRBI/SIRF/blob/1d53570cff96acb33734ac1da15b94bc43373d92/src/xSTIR/cSTIR/cstir_p.cpp#L933. In any case, we need the subset version ATM.

If we have this, the usage is a bit awkward: first run as normal, then use set_recompute_sensitivity(false), in which case it will go and find them.

gschramm commented 1 month ago

@KrisThielemans Let's leave LM acq.model with Parallelproj for later

gschramm commented 1 month ago

Timing results of sino/LM gradients/Hessian multiplies for different numbers of subsets

mMR LM file with 4,332,816 counts and acq_model = sirf.STIR.AcquisitionModelUsingRayTracingMatrix() and 1 tangential LOR, sirf.STIR.AcquisitionData.set_storage_scheme("memory")

7 subsets

In [14]: %timeit g = obj_fun.gradient(initial_image, subset = 0)
4.19 s ± 36.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [15]: %timeit hess_out_img = obj_fun.accumulate_Hessian_times_input(current_estimate, input_img, subset=0)
4.35 s ± 85.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [18]: %timeit g = lm_obj_fun.gradient(initial_image, subset=0)
4.18 s ± 13.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [19]: %timeit hess_out_img_lm = lm_obj_fun.accumulate_Hessian_times_input(current_estimate, input_img, subset=0)
4.13 s ± 46.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

14 subsets

In [22]: %timeit g = obj_fun.gradient(initial_image, subset = 0)
2.75 s ± 45.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [23]: %timeit hess_out_img = obj_fun.accumulate_Hessian_times_input(current_estimate, input_img, subset=0)
3.07 s ± 39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [26]: %timeit g = lm_obj_fun.gradient(initial_image, subset=0)
4.67 s ± 112 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [27]: %timeit hess_out_img_lm = lm_obj_fun.accumulate_Hessian_times_input(current_estimate, input_img, subset=0)
4.89 s ± 68.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

21 subsets

In [32]: %timeit g = obj_fun.gradient(initial_image, subset = 0)
2.35 s ± 37.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [33]: %timeit hess_out_img = obj_fun.accumulate_Hessian_times_input(current_estimate, input_img, subset=0)
2.69 s ± 58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [34]: %timeit g = lm_obj_fun.gradient(initial_image, subset=0)
6.35 s ± 199 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [35]: %timeit g = lm_obj_fun.gradient(initial_image, subset=0)
6.14 s ± 53.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


CPU info

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          256
On-line CPU(s) list:             0-255
Thread(s) per core:              2
Core(s) per socket:              64
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7662 64-Core Processor
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         1499.555
CPU max MHz:                     2000.0000
CPU min MHz:                     1500.0000
BogoMIPS:                        4000.00
Virtualization:                  AMD-V
L1d cache:                       4 MiB
L1i cache:                       4 MiB
L2 cache:                        64 MiB
L3 cache:                        512 MiB
NUMA node0 CPU(s):               0-63,128-191
NUMA node1 CPU(s):               64-127,192-255

gschramm commented 1 month ago

@KrisThielemans @evgueni-ovtchinnikov is accumulate_Hessian_times_input the final name? (I remember discussions on the naming) If possible, I'd like to finish my notebooks today.

KrisThielemans commented 1 month ago

@evgueni-ovtchinnikov is working on implementing https://github.com/SyneRBI/SIRF/issues/1244#issuecomment-2103346336. Hopefully this will be done soon. @evgueni-ovtchinnikov if you have trouble, please ask for help.

KrisThielemans commented 1 month ago

@evgueni-ovtchinnikov here's the STIR test I talked about. https://github.com/UCL/STIR/blob/90cff236be35628f447ded4d98a046d5c4e3b316/src/include/stir/recon_buildblock/test/ObjectiveFunctionTests.h#L169 Best to implement the name first though.

KrisThielemans commented 1 month ago

@evgueni-ovtchinnikov you seem to be pretty close. Can you summarise current situation?

evgueni-ovtchinnikov commented 1 month ago

@KrisThielemans @gschramm I believe this PR can be merged - any objections?

evgueni-ovtchinnikov commented 1 month ago

the builds failed at the Coveralls step - does anyone have any idea why, all of a sudden?

KrisThielemans commented 1 month ago

Also, please comment out the Coveralls lines https://github.com/SyneRBI/SIRF/blob/d54c4550d1533132c6a281b6cf2159c8fd22f45c/.github/workflows/build-test.yml#L133C4-L147C32 (you probably have to preserve indentation), or leave that for a separate PR.

KrisThielemans commented 1 month ago

@evgueni-ovtchinnikov test_Hessian isn't a great test, as it just compares the norms, while it should compare the difference.

I'm modifying it and putting it in test_ObjectiveFunction.py

    def test_Hessian(self, subset=-1, eps=1e-3):
        """Checks that grad(x + dx) - grad(x) is close to H(x)*dx"""
        x = self.image
        dx = x.clone()
        dx *= eps / dx.norm()
        dx += eps / 2
        y = x + dx
        gx = self.obj_fun.gradient(x, subset)
        gy = self.obj_fun.gradient(y, subset)
        dg = gy - gx
        Hdx = self.obj_fun.multiply_with_Hessian(x, dx, subset)
        q = (dg - Hdx).norm() / dg.norm()
        print('norm of grad(x + dx) - grad(x): %f' % dg.norm())
        print('norm of H(x)*dx: %f' % Hdx.norm())
        print('relative difference: %f' % q)
        # compare the (relative) difference rather than just the norms;
        # loose tolerance, as the match is only first-order accurate in eps
        numpy.testing.assert_array_less(q, 1e-2)

However, I had to add a constant term to the acq_model as otherwise the test fails (essentially because the Hessian is ill-defined otherwise). I'll commit this soon.
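
The same check can be demonstrated self-contained on a toy Poisson model (numpy only, made-up matrices; note the strictly positive background b, without which the Hessian is ill-defined where the estimate vanishes):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 3))               # toy system matrix
b = np.full(5, 0.1)                  # strictly positive background term
y = np.array([3., 1., 4., 1., 5.])  # toy measured counts

def grad(x):
    # gradient of the Poisson log-likelihood sum(y*log(Ax+b) - (Ax+b))
    return A.T @ (y / (A @ x + b) - 1.0)

def hess_times(x, v):
    # Hessian applied to v: -A^T(y * (Av) / (Ax+b)^2)
    return -A.T @ (y * (A @ v) / (A @ x + b) ** 2)

x = np.array([1.0, 2.0, 0.5])
dx = 1e-4 * rng.random(3)
dg = grad(x + dx) - grad(x)
q = np.linalg.norm(dg - hess_times(x, dx)) / np.linalg.norm(dg)
print(q < 1e-2)  # True: the gradient difference matches H(x)*dx to first order
```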

KrisThielemans commented 1 month ago

@evgueni-ovtchinnikov I'm now adding a similar test for the priors. (I had to delete the subset argument in the call.) However, I'm confused by the results. Does get_gradient(x) modify the result? It looks like it does for priors.

evgueni-ovtchinnikov commented 1 month ago

@KrisThielemans

Does get_gradient(x) modify the result? Looks like it for priors.

what result?

KrisThielemans commented 1 month ago

Sorry, got distracted.

I did push the test for the LogLikelihood. Works for me.

KrisThielemans commented 1 month ago

@KrisThielemans

Does get_gradient(x) modify the result? Looks like it for priors.

what result?

Ignore that. The loop in the test is a bit hard to understand... Update coming soon.

KrisThielemans commented 1 month ago

OK, should be all done now. Can you add a line to CHANGES.md and merge?