Closed: yousefmoazzam closed 3 months ago
This has been addressed in #393. Specifically, I've been tracing all individual allocations and dug into the astra toolbox as well. The estimator code has been adjusted to better reflect the original code structure. The full details are given below for a test case with input data of size (1801, 5, 2560), i.e. 5 slices in sinogram view. The recorded allocations are as follows:
In the `filtersinc` function, these are the allocations that happen:

- `swapaxis`, which allocates `input_size` again: 92,211,200 bytes
- a `cudaArray` of the same size as the input, for texture access. This is an allocation of around the input size (give or take a few bytes), and we assume that is the case because it could not be tracked with the memory hook
- a `1200 x 1200 int64` array, which we'll use for the estimates as a fixed cost independent of slices
- 28,800,000 bytes, i.e. 1200 x 1200 x 5 x float32
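For reference, a minimal sketch of how those numbers follow from the array shapes quoted above (the function name `fbp_alloc_breakdown` is hypothetical and only does the byte arithmetic):

```python
import numpy as np

def fbp_alloc_breakdown(angles=1801, slices=5, det_x=2560, recon_size=1200):
    """Hypothetical bookkeeping of the allocations listed above, in bytes."""
    f32 = np.dtype(np.float32).itemsize
    i64 = np.dtype(np.int64).itemsize
    # `swapaxis` copy of the float32 input: 1801 x 5 x 2560 x 4 = 92,211,200
    input_copy = angles * slices * det_x * f32
    # cudaArray for texture access: assumed to be roughly the input size again
    cuda_array = input_copy
    # fixed-cost 1200 x 1200 int64 array, independent of the slice count
    fixed_cost = recon_size * recon_size * i64
    # reconstruction output: 1200 x 1200 x 5 x float32 = 28,800,000
    output = recon_size * recon_size * slices * f32
    return {"input_copy": input_copy, "cuda_array": cuda_array,
            "fixed_cost": fixed_cost, "output": output}

print(fbp_alloc_breakdown())
```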
As part of #361, the FBP memory estimation was improved to more accurately represent the memory allocations that the FBP method does. However, there is still some memory being allocated, and it is not yet known where in the method this happens.
For now, a multiplier of 2 on the output of the 1D RFFT has been added to bump up the memory estimate: https://github.com/DiamondLightSource/httomo/blob/0b5f020175b23388da9bea5940b1e246c6e80be0/httomo/methods_database/packages/external/httomolibgpu/supporting_funcs/recon/algorithm.py#L103-L112
This allows 80GB of data to be put through the method in httomo without issue.
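For context, a minimal sketch of the idea behind that multiplier (this is not the code at the link above; `estimate_rfft_bytes` and its parameters are made up for illustration, and it assumes the single-precision path where a float32 RFFT yields complex64 output):

```python
import numpy as np

def estimate_rfft_bytes(angles, slices, det_x, multiplier=2):
    """Bytes attributed to the 1D RFFT step, with a safety multiplier.

    A real-to-complex FFT along the detector axis of a float32 sinogram of shape
    (angles, slices, det_x) produces (angles, slices, det_x // 2 + 1) complex64
    values; the multiplier of 2 bumps the estimate to cover the extra allocation
    whose origin is not yet known.
    """
    rfft_output = angles * slices * (det_x // 2 + 1) * np.dtype(np.complex64).itemsize
    return multiplier * rfft_output

# For the (1801, 5, 2560) test case mentioned above:
print(estimate_rfft_bytes(1801, 5, 2560))
```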
However, from observations made with cupy's `LineProfileHook`, it was determined that the 1D RFFT is most likely not creating more than one array. Therefore, more investigation is needed to determine what is causing the extra memory to be allocated.
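As a rough illustration of that kind of check (shapes taken from the test case above; this is only a sketch, not the exact profiling code that was used), CuPy's `LineProfileHook` can be wrapped around the RFFT call like this:

```python
import cupy as cp
from cupy.cuda import memory_hooks

# A float32 block with the projection-data shape from the test case above.
data = cp.zeros((1801, 5, 2560), dtype=cp.float32)

hook = memory_hooks.LineProfileHook()
with hook:
    # 1D real-to-complex FFT along the detector axis, as in the filtering step.
    freq = cp.fft.rfft(data, axis=2)

# Prints a per-line breakdown of bytes allocated from the CuPy memory pool,
# which shows whether the RFFT call itself created more than one array.
hook.print_report()
```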