CERN / TIGRE

TIGRE: Tomographic Iterative GPU-based Reconstruction Toolbox
BSD 3-Clause "New" or "Revised" License

Inaccurate FDK reconstruction for low voxel size #544

Closed simonCtBern closed 3 weeks ago

simonCtBern commented 1 month ago

I'm trying to use TIGRE to reconstruct cone-beam CT images with variable scan geometries. However, I encountered unexpected results for a test dataset with simulated projections of a simple circular scan trajectory.

Expected Behavior

The quality of the reconstructed volume should not depend on "linear scaling of the object / projection magnification".

Actual Behavior

Reconstruction of the simulated scan using FDK gives good results for the following set of geometry parameters:

However, reconstruction of the same data gives poor results if a larger magnification is specified in the geometry parameters:

All other parameters (number of pixels, pixel size) and the projections are identical for both reconstructions:

I tried changing the accuracy parameter of the geometry over a range of several orders of magnitude without visible effect.

It seems that there is some "absolute numerical accuracy threshold" which creates problems at low voxel size? Is there a way around this? (Entering a wrong magnification as a workaround may not matter for an ideal circular scan trajectory, but my goal is to test variations/deviations in the geometry for all projections individually, which would be influenced by the magnification.)

Scan setup

The test data for the reconstruction is a simulated circular cone-beam scan of a hexagonal object with a hole in the center (roughly representing a hexagonal nut).

result with large DSO (low magnification):

tigre_large_SRD

result with low DSO (high magnification), "correct" simulation setup with bad reconstruction:

tigre_nominal_SRD

Specifications

AnderBiguri commented 1 month ago

Hi @simonCtBern Thanks for using TIGRE!

I need to be honest here: I understood almost nothing of what you are trying to experiment with here, nor the goals or how you changed the code. I also don't understand what I should be looking at in the images and what you expect to see. Can you share a little bit of context here, and some code?

My current guess is that you seem to be playing with ratios of pixels sizes in the detector vs voxel sizes in the image perhaps?

AnderBiguri commented 1 month ago

Just in case my guess was right:

You generally want to maintain the ratio geo.dDetector=geo.dVoxel*geo.DSD/geo.DSO. This is ideally the ratio you want, so that image space accurately captures all the information in the detector without wasting compute time. The way the code is written, particularly the backprojection, will suffer a little bit numerically at very large deviations from this ratio. If you want to know why, I can easily point you to the corresponding resources that explain it (mostly my thesis).
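
As a minimal sketch of that ratio (illustrative numbers, plain NumPy rather than TIGRE calls):

import numpy as np

DSO, DSD = 100.0, 600.0                 # source-object / source-detector distances (mm)
dDetector = 0.2                         # detector pixel size (mm)
dVoxel = dDetector * DSO / DSD          # matching voxel size (mm)
assert np.isclose(dDetector, dVoxel * DSD / DSO)  # the ratio above holds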

simonCtBern commented 1 month ago

Hi @AnderBiguri Thanks for the quick reply!

I'm afraid I made my question sound more complicated than it is.

In a standard circular cone-beam CT, situations 1 and 2 in the following figure produce the same projections and the reconstructed volume for both situations must look identical (just with different voxel size geo.dVoxel), given that the size of the blue object is scaled according to its distance from the source (DSO 1 and DSO 2) and the detector remains unchanged: CT_magnification
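
As a quick numeric sketch of the scaling argument (illustrative numbers only):

# Magnification M = DSD / DSO: a feature of size s projects to s * DSD / DSO on
# the detector. Scaling both s and DSO by the same factor k leaves the
# projection unchanged, so situations 1 and 2 produce identical projections.
DSD = 600.0
for k in (1.0, 100.0):
    DSO = 1.0 * k
    s = 0.5 * k                  # object feature scaled with DSO
    print(s * DSD / DSO)         # projected size on the detector: 300.0 for every k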

However, it seems that using TIGRE, the reconstructed volume does not look the same, but there are numerical artifacts in situation 2 (second image in my initial post). We normally use a commercial software for reconstruction which does not produce these artifacts in situation 2.

I didn't share any code because I'm only using algs.fdk() without modifications and there is no error.

Is it possible that there is some form of numerical threshold in the FDK backprojection which produces bad results for small voxel size geo.dVoxel? Without knowing the details, I believe the algorithm calculates an intersection between x-rays and voxels and interpolates their projection on the detector..?

AnderBiguri commented 1 month ago

@simonCtBern Thanks, this makes more sense!

Are the images real scans, or simulated with TIGRE? For real scans this would not necessarily be true, of course, due to focus and other realities of X-ray measurement.

Indeed, as you rightly point out, this should not happen. Would you be able to share code/data so I can reproduce this myself? Ideally a cropped-down version (as close to 2D as possible without removing the effect), so I can run it on a local computer, as my direct access to big reconstruction machines is a bit limited these days.

Another test to run is Atb() directly on the sinos, rather than fdk, to see whether this is a numerical inaccuracy in the projectors or in the filtering. In any case, code to reproduce it would definitely help me a lot in pinpointing the issue and seeing what is off.
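
For example, something like this (a sketch assuming the projections, geo and angles from your setup; Atb's default backprojection type is used):

import tigre

# Plain backprojection of the unfiltered projections: if the same DSO-dependent
# artifacts show up here, the problem is in the projectors, not the filtering.
vol_bp = tigre.Atb(projections, geo, angles)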

simonCtBern commented 1 month ago

@AnderBiguri The screenshots were from projections simulated with aRTist, but it happens for real measured data as well (using simulated data was my approach to narrowing down the problem).

I haven't tried running Atb() directly on the projections, but it would surprise me if it made a difference, since the (filtered or unfiltered) projections in my original datasets (i.e. not from TIGRE) don't depend on geo.DSO..?

Below, you can find code using TIGRE only for simulating and reconstructing a cube in almost 2D. On my system, the artifacts appear or not depending on the choice of geo.DSO:

geo.DSO = 100 tigre_DSO_100

geo.DSO = 6 tigre_DSO_6

geo.DSO = 1 tigre_DSO_1

#%% testing reconstruction accuracy for small voxel sizes

import tigre
import tigre.algorithms as algs
import numpy as np
import matplotlib.pyplot as plt

#%% define projection parameters
detector_width = 2000
detector_height = 4
nb_projections = 1500
pixel_size = 0.2

# define the projection angles
angles = np.linspace(0, -2*np.pi, nb_projections, False).astype('float32')

# create a geometry object
geo = tigre.geometry()

#%% Distance Source Object
geo.DSO = 1.0
# geo.DSO = 6.0
# geo.DSO = 100.0

#%% Distance Source Detector
geo.DSD = 600.0

# Detector parameters
geo.nDetector = np.array([detector_height, detector_width])  # number of pixels              (px)
geo.dDetector = np.array([pixel_size, pixel_size])  # size of each pixel            (mm)
geo.sDetector = geo.nDetector * geo.dDetector  # total size of the detector    (mm)

# Image parameters
geo.nVoxel = np.array([detector_height, detector_width, detector_width])  # number of voxels              (vx)
geo.sVoxel = geo.nVoxel * pixel_size * geo.DSO / geo.DSD  # total size of the image       (mm)
geo.dVoxel = geo.sVoxel / geo.nVoxel  # size of each voxel            (mm)

# Offsets
geo.offOrigin = np.array([0, 0, 0])  # Offset of image from origin   (mm)
geo.offDetector = np.array([0, 0])  # Offset of Detector            (mm)

# Auxiliary
geo.accuracy = 0.1  # Variable to define accuracy of 'interpolated' projection. 
# It defines the amount of samples per voxel. Recommended <=0.5             (vx/sample)

geo.rotDetector = np.array([0, 0, 0])  # Rotation of the detector, by
# X,Y and Z axis respectively. (rad)
# This can also be defined per angle

geo.mode = "cone"  # Or 'parallel'. Geometry type.

#%% object definition
# create a volume with a cube in its center, filling 1/2 of each dimension
cube = np.zeros(
    (detector_height, detector_width, detector_width), dtype='float32')
cube[(detector_height // 4):(detector_height * 3 // 4),
     (detector_width // 4):(detector_width * 3 // 4),
     (detector_width // 4):(detector_width * 3 // 4)] = 1

#%% Simulate forward projection.
projections = tigre.Ax(cube, geo, angles)

#%% FDK
volFDK = algs.fdk(projections, geo, angles, filter="shepp_logan")

#%% plot a slice
cmap = plt.cm.gray
volSlice = volFDK[volFDK.shape[0] // 2, :, :].copy()
norm = plt.Normalize(vmin=volSlice.min(), vmax=volSlice.max())
image = cmap(norm(volSlice))
plt.imshow(image)
AnderBiguri commented 1 month ago

Looking into this...

Just FYI: geo.accuracy is a red herring here; it's only used by one type of projector (interpolated), but the default is ray-voxel (Siddon), so it's not being used, and it would also not have an influence, I think.
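
For reference, the interpolated projector (where geo.accuracy does apply) would be selected roughly like this, reusing cube/geo/angles from your script; the keyword name here is from memory and may differ between versions:

import tigre

# Assumed kwarg name; the default projector is the Siddon (ray-voxel) one.
proj_interp = tigre.Ax(cube, geo, angles, projection_type="interpolated")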

AnderBiguri commented 1 month ago

Hum, this seems to be related to numerical precision in the backprojection indeed (not filtering).

I thought it was filtering, because indeed I see no issue when I replace algs.fdk by tigre.Atb. However, if I reconstruct using an algorithm that essentially only requires forward/backprojection, e.g. SIRT, I do see the effect too.

It is plausible that it is related to #355, which has a relatively easy fix, so I will try it. It could also be that TIGRE uses floats and not doubles, as you are pushing the magnification by a lot (in real life it would be impossible to have a source so good that 600x magnification would not just be a blur). Let me see if #355 fixes it...
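
The SIRT check would look roughly like this (same projections/geo/angles as your script; the iteration count is arbitrary):

import tigre.algorithms as algs

# SIRT only uses Ax/Atb and no ramp filtering, so artifacts here point at
# the projectors rather than the filter.
vol_sirt = algs.sirt(projections, geo, angles, 20)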

AnderBiguri commented 1 month ago

In particular, this seems to be the solution: replace the corresponding TIGRE function in the same location with the following version: https://github.com/IbexInnovations/TIGRE/blob/8d8c8eb4e7be922a69512baa08b4dad5bf516cc3/Common/CUDA/Siddon_projection.cu#L680

I don't have access to PCs to recompile TIGRE until the end of the week; if you want to give this a shot before then, just replace the function with the linked one.

simonCtBern commented 1 month ago

Yes, we can't do scans with 600x magnification :) but we can with 100x, and there are artifacts already, so it seems to have some relevance for real-life scans.

I tried re-installing TIGRE with the changed function in the source files, but it didn't solve the problem. (I hope it recompiled properly for re-installation; I did pip install . as in the installation instructions, and the verbose output of pip said "Successfully built pytigre", "Successfully uninstalled pytigre-2.4.0", "Successfully installed pytigre-2.4.0", so it looked promising.)

It would surprise me if it was a limitation of 32-bit float, since it's still floating-point numbers and I'd expect the magnification to be "taken care of by the exponent" rather than limited by the precision of the fraction of the floating-point representation, but that's just a guess. The commercial software we use that doesn't produce the artifacts also stores at least the reconstruction volume as 32-bit float (we can tell by the total memory usage), but then again the ray-voxel intersection might be implemented differently...

AnderBiguri commented 1 month ago

@simonCtBern thanks for checking! Indeed, precision-wise I was talking about the ray-tracing precision, not the image values; those should be fine.
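
To illustrate what I mean (a NumPy sketch with illustrative numbers, mirroring how delta vectors are formed by subtracting nearby large coordinates):

import numpy as np

# float32 keeps ~7 significant digits. The exponent does handle the overall
# scale of a single number, but a tiny voxel step added to a large coordinate
# and then recovered by subtraction loses mantissa bits:
P = np.float32(-100.0)                  # a coordinate at detector scale (mm)
dVoxel = np.float32(0.2 * 1.0 / 600.0)  # voxel size at DSO=1, DSD=600 (~3.3e-4 mm)
delta = (P + dVoxel) - P                # how delta vectors are effectively computed
print(dVoxel, delta)                    # they differ in the 3rd significant digit (~0.7% error)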

This will certainly need some digging in the CUDA code. Fixing it has high priority for me, but it's quite a hard problem to tackle, and I am quite busy for the next 4 weeks, just to let you know. If you feel brave enough to dig into the projection code (Siddon_projection.cu and voxel_backprojection.cu), feel free to tackle it, but these are quite non-trivial files to be playing with.

Will let you know as soon as I can fix this.

simonCtBern commented 1 month ago

@AnderBiguri I looked at the .cu files you mentioned and I think I was able to fix it. :)

It seems that the problem was indeed float precision, but only in the function computeDeltasCube() in voxel_backprojection.cu.

Now the result looks identical for a wide range of source-object distances: geo.DSO = 100 fixed_dso_100

geo.DSO = 1 fixed_dso_1

What I did was to add a struct Point3Ddouble in types_TIGRE.hpp:

struct Point3Ddouble{
    double x;
    double y;
    double z;

    // cast to float member function
    Point3D to_float()
    {
        Point3D castToFloat;
        castToFloat.x = (float)x;
        castToFloat.y = (float)y;
        castToFloat.z = (float)z;
        return(castToFloat);
    }
};

And in voxel_backprojection.cu, I added double versions of the functions eulerZYZT() and rollPitchYawT() and modified computeDeltasCube() to run "all" operations in double precision but convert the results to float, so the higher-level structure doesn't need to be changed:

void eulerZYZT_double(Geometry geo, Point3Ddouble* point){

    Point3Ddouble auxPoint;
    auxPoint.x=point->x;
    auxPoint.y=point->y;
    auxPoint.z=point->z;

    double sin_alpha, cos_alpha, sin_theta, cos_theta, sin_psi, cos_psi;
    sin_alpha = sin((double)geo.alpha);
    cos_alpha = cos((double)geo.alpha);
    sin_theta = sin((double)geo.theta);
    cos_theta = cos((double)geo.theta);
    sin_psi = sin((double)geo.psi);
    cos_psi = cos((double)geo.psi);

    point->x = auxPoint.x*(cos_psi*cos_theta*cos_alpha-sin_psi*sin_alpha)
    +auxPoint.y*(-cos_psi*cos_theta*sin_alpha-sin_psi*cos_alpha)
    +auxPoint.z*cos_psi*sin_theta;
    point->y = auxPoint.x*(sin_psi*cos_theta*cos_alpha+cos_psi*sin_alpha)
    +auxPoint.y*(-sin_psi*cos_theta*sin_alpha+cos_psi*cos_alpha)
    +auxPoint.z*sin_psi*sin_theta;
    point->z =-auxPoint.x*sin_theta*cos_alpha
    +auxPoint.y*sin_theta*sin_alpha
    +auxPoint.z*cos_theta;
}

void rollPitchYawT_double(Geometry geo,int i, Point3Ddouble* point){
    Point3Ddouble auxPoint;
    auxPoint.x=point->x;
    auxPoint.y=point->y;
    auxPoint.z=point->z;

    double sin_dRoll, cos_dRoll, sin_dPitch, cos_dPitch, sin_dYaw, cos_dYaw;
    sin_dRoll = sin((double)geo.dRoll[i]);
    cos_dRoll = cos((double)geo.dRoll[i]);
    sin_dPitch = sin((double)geo.dPitch[i]);
    cos_dPitch = cos((double)geo.dPitch[i]);
    sin_dYaw = sin((double)geo.dYaw[i]);
    cos_dYaw = cos((double)geo.dYaw[i]);

    point->x=cos_dRoll*cos_dPitch*auxPoint.x
            +sin_dRoll*cos_dPitch*auxPoint.y
            -sin_dPitch*auxPoint.z;

    point->y=(cos_dRoll*sin_dPitch*sin_dYaw - sin_dRoll*cos_dYaw)*auxPoint.x
            +(sin_dRoll*sin_dPitch*sin_dYaw + cos_dRoll*cos_dYaw)*auxPoint.y
            +cos_dPitch*sin_dYaw*auxPoint.z;

    point->z=(cos_dRoll*sin_dPitch*cos_dYaw + sin_dRoll*sin_dYaw)*auxPoint.x
            +(sin_dRoll*sin_dPitch*cos_dYaw - cos_dRoll*sin_dYaw)*auxPoint.y
            +cos_dPitch*cos_dYaw*auxPoint.z;
}

void computeDeltasCube(Geometry geo,int i, Point3D* xyzorigin, Point3D* deltaX, Point3D* deltaY, Point3D* deltaZ,Point3D* S)
{
    Point3Ddouble P, Px,Py,Pz;
    // Get coords of Img(0,0,0)
    P.x=-(geo.sVoxelX/2-geo.dVoxelX/2)+geo.offOrigX[i];
    P.y=-(geo.sVoxelY/2-geo.dVoxelY/2)+geo.offOrigY[i];
    P.z=-(geo.sVoxelZ/2-geo.dVoxelZ/2)+geo.offOrigZ[i];

    // Get coords of the next voxel in each direction
    Px.x=P.x+geo.dVoxelX;      Py.x=P.x;                Pz.x=P.x;
    Px.y=P.y;                   Py.y=P.y+geo.dVoxelY;    Pz.y=P.y;
    Px.z=P.z;                   Py.z=P.z;                Pz.z=P.z+geo.dVoxelZ;

// Rotate image around X axis (this is equivalent of rotating the source and detector) RZ RY RZ

    eulerZYZT_double(geo,&P);
    eulerZYZT_double(geo,&Px);
    eulerZYZT_double(geo,&Py);
    eulerZYZT_double(geo,&Pz);

    //detector offset
    P.z =P.z-geo.offDetecV[i];            P.y =P.y-geo.offDetecU[i];
    Px.z =Px.z-geo.offDetecV[i];          Px.y =Px.y-geo.offDetecU[i];
    Py.z =Py.z-geo.offDetecV[i];          Py.y =Py.y-geo.offDetecU[i];
    Pz.z =Pz.z-geo.offDetecV[i];          Pz.y =Pz.y-geo.offDetecU[i];

    //Detector Roll pitch Yaw
    //
    //
    // first, we need to offset everything so (0,0,0) is the center of the detector
    // Only X is required for that
    P.x=P.x+(geo.DSD[i]-geo.DSO[i]);
    Px.x=Px.x+(geo.DSD[i]-geo.DSO[i]);
    Py.x=Py.x+(geo.DSD[i]-geo.DSO[i]);
    Pz.x=Pz.x+(geo.DSD[i]-geo.DSO[i]);
    rollPitchYawT_double(geo,i,&P);
    rollPitchYawT_double(geo,i,&Px);
    rollPitchYawT_double(geo,i,&Py);
    rollPitchYawT_double(geo,i,&Pz);

    P.x=P.x-(geo.DSD[i]-geo.DSO[i]);
    Px.x=Px.x-(geo.DSD[i]-geo.DSO[i]);
    Py.x=Py.x-(geo.DSD[i]-geo.DSO[i]);
    Pz.x=Pz.x-(geo.DSD[i]-geo.DSO[i]);
    //Done for P, now source
    Point3Ddouble source;
    source.x=geo.DSD[i]; //already offset for rotation
    source.y=-geo.offDetecU[i];
    source.z=-geo.offDetecV[i];
    rollPitchYawT_double(geo,i,&source);

    source.x=source.x-(geo.DSD[i]-geo.DSO[i]);//   source.y=source.y-auxOff.y;    source.z=source.z-auxOff.z;

//       mexPrintf("%f,%f,%f\n",source.x,source.y,source.z);
    // Scale coords so detector pixels are 1x1

    P.z =P.z /geo.dDetecV;                          P.y =P.y/geo.dDetecU;
    Px.z=Px.z/geo.dDetecV;                          Px.y=Px.y/geo.dDetecU;
    Py.z=Py.z/geo.dDetecV;                          Py.y=Py.y/geo.dDetecU;
    Pz.z=Pz.z/geo.dDetecV;                          Pz.y=Pz.y/geo.dDetecU;

    source.z=source.z/geo.dDetecV;                  source.y=source.y/geo.dDetecU;

    // get deltas of the changes in voxels
    deltaX->x=Px.x-P.x;   deltaX->y=Px.y-P.y;    deltaX->z=Px.z-P.z;
    deltaY->x=Py.x-P.x;   deltaY->y=Py.y-P.y;    deltaY->z=Py.z-P.z;
    deltaZ->x=Pz.x-P.x;   deltaZ->y=Pz.y-P.y;    deltaZ->z=Pz.z-P.z;

    *xyzorigin=P.to_float();
    *S=source.to_float();
}

Also, I used the version of computeDeltas_Siddon() in Siddon_projection.cu that you mentioned in your previous answer. https://github.com/IbexInnovations/TIGRE/blob/8d8c8eb4e7be922a69512baa08b4dad5bf516cc3/Common/CUDA/Siddon_projection.cu#L680

This worked fine after re-installing TIGRE with it.

AnderBiguri commented 1 month ago

@simonCtBern fantastic news!

Would you want to make a PR to TIGRE with the changes, to fix it?

simonCtBern commented 1 month ago

Yes I'll do that (probably tomorrow).

simonCtBern commented 3 weeks ago

I have another question concerning memory usage, which is not a bug but might be of interest to others, so I'll simply post it here:

We work with 4k data volumes, so the whole voxel volume is 128 GB as uint16 and 256 GB as float32. For a full scan, we typically record 3000 projections, which are 96 GB / 192 GB.

Since TIGRE requires all projections to be loaded as float32 first and then allocates the voxel volume, we need 256 + 192 = 448 GB of RAM (minus some border cropping in reality).
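
A quick back-of-the-envelope check of those numbers (approximate, assuming a 4096^3 volume and 3000 projections of 4096 x 4096 pixels):

n = 4096
vol_f32 = n**3 * 4 / 2**30                    # ~256 GiB for the float32 volume
proj_f32 = 3000 * n * n * 4 / 2**30           # ~188 GiB for the float32 projections
print(vol_f32, proj_f32, vol_f32 + proj_f32)  # ~444 GiB total, i.e. the ~448 GB above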

We have 2 workstations, one with 1 TB of RAM and one with 512 GB of RAM. The 4k reconstruction works with TIGRE on the 1 TB computer and uses around 440 GB of RAM (screenshot below). So I'd expect it to work on the 512 GB machine as well. But it doesn't; it crashes with the following message without reaching 100% memory usage: error_memory

One difference between the two computers is the graphics card setup:

1 TB computer: 1 graphics card, 24 GB dedicated memory, 512 GB shared memory

512 GB computer: 2 graphics cards, 2 x 12 GB dedicated memory, 256 GB shared memory

To be honest, I don't know how shared GPU memory is really used or allocated, but I noticed that TIGRE uses > 400 GB of shared GPU memory on the 1 TB computer according to the task manager, which would exceed the capabilities of the 512 GB machine. (The commercial software we also use can reconstruct 4k on the 512 GB computer and doesn't show shared memory usage in the task manager, whatever that means.)

Does TIGRE allocate the voxel volume explicitly as shared GPU memory? Or do you have another idea why it crashes?

Running 4K reconstruction on the 1 TB computer: screenshot_memoryUsage_4k.png

AnderBiguri commented 3 weeks ago

Hum, interesting.

In theory, TIGRE allocates "pinned" memory in your CPU for the image, and then slowly fills it up with as much as it can process in bulk on one GPU. The out-of-memory error is happening when allocating this volume and pinning it. I am not aware that pinned memory has any other requirement, but as a sanity check, could you recompile TIGRE without pinned memory (pip install . --no_pinned_memory) and try again?

What is shared GPU memory anyway? Do you have it specially allocated for the GPU? TIGRE just asks for normal RAM; I think there is no difference, but just in case.

simonCtBern commented 3 weeks ago

I have no idea what shared GPU memory really is; maybe it's even just a Windows thing...

However, I have trouble recompiling TIGRE without pinned memory. pip install . --no_pinned_memory gives me the error: no such option: --no_pinned_memory. Are you sure about the syntax..?

AnderBiguri commented 3 weeks ago

@simonCtBern ugh, sorry, we recently changed the install and I am not sure how this parses now. Note that the memory is allocated in large chunks, so the fact that you are not seeing all the 512 GB being allocated doesn't necessarily mean it's not being used, as the code will ask for e.g. 20 GB at a time (depends on the system) and thus may error when the OS says "you can't allocate this much".

However, as you say, on the other machine it doesn't seem to use that much. I can't really say what is going on; I have definitely used 500 GB on a 512 GB RAM machine before, so I know it works sometimes...

However, I think I saw a little mistake in your calculations. I believe (I would need to double-check) that the current TIGRE makes a copy of the sinograms for the filtered version of them, i.e. we don't have a way to FFT them in place. So you need 256 + 192*2 = 640 GB, I think.

simonCtBern commented 3 weeks ago

Yes, FDK() from single_pass_algorithms.py creates a copy of the filtered projections, but I created a customized version which filters without creating a copy and calls Atb() directly to avoid this.
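
Roughly, the pattern is this (a sketch, not my exact code: a plain ramp filter stands in for TIGRE's weighted FDK filter, and the FDK-weighted backprojection mode would need to be selected if your TIGRE version exposes it):

import numpy as np
import tigre

def fdk_inplace(projections, geo, angles):
    # Filter each projection in place so no second float32 copy of the whole
    # stack is allocated, then backproject once with Atb().
    n_u = projections.shape[-1]
    ramp = 2.0 * np.abs(np.fft.rfftfreq(n_u))    # stand-in ramp filter
    for i in range(projections.shape[0]):        # one projection at a time
        spec = np.fft.rfft(projections[i], axis=-1)
        projections[i] = np.fft.irfft(spec * ramp, n=n_u, axis=-1)
    return tigre.Atb(projections, geo, angles)   # FDK-type weighting assumed selectable here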

From what I've read, pip has also deprecated some options, so I'll see if I can figure out how to pass the flag. I'll let you know if I make some progress.

AnderBiguri commented 3 weeks ago

@simonCtBern ah I see.

That is our own flag though, from our setup.py, not a pip flag.

simonCtBern commented 3 weeks ago

@AnderBiguri I couldn't get pip to pass the flag to setup.py, so I just set no_pinned=True in my local copy of setup.py for recompiling. And it works!

Memory usage reaches a maximum of around 450 GB, as expected. (And the task manager doesn't show usage of this shared memory thing anymore.)

Screenshot after the reconstruction finished (the memory-usage changes at the end are from post-processing, not TIGRE): hp_4k_ram_finished

AnderBiguri commented 3 weeks ago

@simonCtBern Great! So indeed it is related to how much memory the OS allows you to "pin". Not much I can do about that from the TIGRE side. TIGRE will be slower without pinned memory, but at least it works :)