arcadelab / deepdrr

Code for "DeepDRR: A Catalyst for Machine Learning in Fluoroscopy-guided Procedures". https://arxiv.org/abs/1803.08606
GNU General Public License v3.0

How to accelerate the Projector? #1

Closed pengchengu closed 5 years ago

pengchengu commented 5 years ago

Hi, the projector takes around 6.5 seconds running on a GTX 1060 GPU, which is much slower than an ITK DRR projector. Do you have any suggestions for accelerating it?

mathiasunberath commented 5 years ago

Hi, the most likely reason this projector is slower than other implementations is the higher workload: it projects multiple materials, which also requires increased memory transfer (segmentation masks, etc.). I don't know how you are currently using the projector; from your statement I assume you generate single images at a time. The intended use (and this is also how we benchmarked in the paper) is to generate large datasets in a single query, so that memory transfer happens only once. In that case, the per-image performance should be much better. In addition, the code can be adapted for online generation of images, and we are working on this; unfortunately, it is not yet in a commit-able state. Maybe Nico has some more ideas.

jannicozaech commented 5 years ago

Hi, as Mathias stated, the memory transfer is time-consuming, so all projections from one volume should be generated in a single query. Besides this, the projector needs to interpolate the density and segmentation masks for multiple materials, which results in a higher runtime compared to a classical DRR projector. Do you measure 6.5 seconds for the forward projection only, or for the complete pipeline?
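To answer that question cleanly, it helps to time the forward projection separately from the rest of the pipeline. A minimal, library-agnostic timing helper (the function name and the callables in the comments are illustrative, not part of DeepDRR):

```python
import time

def mean_seconds(fn, n=10):
    """Average wall-clock time of n calls to fn, in seconds."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n

# With a real projector you would compare, for example:
#   mean_seconds(projector)               # forward projection only
#   mean_seconds(run_complete_pipeline)   # volume I/O + setup + projection
```

If the two numbers are close, the projection kernel itself is the bottleneck; if they differ a lot, the time is going into setup and memory transfer.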

pengchengu commented 5 years ago

Hi Mathias and Nico, thanks for the reply. The 6.5-second measurement is for the forward projection, generating a single image at a time. The projector in the code is modified from CONRAD, and I'm not familiar with CONRAD. I'm wondering why you chose CONRAD instead of a common framework like ITK or Plastimatch; does CONRAD have some extra functionality that ITK lacks?

mathiasunberath commented 5 years ago

Hi, as pointed out above, the code is not currently optimized for online generation of small image batches, and we have someone working on providing alternatives for this way of using it soon. We had used CONRAD extensively before, so it was an easy choice for us. You could of course incorporate another framework's implementation of the projection operation if you feel that this is the bottleneck; it's really just a CUDA kernel and the related memory transfers. I will close this issue for now, but feel free to re-open it if more questions or concerns arise.

fedeface98 commented 8 months ago

Hi,

I have a similar problem, and I would like to know whether the code was ever adapted for online generation of images in the end. Do you have any suggestions for that?

Thanks a lot.

benjamindkilleen commented 8 months ago

Yes, this has since been implemented and is the default behavior.

fedeface98 commented 8 months ago

Maybe I'm doing something wrong, because generating one image from one CT volume takes around 8 s. Is that normal?

This is how I'm running the code; maybe it helps to understand the problem:

ct = Volume.from_nifti(ct_path)
carm = MobileCArm(rotate_camera_left=False)
carm.camera_intrinsics = geo.CameraIntrinsicTransform.from_sizes(
    sensor_size=(512, 512),
    pixel_size=0.390625,
    source_to_detector_distance=1120,
)
with Projector(ct, carm=carm) as projector:
    # Orient and position the patient model in world space.
    ct.orient_patient(head_first=True, supine=True)

    # Move the C-arm to the desired pose.
    carm.move_to(alpha=0, beta=0, gamma=0, degrees=True)
    carm.move_to(isocenter_in_world=ct.center_in_world)
    carm.source_to_isocenter_vertical_distance = 820

    # Run the projection.
    image = projector()
benjamindkilleen commented 8 months ago

So the bottleneck here is the initialization of the projector, which is hidden in the with block. This is the step that moves the CT arrays to the GPU. You can also use projector.initialize() and projector.free() to accomplish the same thing with more flexibility.
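The two usage patterns (the with block vs. explicit initialize()/free()) are equivalent, since entering the context just calls initialize() and exiting calls free(). A runnable sketch of that lifecycle, using a minimal stand-in class rather than the real deepdrr.Projector (FakeProjector and its counters are hypothetical, purely for illustration):

```python
class FakeProjector:
    """Stand-in mimicking the Projector lifecycle described in this thread."""

    def __init__(self):
        self.initialized = False
        self.images = 0

    def initialize(self):
        # In the real class: the expensive step that moves CT arrays to the GPU.
        self.initialized = True

    def free(self):
        # In the real class: releases GPU memory.
        self.initialized = False

    def __enter__(self):
        self.initialize()
        return self

    def __exit__(self, *exc):
        self.free()

    def __call__(self):
        assert self.initialized, "call initialize() first"
        self.images += 1
        return "image"

# Pattern 1: context manager; the init cost is hidden in the `with`.
with FakeProjector() as p:
    img = p()

# Pattern 2: explicit lifecycle; pay the init cost once, then render many images.
p = FakeProjector()
p.initialize()
for _ in range(100):
    img = p()  # fast: no per-image memory transfer
p.free()
```

The second pattern is what makes repeated rendering cheap: initialization happens once, outside the loop.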

The current pipeline is "online" in the sense that even after initialization you can still adjust the C-arm and patient positions, which will be reflected in the new image. With the same CT, you should be able to render images in a fraction of a second, depending on resolution, hardware capability, step size, etc.

benjamindkilleen commented 8 months ago

Breaking down your code:

ct = Volume.from_nifti(ct_path)
carm = MobileCArm(rotate_camera_left = False)
carm.camera_intrinsics = geo.CameraIntrinsicTransform.from_sizes(
sensor_size=(512, 512),
pixel_size=0.390625,
source_to_detector_distance=1120,
)
projector = Projector(ct, carm=carm)
projector.initialize() # Takes a few seconds
for _ in range(10000):
    # Orient and position the patient model in world space.
    ct.orient_patient(head_first=True, supine=True)

    # Move the C-arm to the desired pose.
    carm.move_to(...)

    image = projector() # Should be very fast.

    # Do something with the images.
projector.free()
fedeface98 commented 8 months ago

First of all thanks a lot for your kindness and patience.

I tested it, and yes, I agree that the projector is very fast. But if I have multiple volumes, how could I speed up the process? For each volume I would have to initialize a different projector.

benjamindkilleen commented 8 months ago

No worries, happy to help.

Do you want to create images with multiple volumes? Or do you want to just have many images over many different volumes? If the latter, there is no way to get around the one-time cost of moving arrays to the GPU. If you know you are only interested in a portion of the volume, you can crop the volume to the region, which will speed things up significantly.

If the former, you can provide multiple volumes to the projector, and they will all be rendered in the image. I suppose a hacky workaround would be to move the volumes you don't want out of the FOV, but you are still limited by how many volumes you can fit in GPU memory. (In my experience, a high-res torso CT takes between 1 and 3 GB.)
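A back-of-the-envelope sketch of why a single CT can occupy a few GB on the GPU, assuming float32 voxels and one stored array per material in addition to the density volume (the shape and material count here are hypothetical, not DeepDRR internals):

```python
import numpy as np

shape = (512, 512, 600)   # hypothetical high-res torso CT
n_materials = 3           # e.g. air, soft tissue, bone
bytes_per_voxel = 4       # float32

# Density volume plus one array per material segmentation.
density_gb = np.prod(shape) * bytes_per_voxel / 1e9
total_gb = density_gb * (1 + n_materials)
print(f"{total_gb:.1f} GB")  # → 2.5 GB
```

This is also why cropping the volume to the region of interest helps so much: memory transfer and footprint scale linearly with voxel count.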

fedeface98 commented 8 months ago

Let me explain my problem better. I want to create images with multiple volumes, so the former. But my problem is that I change the center of rotation/translation, and it is a specific anatomical point (not the CT center). So for each volume I would need a different initialization of the projector. If I provide the volumes to the projector directly, I cannot move the isocenter to the point I want for each volume.

benjamindkilleen commented 8 months ago

Yup, you can use carm.reposition(ct.center_in_world) to move the isocenter of the C-arm to the center of the volume, regardless of previous steps.

fedeface98 commented 8 months ago

Yes, but the problem is that I want to use a specific anatomical point as the isocenter (I have its coordinates for each volume), not the center of the CT. Does that mean I have to reinitialize the projector for each volume? Is there no way to avoid that? It's the initialization that takes too much time.

benjamindkilleen commented 8 months ago

You can move the volumes/C-arms wherever you want after initialization. Each volume has its own "anatomical" coordinate system (usually RAS). If you have a point in RAS as a list[float] or numpy array, you can do:

carm.reposition(ct.world_from_anatomical @ geo.point(your_point_in_RAS))

to move the isocenter to that point.
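Conceptually, world_from_anatomical @ point is just a 4x4 homogeneous rigid transform applied to a point in anatomical (RAS) coordinates. A plain NumPy sketch of that operation (the matrix and point values here are made up for illustration; in DeepDRR you would use the geo types instead of raw arrays):

```python
import numpy as np

# Hypothetical rigid transform: identity rotation, +10 mm translation along x.
world_from_anatomical = np.array([
    [1.0, 0.0, 0.0, 10.0],
    [0.0, 1.0, 0.0,  0.0],
    [0.0, 0.0, 1.0,  0.0],
    [0.0, 0.0, 0.0,  1.0],
])

point_in_ras = np.array([5.0, 2.0, 3.0, 1.0])  # homogeneous RAS point
point_in_world = world_from_anatomical @ point_in_ras
print(point_in_world[:3])  # → [15.  2.  3.]
```

So repositioning the isocenter per volume is just a matrix-vector product; no projector reinitialization is involved.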

fedeface98 commented 8 months ago

But the part that takes the most time is the following:

projector = Projector(volume, carm=carm)
projector.initialize()

Don't I have to run this for each volume?

benjamindkilleen commented 8 months ago

Here's some untested code describing what I mean:

cts = [Volume.from_nifti(path) for path in nifti_paths]
carm = MobileCArm()  # with your params
projector = Projector(cts, device=carm)
projector.initialize()  # Takes a while: transfers all the volumes to the GPU.

# Start with every CT placed far away, out of the field of view.
for ct in cts:
    ct.place_center(geo.point(999999999, 0, 0))

for ct in cts:
    ct.place_center(geo.point(0, 0, 0))  # bring this CT into view
    ct.orient()
    carm.reposition(point_in_world)
    # orient C-arm
    image = projector()

    ct.place_center(geo.point(999999999, 0, 0))  # move it far away again

projector.free()

Again, I want to stress that this is not best practice. Unless you want multiple CTs in the same image, you can equivalently create a separate projector for each CT and initialize them all once, up front, before projecting your images.
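The per-CT projector cache described above can be sketched as follows, using a small stand-in class so the pattern is runnable without DeepDRR (Proj is hypothetical, standing in for deepdrr.Projector; the paths are made up):

```python
class Proj:
    """Stand-in for a projector bound to one CT volume."""

    def __init__(self, name):
        self.name = name
        self.init_calls = 0

    def initialize(self):
        self.init_calls += 1  # expensive GPU transfer in the real class

    def __call__(self):
        return f"image of {self.name}"

    def free(self):
        pass

paths = ["ct_a.nii.gz", "ct_b.nii.gz"]  # hypothetical file names
projectors = {p: Proj(p) for p in paths}

# Pay the initialization cost exactly once per CT, before the render loop.
for proj in projectors.values():
    proj.initialize()

# Render many images per CT; each call is cheap after initialization.
images = [projectors[p]() for p in paths for _ in range(3)]

for proj in projectors.values():
    proj.free()
```

With real projectors, the only constraint is total GPU memory: all initialized volumes stay resident at once.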

Once you transfer a CT to the GPU, you can take many images of it very rapidly, but this initialization takes a few seconds per CT and is unavoidable (until non-von Neumann architectures come to the rescue).

fedeface98 commented 8 months ago

Thanks a lot, it's much clearer now!