CERN / TIGRE

TIGRE: Tomographic Iterative GPU-based Reconstruction Toolbox
BSD 3-Clause "New" or "Revised" License

Running iterative alg stuck in Ubuntu system #552

Open stefenmax opened 1 month ago

stefenmax commented 1 month ago

Hi, your code runs smoothly for me on Windows. After moving to Linux and compiling, forward projection and backprojection run fine on my data. But every time I run OS-SART-TV as below, it hangs with no response. On Windows it responds within a few seconds.
algs.ossart_tv(proj, self.geo, angles, niter=1, init=init)
Thanks for your help.

Specifications

AnderBiguri commented 1 month ago

There are a couple of rare issues that may be causing this, but it's been hard to debug because I can't reproduce it.

One thing to try: in the following function, a new geometry is created from the input one.

https://github.com/CERN/TIGRE/blob/2b18d9bb489d2fb46ac87f58fbfcfd9981bb2ec6/Python/tigre/algorithms/iterative_recon_alg.py#L217

Can you try changing the code locally so it doesn't do this modification of the geometry? Just the copy.

stefenmax commented 1 month ago

Do you mean commenting out this line? I tried that and it still failed. However, some Krylov subspace algorithms such as CGLS and LSQR worked, which is weird. But OS-SART-TV gives the best performance...

AnderBiguri commented 1 month ago

@stefenmax not just that line, but the few after it too. Apologies, I am on a trip so I can't help much, but the idea is to pass an unmodified geo to Atb.

stefenmax commented 1 month ago

Thanks for your help, but it still doesn't work. Maybe I should just run it on Windows. I also found that it runs faster there than on Linux, lol.

AnderBiguri commented 1 month ago

Hmm... then I don't really know why. As I cannot reproduce it, I would need to know which function hangs. Is there any way you can try to figure that out? I have used TIGRE extensively on Linux, so it's certainly some specific combination of geometry, CUDA, number of GPUs, OS, Python version or something like that causing this strange error, but it's hard for me to figure out simply because I don't see it.

I'll keep the issue open. If you do happen to pinpoint what exactly hangs (it has to be some Ax() or Atb() call somewhere), do let me know. I suspect it's set_w or set_v that hangs...
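One way to find out which call hangs is Python's standard-library faulthandler module, which can dump the traceback of every thread if a block of code has not finished after a timeout. The sketch below is only an illustration: hang_watchdog is a hypothetical helper name, not part of TIGRE, and the 60-second default is an arbitrary assumption.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def hang_watchdog(label, timeout_s=60.0, file=None):
    """Dump all Python tracebacks if the wrapped block runs longer than timeout_s.

    hang_watchdog is a hypothetical helper for this thread, not a TIGRE function.
    `file` must be a real file object with a file descriptor (default: sys.stderr).
    """
    out = sys.stderr if file is None else file
    print("entering {}".format(label), file=out, flush=True)
    # Arm a watchdog that writes the traceback of every thread once the timeout expires.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=out)
    try:
        yield
    finally:
        # Disarm the watchdog as soon as the block returns or raises.
        faulthandler.cancel_dump_traceback_later()
    print("finished {}".format(label), file=out, flush=True)
```

Wrapping each suspect call, for example `with hang_watchdog("Ax"): proj = tigre.Ax(head, geo, angles)`, should print the Python frame that is stuck inside the compiled code once the timeout passes, so the log shows whether it is Ax, Atb, or something called from set_w or set_v.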

stefenmax commented 1 month ago

I found that I can run the OS-SART algorithm from example.py on my Linux system. So I tried using my geometry with the head phantom and found that it hangs in tigre.Ax. That is weird, because previously I could run Ax and FDK on my own data. Here is the example code; I don't know if you can reproduce this.

from __future__ import division
from __future__ import print_function

import numpy as np
import tigre
import tigre.algorithms as algs
from tigre.utilities import sample_loader
from tigre.utilities.Measure_Quality import Measure_Quality
import tigre.utilities.gpu as gpu
import matplotlib.pyplot as plt
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
### This is just a basic example of very few TIGRE functionalities.
# We highly recommend checking the Demos folder, where most if not all features of TIGRE are demoed.

listGpuNames = gpu.getGpuNames()
if len(listGpuNames) == 0:
    print("Error: No gpu found")
else:
    # Use idx rather than id to avoid shadowing the builtin id().
    for idx in range(len(listGpuNames)):
        print("{}: {}".format(idx, listGpuNames[idx]))

gpuids = gpu.getGpuIds(listGpuNames[0])
print(gpuids)

# Geometry
# geo1 = tigre.geometry(mode='cone', high_resolution=False, default=True)
img_size = 256
geo = tigre.geometry(mode="cone")
geo.DSD = 950
geo.DSO = 540
geo.nDetector = np.array([1, 835]) 
geo.dDetector = np.array([1, 0.9643345*950 / 835])
geo.sDetector = geo.dDetector * geo.nDetector
geo.nVoxel = np.array([1, img_size, img_size])
geo.sVoxel = geo.nVoxel
geo.dVoxel = geo.sVoxel / geo.nVoxel 
geo.accuracy=0.5  
angles = np.linspace(0, np.pi/2, 180, dtype=np.float32)
# Prepare projection data
head = sample_loader.load_head_phantom(geo.nVoxel)
breakpoint()
proj = tigre.Ax(head, geo, angles, gpuids=gpuids)
test = tigre.Atb(proj,geo,angles,backprojection_type="matched",gpuids=gpuids)
# Reconstruct
niter = 20
fdkout = algs.fdk(proj, geo, angles, gpuids=gpuids)
breakpoint()
ossart = algs.ossart(proj, geo, angles, niter, blocksize=20, gpuids=gpuids)

# Measure Quality
# 'RMSE', 'MSSIM', 'SSD', 'UQI'
print("nRMSE fdk:")
print(Measure_Quality(fdkout, head, ["nRMSE"]))
print("nRMSE ossart:")
print(Measure_Quality(ossart, head, ["nRMSE"]))

# Plot
fig, axes = plt.subplots(3, 2)
axes[0, 0].set_title("FDK")
axes[0, 0].imshow(fdkout[geo.nVoxel[0] // 2])
axes[1, 0].imshow(fdkout[:, geo.nVoxel[1] // 2, :])
axes[2, 0].imshow(fdkout[:, :, geo.nVoxel[2] // 2])
axes[0, 1].set_title("OS-SART")
axes[0, 1].imshow(ossart[geo.nVoxel[0] // 2])
axes[1, 1].imshow(ossart[:, geo.nVoxel[1] // 2, :])
axes[2, 1].imshow(ossart[:, :, geo.nVoxel[2] // 2])
plt.show()
# tigre.plotProj(proj)
# tigre.plotImg(fdkout)

AnderBiguri commented 1 month ago

So it hangs in the Ax in this code? What if you make a different number of GPUs visible? Are they all the same GPU?
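A rough way to test different GPU visibility settings without hand-editing the script each time is to launch the reproduction in a child process with its own CUDA_VISIBLE_DEVICES and a timeout. This is a sketch under assumptions: run_with_visible_gpus is a hypothetical helper, and the 120-second timeout is an arbitrary choice.

```python
import os
import subprocess
import sys


def run_with_visible_gpus(cmd, visible_devices, timeout_s=120.0):
    """Run cmd in a child process that only sees the listed CUDA devices.

    Hypothetical helper for this thread. Returns (status, stdout), where
    status is "ok", "error" (non-zero exit code) or "hang" (timeout expired).
    """
    env = dict(os.environ)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in visible_devices)
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        done = subprocess.run(
            cmd, env=env, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return "hang", ""
    return ("ok" if done.returncode == 0 else "error"), done.stdout
```

For example, `run_with_visible_gpus([sys.executable, "example.py"], [0])` followed by the same call with `[0, 1]` would show whether the hang depends on how many devices are visible. The two os.environ lines in the posted script would need to be removed first, since they would override the values set here.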

stefenmax commented 1 month ago

Yeah, it hangs in the Ax. No, they were not the same GPU. But on another server of mine there are two identical GPUs, and it hangs at the same position.

AnderBiguri commented 1 month ago

Certainly with different GPUs behaviour is undefined, so that would be an issue.

I'll try your specific geometry. But out of curiosity, if you change nVoxel/nDetector a bit, does it still hang?

stefenmax commented 1 month ago

Do you have any recommendation on how to change nVoxel/nDetector?

AnderBiguri commented 1 month ago

Just give it a different value, just to see if it's the specific values causing the issue.
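As a sketch of what "a different value" could look like in practice, the snippet below enumerates a few sizes near the ones in the posted script (835 detector pixels, 256 voxels per side). size_variants and the chosen offsets are assumptions for illustration only, not a recommendation from TIGRE.

```python
import itertools


def size_variants(n_detector=835, img_size=256, deltas=(-1, 0, 1, 5)):
    """Yield (nDetector, nVoxel) tuples close to the sizes used in the report.

    Hypothetical helper: the offsets in `deltas` are arbitrary small perturbations.
    """
    for d_det, d_vox in itertools.product(deltas, repeat=2):
        nd = n_detector + d_det
        nv = img_size + d_vox
        # Same layout as the script: a single detector row and a single slice.
        yield (1, nd), (1, nv, nv)
```

Each pair could then be assigned with `geo.nDetector = np.array(n_det)` and `geo.nVoxel = np.array(n_vox)`. In the posted script dDetector, sDetector, sVoxel and dVoxel are derived from these sizes, so they would need to be recomputed before calling tigre.Ax again.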

stefenmax commented 1 month ago

Yes, after changing it a bit, it still hangs.

AnderBiguri commented 3 weeks ago

Apologies, I don't seem to be able to reproduce this in any way. If you can pinpoint where the error is, do let me know.