CERN / TIGRE

TIGRE: Tomographic Iterative GPU-based Reconstruction Toolbox
BSD 3-Clause "New" or "Revised" License

Some backprojections/FDK result in an unexpected CUDA error #492

Open AnderBiguri opened 9 months ago

AnderBiguri commented 9 months ago

I am still unsure which geos cause this, but I get "main loop fail" for some geos. If the problem is that the geo is wrong, this should be caught earlier; otherwise there is a major bug somewhere. I wonder if the sizes/ints are being passed properly since the latest changes.

tsadakane commented 9 months ago

Could this have something to do with e7ad230823514c56375ddd122f88daf35abc81ef ?

AnderBiguri commented 9 months ago

@tsadakane possibly, that is what I thought, but my preliminary tests don't seem to show any major change. I just need some time to sit down and print the right things.

AnderBiguri commented 9 months ago

Hi @tsadakane, I wonder if this has to do with the particulars of the computer I am running this on, a 4-GPU station. Playing with the GpuId() class, I realize that we do have a way to select all GPUs with the same name, but can we actually just select a GPU given its ID? I think not, or my brain is a bit too tired today to figure out how. Am I too tired, or can't we do this?

AnderBiguri commented 9 months ago

I have more or less found the issue, and it lies in the difference between these two environments. I still need to check which difference exactly is the breaking one (for tomorrow...).

Working:

#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
ca-certificates           2023.7.22            hbcca054_0    conda-forge
contourpy                 1.1.1                    pypi_0    pypi
cycler                    0.12.0                   pypi_0    pypi
cython                    3.0.2                    pypi_0    pypi
fonttools                 4.43.0                   pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h807b86a_2    conda-forge
libgomp                   13.2.0               h807b86a_2    conda-forge
libnsl                    2.0.0                hd590300_1    conda-forge
libsqlite                 3.43.0               h2797004_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
matplotlib                3.8.0                    pypi_0    pypi
ncurses                   6.4                  hcb278e6_0    conda-forge
numpy                     1.26.0                   pypi_0    pypi
openssl                   3.1.3                hd590300_0    conda-forge
packaging                 23.2                     pypi_0    pypi
pillow                    10.0.1                   pypi_0    pypi
pip                       23.2.1             pyhd8ed1ab_0    conda-forge
pyparsing                 3.1.1                    pypi_0    pypi
python                    3.11.3          h2755cc3_0_cpython    conda-forge
python-dateutil           2.8.2                    pypi_0    pypi
pytigre                   2.4.0                    pypi_0    pypi
readline                  8.2                  h8228510_1    conda-forge
scipy                     1.11.3                   pypi_0    pypi
setuptools                68.2.2             pyhd8ed1ab_0    conda-forge
six                       1.16.0                   pypi_0    pypi
tk                        8.6.13               h2797004_0    conda-forge
tqdm                      4.66.1                   pypi_0    pypi
tzdata                    2023c                h71feb2d_0    conda-forge
wheel                     0.41.2             pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge

Not working:

#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                  2_kmp_llvm    conda-forge
alsa-lib                  1.2.8                h166bdaf_0    conda-forge
attr                      2.5.1                h166bdaf_1    conda-forge
blas                      1.0                         mkl    conda-forge
brotli                    1.0.9                h166bdaf_9    conda-forge
brotli-bin                1.0.9                h166bdaf_9    conda-forge
brotli-python             1.0.9           py310hd8f1fbe_9    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
ca-certificates           2023.7.22            hbcca054_0    conda-forge
cairo                     1.16.0            ha61ee94_1014    conda-forge
certifi                   2023.7.22          pyhd8ed1ab_0    conda-forge
charset-normalizer        3.2.0              pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
contourpy                 1.1.0                    pypi_0    pypi
cuda-cudart               11.7.99                       0    nvidia
cuda-cupti                11.7.101                      0    nvidia
cuda-libraries            11.7.1                        0    nvidia
cuda-nvrtc                11.7.99                       0    nvidia
cuda-nvtx                 11.7.91                       0    nvidia
cuda-runtime              11.7.1                        0    nvidia
cycler                    0.11.0                   pypi_0    pypi
cython                    0.29.28         py310hd8f1fbe_2    conda-forge
dbus                      1.13.6               h5008d03_3    conda-forge
expat                     2.5.0                hcb278e6_1    conda-forge
ffmpeg                    4.3                  hf484d3e_0    pytorch
fftw                      3.3.10          nompi_hc118613_108    conda-forge
filelock                  3.12.4             pyhd8ed1ab_0    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.42.0                   pypi_0    pypi
freetype                  2.12.1               h267a509_2    conda-forge
gettext                   0.21.1               h27087fc_0    conda-forge
glib                      2.78.0               hfc55251_0    conda-forge
glib-tools                2.78.0               hfc55251_0    conda-forge
gmp                       6.2.1                h58526e2_0    conda-forge
gmpy2                     2.1.2           py310h3ec546c_1    conda-forge
gnutls                    3.6.13               h85f3911_1    conda-forge
graphite2                 1.3.13            h58526e2_1001    conda-forge
gst-plugins-base          1.22.0               h4243ec0_2    conda-forge
gstreamer                 1.22.0               h25f0c4b_2    conda-forge
gstreamer-orc             0.4.34               hd590300_0    conda-forge
harfbuzz                  6.0.0                h8e241bc_0    conda-forge
icu                       70.1                 h27087fc_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
jack                      1.9.22               h11f4161_0    conda-forge
jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
jpeg                      9e                   h166bdaf_2    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.4                    pypi_0    pypi
krb5                      1.20.1               h81ceb04_0    conda-forge
lame                      3.100             h166bdaf_1003    conda-forge
lcms2                     2.15                 hfd0df8a_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libblas                   3.9.0            16_linux64_mkl    conda-forge
libbrotlicommon           1.0.9                h166bdaf_9    conda-forge
libbrotlidec              1.0.9                h166bdaf_9    conda-forge
libbrotlienc              1.0.9                h166bdaf_9    conda-forge
libcap                    2.67                 he9d0100_0    conda-forge
libcblas                  3.9.0            16_linux64_mkl    conda-forge
libclang                  15.0.7          default_h7634d5b_3    conda-forge
libclang13                15.0.7          default_h9986a30_3    conda-forge
libcublas                 11.10.3.66                    0    nvidia
libcufft                  10.7.2.124           h4fbf590_0    nvidia
libcufile                 1.7.2.10                      0    nvidia
libcups                   2.3.3                h36d4200_3    conda-forge
libcurand                 10.3.3.141                    0    nvidia
libcusolver               11.4.0.1                      0    nvidia
libcusparse               11.7.4.91                     0    nvidia
libdb                     6.2.32               h9c3ff4c_0    conda-forge
libdeflate                1.17                 h0b41bf4_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libevent                  2.1.10               h28343ad_4    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libflac                   1.4.3                h59595ed_0    conda-forge
libgcc-ng                 13.1.0               he5830b7_0    conda-forge
libgcrypt                 1.10.1               h166bdaf_0    conda-forge
libgfortran-ng            13.1.0               h69a702a_0    conda-forge
libgfortran5              13.1.0               h15d22d2_0    conda-forge
libglib                   2.78.0               hebfc3b9_0    conda-forge
libgomp                   13.1.0               he5830b7_0    conda-forge
libgpg-error              1.47                 h71f35ed_0    conda-forge
libhwloc                  2.9.1                hd6dc26d_0    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
libjpeg-turbo             2.0.0                h9bf148f_0    pytorch
liblapack                 3.9.0            16_linux64_mkl    conda-forge
libllvm15                 15.0.7               hadd5161_1    conda-forge
libnpp                    11.7.4.75                     0    nvidia
libnsl                    2.0.0                h7f98852_0    conda-forge
libnvjpeg                 11.8.0.2                      0    nvidia
libogg                    1.3.4                h7f98852_1    conda-forge
libopenblas               0.3.23          pthreads_h80387f5_0    conda-forge
libopus                   1.3.1                h7f98852_1    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libpq                     15.3                 hbcd7760_1    conda-forge
libsndfile                1.2.2                hbc2eb40_0    conda-forge
libsqlite                 3.42.0               h2797004_0    conda-forge
libstdcxx-ng              13.1.0               hfd8a6a1_0    conda-forge
libsystemd0               253                  h8c4010b_1    conda-forge
libtiff                   4.5.0                h6adf6a1_2    conda-forge
libtool                   2.4.7                h27087fc_0    conda-forge
libudev1                  253                  h0b41bf4_1    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libvorbis                 1.3.7                h9c3ff4c_0    conda-forge
libwebp-base              1.3.2                hd590300_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxkbcommon              1.5.0                h79f4944_1    conda-forge
libxml2                   2.10.3               hca2bb57_4    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
llvm-openmp               16.0.6               h4dfa4b3_0    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
markupsafe                2.1.3           py310h2372a71_0    conda-forge
matplotlib                3.7.2                    pypi_0    pypi
matplotlib-base           3.8.0           py310h62c0568_0    conda-forge
mkl                       2022.2.1         h84fe81f_16997    conda-forge
mpc                       1.3.1                hfe3b2da_0    conda-forge
mpfr                      4.2.0                hb012696_0    conda-forge
mpg123                    1.31.3               hcb278e6_0    conda-forge
mpmath                    1.3.0              pyhd8ed1ab_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
mysql-common              8.0.33               hf1915f5_4    conda-forge
mysql-libs                8.0.33               hca2cd23_4    conda-forge
ncurses                   6.4                  hcb278e6_0    conda-forge
nettle                    3.6                  he412f7d_0    conda-forge
networkx                  3.1                pyhd8ed1ab_0    conda-forge
nspr                      4.35                 h27087fc_0    conda-forge
nss                       3.92                 h1d7d5a4_0    conda-forge
numpy                     1.26.0          py310hb13e2d6_0    conda-forge
openh264                  2.1.1                h780b84a_0    conda-forge
openjpeg                  2.5.0                hfec8fc6_2    conda-forge
openssl                   3.1.3                hd590300_0    conda-forge
packaging                 23.1               pyhd8ed1ab_0    conda-forge
pathlib                   1.0.1           py310hff52083_7    conda-forge
pcre2                     10.40                hc3806b6_0    conda-forge
pillow                    10.0.0                   pypi_0    pypi
pip                       23.2.1             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
platformdirs              3.10.0             pyhd8ed1ab_0    conda-forge
ply                       3.11                       py_1    conda-forge
pooch                     1.7.0              pyha770c72_3    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pulseaudio                16.1                 hcb278e6_3    conda-forge
pulseaudio-client         16.1                 h5195f5e_3    conda-forge
pulseaudio-daemon         16.1                 ha8d29e2_3    conda-forge
pyparsing                 3.0.9                    pypi_0    pypi
pyqt                      5.15.9          py310h04931ad_4    conda-forge
pyqt5-sip                 12.12.2         py310hc6cd4ac_4    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.10.12         hd12c33a_0_cpython    conda-forge
python-dateutil           2.8.2                    pypi_0    pypi
python_abi                3.10                    3_cp310    conda-forge
pytigre                   2.4.0                    pypi_0    pypi
pytorch                   2.0.1           py3.10_cuda11.7_cudnn8.5.0_0    pytorch
pytorch-cuda              11.7                 h778d358_5    pytorch
pytorch-mutex             1.0                        cuda    pytorch
pyyaml                    6.0             py310h5764c6d_5    conda-forge
qt-main                   5.15.8               h5d23da1_6    conda-forge
readline                  8.2                  h8228510_1    conda-forge
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
scipy                     1.11.1          py310ha4c1d20_0    conda-forge
setuptools                68.0.0             pyhd8ed1ab_0    conda-forge
sip                       6.7.11          py310hc6cd4ac_0    conda-forge
six                       1.16.0                   pypi_0    pypi
sympy                     1.12            pypyh9d50eac_103    conda-forge
tbb                       2021.9.0             hf52228f_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
torchaudio                2.0.2               py310_cu117    pytorch
torchtriton               2.0.0                     py310    pytorch
torchvision               0.15.2              py310_cu117    pytorch
tornado                   6.3.3           py310h2372a71_0    conda-forge
tqdm                      4.65.0             pyhd8ed1ab_1    conda-forge
typing-extensions         4.7.1                hd8ed1ab_0    conda-forge
typing_extensions         4.7.1              pyha770c72_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
unicodedata2              15.0.0          py310h5764c6d_0    conda-forge
urllib3                   2.0.4              pyhd8ed1ab_0    conda-forge
wheel                     0.41.0             pyhd8ed1ab_0    conda-forge
xcb-util                  0.4.0                h516909a_0    conda-forge
xcb-util-image            0.4.0                h166bdaf_0    conda-forge
xcb-util-keysyms          0.4.0                h516909a_0    conda-forge
xcb-util-renderutil       0.3.9                h166bdaf_0    conda-forge
xcb-util-wm               0.4.1                h516909a_0    conda-forge
xkeyboard-config          2.38                 h0b41bf4_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.1.1                hd590300_0    conda-forge
xorg-libsm                1.2.4                h7391055_0    conda-forge
xorg-libx11               1.8.4                h0b41bf4_0    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xf86vidmodeproto     2.3.1             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge
zstd                      1.5.5                hfc55251_0    conda-forge
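
A minimal sketch for narrowing down which difference matters, assuming each listing above is available as the plain-text output of `conda list`. The function names `parse_conda_list` and `diff_envs` are illustrative only and are not part of TIGRE or conda. The excerpts below are a few rows copied from the two listings; in practice the full listings would be read from files. The script only reports what differs; it does not say which difference causes the CUDA error.

```python
def parse_conda_list(text):
    """Map package name -> (version, build, channel) from `conda list` text."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines that conda prints.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, version = fields[0], fields[1]
        build = fields[2] if len(fields) > 2 else ""
        channel = fields[3] if len(fields) > 3 else ""
        packages[name] = (version, build, channel)
    return packages


def diff_envs(working, broken):
    """Return (only in working, only in broken, present in both but different)."""
    only_working = sorted(set(working) - set(broken))
    only_broken = sorted(set(broken) - set(working))
    changed = {
        name: (working[name], broken[name])
        for name in sorted(set(working) & set(broken))
        if working[name] != broken[name]
    }
    return only_working, only_broken, changed


# A few rows copied from the "Working" listing.
WORKING_EXCERPT = """\
# Name                    Version                   Build  Channel
_openmp_mutex             4.5                       2_gnu    conda-forge
cython                    3.0.2                    pypi_0    pypi
numpy                     1.26.0                   pypi_0    pypi
pytigre                   2.4.0                    pypi_0    pypi
"""

# The matching rows (plus one extra package) from the "Not working" listing.
BROKEN_EXCERPT = """\
# Name                    Version                   Build  Channel
_openmp_mutex             4.5                  2_kmp_llvm    conda-forge
cuda-cudart               11.7.99                       0    nvidia
cython                    0.29.28         py310hd8f1fbe_2    conda-forge
numpy                     1.26.0          py310hb13e2d6_0    conda-forge
pytigre                   2.4.0                    pypi_0    pypi
"""

only_working, only_broken, changed = diff_envs(
    parse_conda_list(WORKING_EXCERPT), parse_conda_list(BROKEN_EXCERPT)
)
print("only in working:", only_working)
print("only in broken: ", only_broken)
for name, (good, bad) in changed.items():
    print(f"{name}: working={good} broken={bad}")
```

With the differing packages listed this way, candidates can be changed one at a time in a copy of the working environment until the error appears.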

tsadakane commented 9 months ago

@AnderBiguri ,

we do have a way to select all GPUs with the same name,

Yes, we do.

but can we actually just select a GPU given its Id?

Yes, we can. If the IDs of the name "XXX" were (0,1,2,3) and we want to use only ID=1, I think it is possible by setting something like this:

gpuids.devices = (int32(1))
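
The same idea can be sketched in Python. This is an illustration under stated assumptions: `GpuIdsSketch` is a hypothetical stand-in with a `name` and a `devices` list, not TIGRE's actual class, and `select_devices` is an illustrative helper. It only shows the logic of narrowing the device list (0, 1, 2, 3) for the name "XXX" down to ID 1, with a check that the requested ID exists.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GpuIdsSketch:
    """Hypothetical stand-in for a GPU-selection object; NOT TIGRE's class."""
    name: str = ""
    devices: List[int] = field(default_factory=list)


def select_devices(gpuids, wanted):
    """Return a new object restricted to the device IDs listed in `wanted`."""
    missing = [d for d in wanted if d not in gpuids.devices]
    if missing:
        raise ValueError(f"device IDs {missing} not among {gpuids.devices}")
    return GpuIdsSketch(name=gpuids.name, devices=list(wanted))


# All four GPUs that share the name "XXX", as in the example above.
all_xxx = GpuIdsSketch(name="XXX", devices=[0, 1, 2, 3])

# Keep only the GPU with ID 1; the original object is left unchanged.
only_one = select_devices(all_xxx, [1])
print(only_one.devices)
```

Returning a new object rather than mutating the input keeps the full device list available for comparison runs on the other GPUs.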

AnderBiguri commented 9 months ago

In any case, it's still not clear to me what causes the error. I upgraded the non-working env to Cython 3 and that still causes errors.