flatironinstitute / finufft

Non-uniform fast Fourier transform library of types 1,2,3 in dimensions 1,2,3

Python: Can cufinufft automatically figure out `gpu_device_id`? #420

Open WardBrian opened 7 months ago

WardBrian commented 7 months ago

Originally reported downstream: https://github.com/flatironinstitute/pytorch-finufft/issues/103

The following will segfault with either a `Fatal Python error: Aborted` or a `Fatal Python error: PyThreadState_Get: the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)`:

import numpy as np
import torch
import cufinufft

data = torch.view_as_complex(
    torch.stack((torch.randn(15, 80, 12000), torch.randn(15, 80, 12000)), dim=-1)
)
omega = torch.rand(2, 12000) * 2 * np.pi - np.pi

cufinufft.nufft2d1(
    *omega.to("cuda:1"),
    data.reshape(-1, 12000).to("cuda:1"),
    (320, 320),
    isign=-1,
)

If you change both arrays to `cuda:0`, it seems to work fine.

The full error I get is

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  exclusive_scan failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
Fatal Python error: Aborted

Current thread 0x000015555552c4c0 (most recent call first):
  File "/mnt/home/bward/finufft/finufft/python/cufinufft/cufinufft/_plan.py", line 236 in setpts
  File "/mnt/home/bward/finufft/finufft/python/cufinufft/cufinufft/_simple.py", line 38 in _invoke_plan
  File "/mnt/home/bward/finufft/finufft/python/cufinufft/cufinufft/_simple.py", line 12 in nufft2d1
  File "/mnt/home/bward/finufft/finufft/mwe.py", line 14 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special (total: 20)
Aborted (core dumped)
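
For reference, the same call with both inputs on the default device (the case noted above as working), reusing the `data` and `omega` from the script above:

cufinufft.nufft2d1(
    *omega.to("cuda:0"),
    data.reshape(-1, 12000).to("cuda:0"),
    (320, 320),
    isign=-1,
)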
lu1and10 commented 7 months ago

Does it also break if you specify the device id explicitly in the kwargs? e.g.

cufinufft.nufft2d1(
    *omega.to("cuda:1"),
    data.reshape(-1, 12000).to("cuda:1"),
    (320, 320),
    isign=-1,
    gpu_device_id=1,
)
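
For what it's worth, with PyTorch inputs the ordinal doesn't have to be hard-coded; here is a sketch that reads it off the tensor itself, assuming `omega` and `data` already live on the target GPU and using PyTorch's `Tensor.device.index`:

# Read the CUDA ordinal from the data tensor (e.g. 1 for a tensor on "cuda:1").
dev_id = data.device.index

cufinufft.nufft2d1(
    *omega,
    data.reshape(-1, 12000),
    (320, 320),
    isign=-1,
    gpu_device_id=dev_id,
)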
WardBrian commented 7 months ago

@lu1and10 no, that seems to have fixed it (sorry for not chasing through enough of the `**kwargs` docs to find that option).

So this issue can be re-worded as a feature request: can `_compat.py` pick up a reasonable default for `gpu_device_id`?

lu1and10 commented 7 months ago

> @lu1and10 no, that seems to have fixed it (sorry for not chasing through enough of the `**kwargs` docs to find that option).
>
> So this issue can be re-worded as a feature request: can `_compat.py` pick up a reasonable default for `gpu_device_id`?

Yes, I guess so. It would be a nice feature for the device to be inferred from the inputs.
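
A rough sketch of what that inference could look like; this is a hypothetical helper, not actual cufinufft code, and it assumes the inputs are either torch tensors (which expose `device.index`) or CuPy arrays (which expose `device.id`):

def infer_gpu_device_id(*arrays, default=0):
    """Hypothetical helper: guess gpu_device_id from the input arrays.

    Returns the CUDA ordinal if all GPU inputs agree on one, otherwise
    falls back to `default` (the current behaviour, device 0).
    """
    ids = set()
    for arr in arrays:
        dev = getattr(arr, "device", None)
        # torch.Tensor on "cuda:1" -> device.index == 1 (None for CPU tensors)
        idx = getattr(dev, "index", None)
        if idx is None:
            # cupy.ndarray on device 1 -> device.id == 1
            idx = getattr(dev, "id", None)
        if idx is not None:
            ids.add(idx)
    return ids.pop() if len(ids) == 1 else default

The simple-interface wrappers could then pass `gpu_device_id=infer_gpu_device_id(x, y, data)` whenever the caller does not set it explicitly, and perhaps raise if the inputs disagree rather than silently falling back to device 0.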