HERA-Team / matvis

Fast matrix-based visibility simulator
MIT License

[BUG] Matvis does not work with CUDA 12 #90

Open rlbyrne opened 1 month ago

rlbyrne commented 1 month ago

Attempting to run matvis with GPUs using CUDA 12 produces this error:

/opt/devel/rbyrne/envs/py310/lib/python3.10/site-packages/pycuda/cuda/pycuda-helpers.hpp(17): error: expected a ";"
    {
    ^

kernel.cu(119): error: identifier "lerp" is undefined
          lerp(Agrid[origin], Agrid[origin + 1], fx),
          ^

kernel.cu(118): error: identifier "lerp" is undefined
      Asrc[pol * nbeam * nsrc + ant * nsrc + src] = lerp(
                                                    ^

At end of source: error: expected a "}"
kernel.cu(1): note #3196-D: to match this "{"
  extern "C" {
             ^

46 errors detected in the compilation of "kernel.cu".

I've attempted a workaround by installing CUDA 11 with conda, but it hasn't worked. The installations I performed were:

conda install -c conda-forge cudatoolkit=11
conda install -c conda-forge pycuda
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

The resulting error traceback is:

pycuda.driver.CompileError: nvcc preprocessing of /tmp/tmp9gepm5pa.cu failed
[command: nvcc --preprocess -arch sm_86 -I/opt/devel/rbyrne/envs/py310/lib/python3.10/site-packages/pycuda/cuda /tmp/tmp9gepm5pa.cu --compiler-options -P]
[stderr:
b'cc1plus: fatal error: cuda_runtime.h: No such file or directory\ncompilation terminated.\n']
steven-murray commented 1 month ago

Thanks @rlbyrne. The issue with CUDA 11 seems to be an environment issue with your system. It would be good to get matvis working with CUDA 12 though, so I will look into that. Unfortunately, this is beyond what my current schedule can handle, so it might be a few weeks. How urgent is this? I wonder if we can find a workaround.

rlbyrne commented 1 month ago

@steven-murray There is some time pressure, so a workaround sounds better than waiting a few weeks. Any ideas how to resolve the environment issue?

piyanatk commented 1 month ago

Hi @rlbyrne. To clarify @steven-murray's reply a bit: yes, matvis currently only works with CUDA 11, because the GPU code was developed against that version.

As for your error, it does indeed seem to be a compilation issue coming from nvcc. Do you have the nvcc command available after installing cudatoolkit and pycuda? This Stack Overflow post suggests that the cudatoolkit installed through conda-forge doesn't ship nvcc, but the one installed from the nvidia channel does. I would check on that.

The version of the CUDA driver on your system also matters. On an HPC, I have to load the correct version of the CUDA driver, usually with a command like module load cuda=11, for example.
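
As a quick sanity check, you can also ask pycuda which CUDA toolkit it was compiled against and which driver it actually sees (a sketch; both are plain pycuda.driver calls):

import pycuda.driver as drv

drv.init()
print("compiled against CUDA:", drv.get_version())    # e.g. (11, 8, 0)
print("driver version:", drv.get_driver_version())    # e.g. 12020 means driver 12.2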

rlbyrne commented 1 month ago

@piyanatk I'm trying to install cuda via the nvidia channel (conda install cuda -c nvidia), but the environment solve has been hanging for hours.

The module load cuda=11 command errors with the message: Lmod has detected the following error: The following module(s) are unknown: "cuda=11"

piyanatk commented 1 month ago

@rlbyrne I am not sure about the issue with installing from the nvidia channel ... maybe switch to mamba? It solves environments a lot quicker. In your base environment, just install mamba from the conda-forge channel. Then you can use it as a drop-in replacement for conda.

For loading the driver with module, you should first check with the module avail command whether the software is installed on the system and which versions are available. I also made a mistake in my previous post: module, at least on the HPC I am using, uses / to indicate the version, so I would have to do something like module load cuda/11.8.0_520.61.0, for example.

rlbyrne commented 1 month ago

@piyanatk I don't see any cuda packages under module avail. Running the cuda installation with mamba worked, but I am still getting an error (the traceback ends with 71 errors detected in the compilation of "kernel.cu".)

piyanatk commented 1 month ago

@rlbyrne Can you check which CUDA version mamba has installed (mamba list | grep cuda)? I suspect that it is version 12. Also, run which nvcc to check that it points to the CUDA installed through mamba and not another one on the system.

BTW, are you running matvis through hera_sim?

piyanatk commented 1 month ago

@rlbyrne OK, I got it working without the system CUDA installed.

mamba create -n <env_name> python=3.11
mamba activate <env_name>
mamba install -c nvidia cuda=11 cuda-python=11 cuda-nvcc=11 cuda-toolkit=11
mamba install -c conda-forge numpy=1.26.4 pyuvdata=2.4.5 pycuda
pip install matvis[gpu] hera_sim[gpu] pyuvsim pyradiosky

It turns out that if you only install cuda=11, it doesn't install version 11 of the other required libraries and the compiler. Luckily, the libraries are packed into the cuda-python package, so we don't have to manually install version 11 of every library.

Although not CUDA related, the numpy and pyuvdata versions here are also important, because we do not support NumPy 2 or pyuvdata 3 yet.

@steven-murray We should pin the package requirements until we have time to update to CUDA 12. I will make an environment file for GPU installation and put it somewhere, and also update the documentation if I can find time.

steven-murray commented 1 month ago

Hey @piyanatk, yes, that would be much appreciated! There is a PR open for hera_sim to make it compatible with NumPy 2 and pyuvdata 3, so hopefully it will be merged within the week. A GPU environment file would be very useful (even if it does go out of date rather fast).

rlbyrne commented 1 month ago

OK, things are looking promising! I think the job is running. Thank you @piyanatk!

rlbyrne commented 1 month ago

Unfortunately, running things with the GPUs didn't speed things up at all. A single time and frequency step for the OVRO-LWA took 140 minutes with the GPU setting and 132 minutes without, so it was actually slower with the GPU. Any idea what's going wrong?

piyanatk commented 1 month ago

@rlbyrne Can you please share the command that you use?

rlbyrne commented 1 month ago

@piyanatk I'm just setting use_gpu=True. Is there something else I need to be doing?

The full call is

matvis.simulate_vis(
    ants=antpos,
    fluxes=m3.stokes[0].T.to_value("Jy"),
    ra=ra_new,
    dec=dec_new,
    freqs=freqs,
    lsts=np.array([lst.to_value("rad")]),
    beams=beams,
    beam_idx=beam_ids,
    polarized=True,
    precision=2,
    latitude=location.lat.rad,
    use_gpu=True,
)

steven-murray commented 1 month ago

@rlbyrne I think the bigger question is: what is the configuration of your simulation? How many baselines and sky sources/pixels? Also, are your beams analytic or UVBeams, and if UVBeams, how many pixels? For some smaller simulations (per time and frequency), the overheads are dominant and the GPU isn't that useful.

steven-murray commented 1 month ago

You could also try running some line profiling to see what the dominant bottleneck is. You can also check out the matvis profile command with your setup size to see if it reflects what you're seeing (though it currently has the limitation that, if you're working with a simulated beam model, it only profiles with a low-res model, which underestimates the beam-interpolation part of the calculation).
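
For the line profiling, something like this sketch with the line_profiler package (reusing the arguments from your call above) should show where the time goes inside simulate_vis:

import numpy as np
from line_profiler import LineProfiler

import matvis

lp = LineProfiler()
profiled_sim = lp(matvis.simulate_vis)  # wrap the entry point for per-line timing
vis = profiled_sim(
    ants=antpos,
    fluxes=m3.stokes[0].T.to_value("Jy"),
    ra=ra_new,
    dec=dec_new,
    freqs=freqs,
    lsts=np.array([lst.to_value("rad")]),
    beams=beams,
    beam_idx=beam_ids,
    polarized=True,
    precision=2,
    latitude=location.lat.rad,
    use_gpu=True,
)
lp.print_stats()  # prints line-by-line timings for simulate_vis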

piyanatk commented 1 month ago

@rlbyrne I also want to suggest using the wrapper in hera_sim to run matvis. It takes pyuvsim-style configuration files and works with any array and telescope configuration. The command-line tool hera-sim-vis.py also has built-in profiling options.

rlbyrne commented 1 month ago

The simulation has 1 time, 1 frequency, 62,128 baselines, and I would estimate 1,572,864 "sources" (pixels in the hemisphere for a Healpix map with nside 512).
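
For reference, that source count is just the HEALPix pixel count over half the sky:

nside = 512
npix = 12 * nside**2   # 3,145,728 pixels over the full sky
print(npix // 2)       # 1,572,864 pixels in the visible hemisphere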

I do suspect the beam size has some impact. I was previously using a lower-resolution beam and things were running faster, although I didn't actually profile the speed. I can try downsampling the beam and hope it doesn't have much impact on the result. Does matvis support any other beam formats that could run faster, like a Gaussian decomposition?
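
Something like this is what I have in mind for the downsampling (a sketch, assuming the beam is a regularly gridded UVBeam; the filename is a placeholder):

import numpy as np
from pyuvdata import UVBeam

beam = UVBeam()
beam.read_beamfits("ovro_lwa_beam.fits")  # placeholder filename

# interpolate onto a coarser regular az/za grid (~1 degree here)
coarse_az = np.linspace(0, 2 * np.pi, 360, endpoint=False)
coarse_za = np.linspace(0, np.pi / 2, 91)
beam_lowres = beam.interp(
    az_array=coarse_az,
    za_array=coarse_za,
    az_za_grid=True,
    new_object=True,  # return a new UVBeam on the coarse grid
)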

I haven't worked with pyuvsim configuration files. Is that expected to speed things up?

piyanatk commented 1 month ago

Hmm, the GPU should speed things up with that number of sources and baselines based on my testing.

What kind of beam model are you using? Is it an e-field CST beam in a UVBeam file? You can try specifying the order of the beam's spatial interpolation through the beam_spline_opts keyword and see if that helps. I don't remember exactly what matvis uses by default.
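
For example, your call above with one extra keyword (the {"kx": 1, "ky": 1} values are my guess at linear spline orders; check what the keyword actually accepts):

vis = matvis.simulate_vis(
    ants=antpos, fluxes=m3.stokes[0].T.to_value("Jy"),
    ra=ra_new, dec=dec_new, freqs=freqs,
    lsts=np.array([lst.to_value("rad")]),
    beams=beams, beam_idx=beam_ids,
    polarized=True, precision=2, latitude=location.lat.rad,
    use_gpu=True,
    beam_spline_opts={"kx": 1, "ky": 1},  # assumed keys: linear instead of cubic interpolation
)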

The interface in hera_sim.visibilities does not speed things up, but it is more streamlined and unified for working with other simulators, including pyuvsim and fftvis. It is easier to use and helps avoid mistakes, in my opinion. If you want a simple Gaussian or Airy beam, that is already available by setting an appropriate beam type in the configuration file. If you want to use a more complex analytic fit of a beam, you will have to write a subclass of AnalyticBeam from pyuvsim.

steven-murray commented 1 month ago

Sorry to be late on this again -- I was on vacation. I agree with @piyanatk that for the simulation size you're using there should be a significant speed-up with the GPU, so I'm not quite sure what is going on here.

Is your beam on a rectilinear alt/az grid or in healpix? I am assuming the former because currently the GPU version of matvis can't handle the latter. Also, if you're using the same beam for each antenna, make sure you specify only one unique beam in the beam list (and use the same beam index for each antenna).
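
Concretely, something like this (lwa_beam here is a stand-in for your actual UVBeam object):

import numpy as np

beams = [lwa_beam]                           # a single unique beam object
beam_ids = np.zeros(len(antpos), dtype=int)  # every antenna points at beam 0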

For this size of simulation with a single unique beam, I wouldn't expect the beam interpolation to be the bottleneck, but rather the single big matrix multiplication. This should be much faster on the GPU no matter how you slice it.