Autostronomy / AstroPhot

A fast, flexible, automated, and differentiable astronomical image 2D forward modelling tool for precise parallel multi-wavelength photometry
https://astrophot.readthedocs.io
GNU General Public License v3.0

Issues with AMD GPU and ROCm support #102

Closed markliuchina closed 11 months ago

markliuchina commented 1 year ago

Hi, thanks for providing such wonderful code. I am using an AMD RX 6400 and want to use the dGPU to accelerate the computation. Unfortunately, even though I have prepared all the prerequisites (the AMD driver, ROCm support, PyTorch for ROCm, the basic renderer settings, and so on), the Python kernel crashes every time I use functions from AutoPhot with "cuda:0" selected as the compute device. I am sure that ROCm and PyTorch are set up correctly, because I can still use Stable Diffusion for AI picture generation.

My current work needs to measure the half-light radii of a sample of 2000 galaxies, so using GALFIT could be laborious. I was wondering whether your code would allow me to write an automated program to do this job. It takes time to use MPI for parallel computation, so I want to see how much a GPU can help reduce the computation time in my work. I do not have an NVIDIA GPU. Have you ever tested your code on an AMD GPU system?

Lastly, thanks again for making such wonderful code public!

Best.

Mingfeng Liu

ConnorStoneAstro commented 1 year ago

Hi Mingfeng,

Hmm, in principle it should work on any CUDA-enabled GPU. So far I have only tested on NVIDIA V100 and A100 because that's what I have available. I have two requests. First, could you update to the latest version, 0.10.3? I included some updates in that version to make running on GPU work better. Second, if that doesn't fix it, could you include the full error trace from Python? That way I can get some information about what's going wrong.

Best, Connor
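
When the Python kernel crashes outright, there is often no ordinary traceback to copy. One way to get a trace is the standard-library `faulthandler` module, which dumps the Python stack to stderr on a fatal signal such as a segmentation fault. The sketch below runs a snippet in a child process so that the crash does not take down the parent session. The helper name `run_with_faulthandler` and the small tensor test are illustrative assumptions, not part of AutoPhot, and the example snippet assumes PyTorch is installed in the same environment.

```python
import subprocess
import sys


def run_with_faulthandler(code: str) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child process with faulthandler enabled.

    Hypothetical helper for illustration. If the child dies from a fatal
    signal, faulthandler writes the Python stack to stderr, and on POSIX
    the return code is the negative signal number.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Assumed minimal GPU test: allocate a tensor on the device and reduce it.
    snippet = (
        "import torch\n"
        "x = torch.ones(4, device='cuda:0')\n"
        "print((x * 2).sum().item())\n"
    )
    result = run_with_faulthandler(snippet)
    print("return code:", result.returncode)
    print("stdout:", result.stdout)
    print("stderr:", result.stderr)
```

The captured `stderr` is the text to paste into a bug report. The same effect is available without a helper by starting the failing script with `python -X faulthandler`.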

markliuchina commented 11 months ago

Hi, Connor! Glad to get your reply! First, thanks for your advice, and sorry for the huge delay (I was busy with a summer school at Peking Univ.).

In my previous experiments, the "Segmentation Fault (Core Dumped)" issue occurred not only with AutoPhot but with every code that uses PyTorch, so it is not a problem or bug in AutoPhot.

I think I found the solution. AMD says that a newer version of ROCm is always compatible with older versions, but for the AMD driver this is not convincing. In my previous tests, I installed the AMD driver with ROCm == 5.6 and built PyTorch for ROCm == 5.2. 'torch.cuda.is_available()' returned 'True', and PyTorch could still identify the name and driver of my graphics card. Even rebuilding PyTorch for ROCm == 5.4 did not work. I then uninstalled the AMD driver and ROCm 5.6, installed the AMD driver with ROCm == 5.2, and built the stable version of PyTorch with ROCm == 5.2 support. (Other users may need to add the current user's login name to the 'render' group; this page can help with other PyTorch/ROCm issues on AMD dGPUs: #ROCm/issues/1930)

Now AutoPhot works well with the AMD dGPU. I have just finished the GALFIT measurements of my sample (laborious indeed). AutoPhot does provide faster and stable measurements (and is convenient, too), and even with the RX 6400 it can still save some computation time for complex models. Thanks again for your code and your help!

Best.

Mingfeng Liu

(NNU)
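
The fix described above comes down to matching the ROCm version PyTorch was built against with the ROCm version installed alongside the driver. A minimal sketch for inspecting the PyTorch side is shown below. The function name `describe_torch_backend` is a hypothetical helper; `torch.version.hip`, `torch.version.cuda`, `torch.cuda.is_available()`, and `torch.cuda.get_device_name()` are existing PyTorch attributes and calls. As the thread shows, `is_available()` returning `True` does not by itself prove the versions match, so the reported HIP version still has to be compared by hand with the installed ROCm release.

```python
def describe_torch_backend() -> dict:
    """Collect basic facts about the installed PyTorch build.

    Hypothetical helper for illustration. Returns a dict that is safe to
    print or paste into a bug report; works even if PyTorch is missing.
    """
    info = {"torch_installed": False}
    try:
        import torch
    except ImportError:
        return info

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # On ROCm builds `hip` is a version string; on other builds it is None.
    info["hip_version"] = getattr(torch.version, "hip", None)
    # On CUDA builds `cuda` is a version string; on other builds it is None.
    info["cuda_version"] = getattr(torch.version, "cuda", None)
    # ROCm builds reuse the torch.cuda namespace, hence the "cuda:0" device name.
    info["gpu_available"] = torch.cuda.is_available()
    if info["gpu_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_backend().items():
        print(f"{key}: {value}")
```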

ConnorStoneAstro commented 11 months ago

Hi Mingfeng,

I'm so glad it is working now! Thanks for detailing the solution; other AMD users may find this very helpful.

Good luck on your project!

Best, Connor