Steffenhir / GraXpert

GraXpert is an astronomical image processing program for extracting and removing gradients from the background of your astrophotos.
https://www.graxpert.com/
GNU General Public License v3.0

AMD gpu update #168

Open stevelcb opened 2 weeks ago

stevelcb commented 2 weeks ago

Ubuntu 22.04

Hi everyone, I thought I'd post an update after trying to get ROCm GPU acceleration recognised via onnxruntime.

We created the environment for building GraXpert as described here: https://github.com/Steffenhir/GraXpert

Then we set up AMD's ROCm as described here: https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/install-onnx.html

That works fine, and AMD's ROCm is indeed available via onnxruntime:


>>> import onnxruntime as ort
>>> ort.get_available_providers()
['MIGraphXExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider']

We then built GraXpert. However, it still sees only the CPU:


2024-06-15 12:45:13,787 MainProcess root INFO     Starting denoising
2024-06-15 12:45:15,548 MainProcess root INFO     Available inference providers : ['CPUExecutionProvider']
2024-06-15 12:45:15,548 MainProcess root INFO     Used inference providers : ['CPUExecutionProvider']
2024-06-15 12:45:17,962 MainProcess root INFO     Progress: 1%
2024-06-15 12:45:19,893 MainProcess root INFO     Progress: 2%

Reading to the end of the AMD document, I see that it works with these Radeon cards: RX 7900 XTX, RX 7900 XT, RX 7900 GRE, PRO W7900 and PRO W7800.

I have a gfx90, so I'm not sure whether the GPU will be visible to GraXpert. It is visible to other programs, such as StarTools, but that's via OpenCL.
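One way to confirm which architecture name the ROCm stack reports for the card is to scan the output of `rocminfo` for `gfx` identifiers. This is a minimal sketch, assuming `rocminfo` is on the PATH and prints agent names of the form `gfx…`; the helper names `parse_gfx_targets` and `detect_gfx_targets` are illustrative and not part of ROCm or GraXpert.

```python
import re
import shutil
import subprocess


def parse_gfx_targets(text):
    """Return the unique gfx architecture names found in text, in order of first appearance."""
    seen = []
    for name in re.findall(r"\bgfx[0-9a-f]+\b", text):
        if name not in seen:
            seen.append(name)
    return seen


def detect_gfx_targets():
    """Run rocminfo if it is installed; return [] when it is missing or cannot be run."""
    exe = shutil.which("rocminfo")
    if exe is None:
        return []
    try:
        result = subprocess.run([exe], capture_output=True, text=True, check=False)
    except OSError:
        return []
    return parse_gfx_targets(result.stdout)


if __name__ == "__main__":
    print(detect_gfx_targets())
```

The reported name can then be compared against the cards listed in the AMD document.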

Still thinking... Any ideas anyone? Cheers and TIA

schmelly commented 1 week ago

Hi,

I believe you have to include the ROCMExecutionProvider in graxpert/ai_model_handling.py, cf.: https://github.com/Steffenhir/GraXpert/blob/593cdddf9b3a633e63b7cd45a86903e03b09c89b/graxpert/ai_model_handling.py#L169

CS, David
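The suggestion above amounts to adding the ROCm provider to the list that gets passed to onnxruntime. Here is a minimal sketch of that idea, not GraXpert's actual code: `pick_providers` and the preference order are hypothetical, while the provider names are the ones onnxruntime itself reports.

```python
# Hypothetical preference order: GPU providers first, CPU last as the fallback.
PREFERRED = ["ROCMExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available, preferred=PREFERRED):
    """Keep the preferred providers that are actually available, in preference order."""
    chosen = [p for p in preferred if p in available]
    # Fall back to the CPU if nothing in the preference list is available.
    return chosen or ["CPUExecutionProvider"]


if __name__ == "__main__":
    try:
        import onnxruntime as ort

        print(pick_providers(ort.get_available_providers()))
    except ImportError:
        print("onnxruntime is not installed")
```

The resulting list could then be passed as the `providers=` argument of `ort.InferenceSession`.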

stevelcb commented 1 week ago

Thanks, David. Unfortunately:


 python -m graxpert.main
2024-06-18 22:13:02,761 MainProcess root WARNING  Could not check for newest version
2024-06-18 22:13:11,367 ForkProcess-2 root INFO     stretch.stretch_channel started
2024-06-18 22:13:11,367 ForkProcess-3 root INFO     stretch.stretch_channel started
2024-06-18 22:13:11,367 ForkProcess-4 root INFO     stretch.stretch_channel started
2024-06-18 22:13:11,367 ForkProcess-2 root INFO     stretch.stretch_channel started
2024-06-18 22:13:11,367 ForkProcess-3 root INFO     stretch.stretch_channel started
2024-06-18 22:13:11,367 ForkProcess-4 root INFO     stretch.stretch_channel started
2024-06-18 22:13:11,822 ForkProcess-2 root INFO     stretch.stretch_channel finished
2024-06-18 22:13:11,822 ForkProcess-2 root INFO     stretch.stretch_channel finished
2024-06-18 22:13:11,853 ForkProcess-3 root INFO     stretch.stretch_channel finished
2024-06-18 22:13:11,853 ForkProcess-4 root INFO     stretch.stretch_channel finished
2024-06-18 22:13:11,853 ForkProcess-3 root INFO     stretch.stretch_channel finished
2024-06-18 22:13:11,853 ForkProcess-4 root INFO     stretch.stretch_channel finished
2024-06-18 22:13:24,273 MainProcess root INFO     Progress: 8%
2024-06-18 22:13:24,278 MainProcess root INFO     Progress: 16%
2024-06-18 22:13:24,280 MainProcess root INFO     Progress: 24%
2024-06-18 22:13:24,280 MainProcess root INFO     Progress: 32%
2024-06-18 22:13:25,119 MainProcess root INFO     Providers : ['ROCMExecutionProvider', 'CPUExecutionProvider']
2024-06-18 22:13:25,119 MainProcess root INFO     Used providers : ['ROCMExecutionProvider', 'CPUExecutionProvider']
rocBLAS error from hip error code: 'hipErrorInvalidDeviceFunction':98
2024-06-18 22:13:25.122326547 [E:onnxruntime:Default, rocm_call.cc:119 RocmCall] ROCBLAS failure 6: rocblas_status_internal_error ; GPU=0 ; hostname=cocina ; file=/onnxruntime/build/Linux/Release/amdgpu/onnxruntime/core/providers/rocm/tensor/transpose.cc ; line=65 ; expr=rocblasTransposeHelper(stream, rocblas_handle, rocblas_operation_transpose, rocblas_operation_transpose, M, N, &one, input_data, N, &zero, input_data, N, output_data, M); 
2024-06-18 22:13:25.122341807 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Transpose node. Name:'StatefulPartitionedCall/model/sequential/conv2d/Conv2D__6' Status Message: ROCBLAS failure 6: rocblas_status_internal_error ; GPU=0 ; hostname=cocina ; file=/onnxruntime/build/Linux/Release/amdgpu/onnxruntime/core/providers/rocm/tensor/transpose.cc ; line=65 ; expr=rocblasTransposeHelper(stream, rocblas_handle, rocblas_operation_transpose, rocblas_operation_transpose, M, N, &one, input_data, N, &zero, input_data, N, output_data, M); 
2024-06-18 22:13:25,177 MainProcess root ERROR    [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Transpose node. Name:'StatefulPartitionedCall/model/sequential/conv2d/Conv2D__6' Status Message: ROCBLAS failure 6: rocblas_status_internal_error ; GPU=0 ; hostname=cocina ; file=/onnxruntime/build/Linux/Release/amdgpu/onnxruntime/core/providers/rocm/tensor/transpose.cc ; line=65 ; expr=rocblasTransposeHelper(stream, rocblas_handle, rocblas_operation_transpose, rocblas_operation_transpose, M, N, &one, input_data, N, &zero, input_data, N, output_data, M); 
Traceback (most recent call last):
  File "/home/steve/GraXpert/graxpert/application/app.py", line 149, in on_calculate_request
    extract_background(
  File "/home/steve/GraXpert/graxpert/background_extraction.py", line 80, in extract_background
    background = session.run(None, {"gen_input_image": np.expand_dims(imarray_shrink, axis=0)})[0][0]
  File "/home/steve/GraXpert/graxpert-env/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Transpose node. Name:'StatefulPartitionedCall/model/sequential/conv2d/Conv2D__6' Status Message: ROCBLAS failure 6: rocblas_status_internal_error ; GPU=0 ; hostname=cocina ; file=/onnxruntime/build/Linux/Release/amdgpu/onnxruntime/core/providers/rocm/tensor/transpose.cc ; line=65 ; expr=rocblasTransposeHelper(stream, rocblas_handle, rocblas_operation_transpose, rocblas_operation_transpose, M, N, &one, input_data, N, &zero, input_data, N, output_data, M); 
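
The failure in this log is raised from `session.run`, after the ROCm provider had already been accepted, so a provider can be listed and still fail at inference time. One possible workaround is to retry on the CPU when a GPU run raises. This is a minimal sketch under stated assumptions: `run_with_fallback` and `make_session` are hypothetical names, not part of GraXpert or onnxruntime, and `make_session(providers)` is assumed to return an object with an onnxruntime-style `run(output_names, input_feed)` method.

```python
import logging


def run_with_fallback(make_session, provider_lists, input_feed):
    """Try each provider list in turn; return (outputs, providers) from the first that works."""
    last_error = None
    for providers in provider_lists:
        try:
            session = make_session(providers)
            return session.run(None, input_feed), providers
        except Exception as exc:  # caught broadly on purpose: any failure triggers the next attempt
            logging.warning("Inference with %s failed: %s", providers, exc)
            last_error = exc
    raise RuntimeError("All provider lists failed") from last_error
```

With onnxruntime, `make_session` could be something like `lambda providers: ort.InferenceSession(model_path, providers=providers)`, where `model_path` is a placeholder for the model file.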

stevelcb commented 1 week ago

Log attached: graxpert.log.5.txt