JuliaGPU / oneAPI.jl

Julia support for the oneAPI programming toolkit.
https://juliagpu.org/oneapi/

Multiplication of StridedMaybeAdjOrTransMat broken for certain matrix sizes #442

Open leios opened 3 months ago

leios commented 3 months ago

If the array has around 10 rows (10×2 here), then a' * a works fine:

julia> using oneAPI

julia> rand_array = rand(Float32, 10, 2);

julia> one_array = oneArray(rand_array);

julia> rand_array' * rand_array
2×2 Matrix{Float32}:
 3.73734  2.68277
 2.68277  3.32426

julia> one_array' * one_array
2×2 oneArray{Float32, 2, oneAPI.oneL0.DeviceBuffer}:
 3.73734  2.68277
 2.68277  3.32426

With 100 rows (100×2), it fails: the GPU result is silently all zeros.

julia> rand_array = rand(Float32, 100, 2);

julia> rand_array' * rand_array
2×2 Matrix{Float32}:
 32.107   24.3659
 24.3659  32.234

julia> one_array = oneArray(rand_array);

julia> one_array' * one_array
2×2 oneArray{Float32, 2, oneAPI.oneL0.DeviceBuffer}:
 0.0  0.0
 0.0  0.0
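
To narrow down where the result starts going wrong, a size sweep along the lines below may help. This is a minimal sketch, not part of the original report: `check_sizes` and its `to_device` argument are hypothetical names, and the default `identity` keeps everything on the CPU so the snippet runs without a GPU. Passing `oneArray` (after `using oneAPI`) would exercise the device path from the session above.

```julia
using LinearAlgebra

# Hypothetical helper: compare a' * a on the "device" against the CPU result
# for several row counts. `to_device` defaults to `identity` (CPU only).
function check_sizes(to_device = identity; sizes = (10, 50, 100, 1000))
    results = Pair{Int,Bool}[]
    for n in sizes
        host = rand(Float32, n, 2)
        dev = to_device(host)
        expected = host' * host
        actual = Array(dev' * dev)   # copy back to the host for comparison
        push!(results, n => isapprox(actual, expected; rtol = 1f-4))
    end
    return results
end

# Report, per row count, whether the device result agrees with the CPU one.
for (n, ok) in check_sizes()
    println("rows = ", n, ": ", ok ? "matches CPU" : "MISMATCH")
end
```

On an affected device, calling `check_sizes(oneArray)` should show the first row count at which the results diverge, assuming the failure depends only on size.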

It seems to be calling this function in LinearAlgebra/matmul.jl:

function (*)(A::StridedMaybeAdjOrTransMat{<:BlasReal}, B::StridedMaybeAdjOrTransMat{<:BlasReal})
    TS = promote_type(eltype(A), eltype(B))
    mul!(similar(B, TS, (size(A, 1), size(B, 2))),
         wrapperop(A)(convert(AbstractArray{TS}, _unwrap(A))),
         wrapperop(B)(convert(AbstractArray{TS}, _unwrap(B))))
end
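
For reference, the method above boils down to allocating an output with `similar` and calling `mul!` on the wrapped operands. The sketch below spells that out on plain CPU arrays using only documented LinearAlgebra API (`_unwrap` and `wrapperop` are internal, so they are not used here). Whether the same explicit call reproduces the zeros on a `oneArray` is an assumption that would need checking on the affected hardware.

```julia
using LinearAlgebra

a = rand(Float32, 100, 2)

# Element type and output shape, computed the same way as in the method above.
TS = promote_type(eltype(a), eltype(a))
C = similar(a, TS, (size(a', 1), size(a, 2)))   # 2×2 output buffer

# In-place multiply: C = a' * a
mul!(C, a', a)

# Compare against the result of the generic * operator.
println(C ≈ a' * a)
```

Calling `mul!` directly like this separates the allocation step from the multiplication step, which can help tell which of the two misbehaves on the device.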

It also segfaults when Julia exits (the trace shows the crash inside a finalizer calling syclQueueDestroy):

[982661] signal (11.128): Segmentation fault
in expression starting at none:0
_ZN3NEO13DrmAllocation15makeBOsResidentEPNS_9OsContextEjPSt6vectorIPNS_12BufferObjectESaIS5_EEb at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO24DrmCommandStreamReceiverINS_13Gen12LpFamilyEE16processResidencyERKSt6vectorIPNS_18GraphicsAllocationESaIS5_EEj at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO24DrmCommandStreamReceiverINS_13Gen12LpFamilyEE13flushInternalERKNS_11BatchBufferERKSt6vectorIPNS_18GraphicsAllocationESaIS8_EE at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO24DrmCommandStreamReceiverINS_13Gen12LpFamilyEE5flushERNS_11BatchBufferERSt6vectorIPNS_18GraphicsAllocationESaIS7_EE at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO21CommandStreamReceiver17submitBatchBufferERNS_11BatchBufferERSt6vectorIPNS_18GraphicsAllocationESaIS5_EE at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN2L015CommandQueueImp17submitBatchBufferEmRSt6vectorIPN3NEO18GraphicsAllocationESaIS4_EEPvb at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN2L014CommandQueueHwIL14GFXCORE_FAMILY18EE26executeCommandListsRegularERNS2_27CommandListExecutionContextEjPP25_ze_command_list_handle_tP18_ze_fence_handle_tP18_ze_event_handle_tjPSB_ at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN2L014CommandQueueHwIL14GFXCORE_FAMILY18EE19executeCommandListsEjPP25_ze_command_list_handle_tP18_ze_fence_handle_tbP18_ze_event_handle_tjPS9_ at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN2L033zeCommandQueueExecuteCommandListsEP26_ze_command_queue_handle_tjPP25_ze_command_list_handle_tP18_ze_fence_handle_t at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN18ur_queue_handle_t_18executeCommandListENSt3__119__hash_map_iteratorINS0_15__hash_iteratorIPNS0_11__hash_nodeINS0_17__hash_value_typeIP25_ze_command_list_handle_t22ur_command_list_info_tEEPvEEEEEEbb at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libpi_level_zero.so (unknown line)
_ZN18ur_queue_handle_t_26executeAllOpenCommandListsEv at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libpi_level_zero.so (unknown line)
urQueueRelease at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libpi_level_zero.so (unknown line)
piQueueRelease at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libpi_level_zero.so (unknown line)
_ZNK4sycl3_V16detail6plugin12call_nocheckILNS1_9PiApiKindE26EJP9_pi_queueEEE10_pi_resultDpT0_ at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libsycl.so.7 (unknown line)
_ZN4sycl3_V16detail10queue_implD2Ev at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libsycl.so.7 (unknown line)
_M_release at /opt/x86_64-linux-gnu/x86_64-linux-gnu/include/c++/8.1.0/bits/shared_ptr_base.h:161 [inlined]
~__shared_count at /opt/x86_64-linux-gnu/x86_64-linux-gnu/include/c++/8.1.0/bits/shared_ptr_base.h:712 [inlined]
~__shared_ptr at /opt/x86_64-linux-gnu/x86_64-linux-gnu/include/c++/8.1.0/bits/shared_ptr_base.h:1151 [inlined]
~queue at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/sycl/queue.hpp:119 [inlined]
~syclQueue_st at /workspace/srcdir/oneAPI.jl/deps/src/sycl.hpp:19 [inlined]
syclQueueDestroy at /workspace/srcdir/oneAPI.jl/deps/src/sycl.cpp:60
syclQueueDestroy at /home/u222842/projects/oneAPI.jl/lib/support/liboneapi_support.jl:58 [inlined]
#7 at /home/u222842/projects/oneAPI.jl/lib/sycl/SYCL.jl:74
unknown function (ip: 0x7faf4714b085)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
run_finalizer at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:318
jl_gc_run_finalizers_in_list at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:408
run_finalizers at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:454
ijl_atexit_hook at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/init.c:299
jl_repl_entrypoint at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/jlapi.c:732
main at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/cli/loader_exe.c:58
unknown function (ip: 0x7faf5ed83d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 14926321 (Pool: 14908017; Big: 18304); GC: 23
Segmentation fault (core dumped)

I am running this on the Intel DevCloud; the GPU node is listed as:

pbsnodes | grep -B4 gpu
---
s019-n010
     state = job-exclusive
     power_state = Running
     np = 2
     properties = core,tgl,i9-11900kb,ram32gb,netgbe,gpu,gen11
---

This seems related to the issue I have been having with https://github.com/JuliaGPU/GPUArrays.jl/pull/525

maleadt commented 3 months ago

Which GPU? Are you using oneAPI.jl#master?

cc @pengtu

leios commented 3 months ago

Yes, I was on master. I was using an i9-11900kb, with:

julia> device()
ZeDevice(GPU, vendor 0x8086, device 0x9a60): Intel(R) UHD Graphics

As an interesting note, this error did not occur on another node with:

julia> device()
ZeDevice(GPU, vendor 0x8086, device 0x3e96): Intel(R) UHD Graphics P630

I'll investigate it some more tomorrow.

maleadt commented 3 months ago

Unless it's a simple double-free on the Julia side, I think this may be hard to investigate. There have been several related issues caused by the MKL/Level Zero interop, see e.g. https://github.com/JuliaGPU/oneAPI.jl/pull/417#issuecomment-2056988520.