JuliaGPU / oneAPI.jl

Julia support for the oneAPI programming toolkit.
https://juliagpu.org/oneapi/

oneMKL.jl tests failed on PVC using 2024.2 #454

Open kballeda opened 2 months ago

kballeda commented 2 months ago

I observed a crash on PVC with the latest oneAPI 2024.2.0. I believe this commit, https://github.com/JuliaGPU/oneAPI.jl/commit/457e020c7cf2ec56a8098008b6f4a062b69111c7, must cover various cases. If the failing tests are already known, it would be good to disable them for the PVC target.
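As a rough illustration of what disabling a test for one target could look like, here is a minimal sketch that uses only Julia's `Test` standard library. The `is_pvc` helper, the `device_name` placeholder, and the assumption that PVC devices report a name containing "Data Center GPU Max" are all hypothetical and are not taken from oneAPI.jl; a CPU `norm` call stands in for the GPU routine.

```julia
using Test
using LinearAlgebra

# Hypothetical helper (not part of oneAPI.jl): guess from a device name string
# whether the target is PVC. The substring checked here is an assumption.
is_pvc(device_name::AbstractString) = occursin("Data Center GPU Max", device_name)

# Placeholder value; a real test suite would query the active device instead.
device_name = "Intel(R) Data Center GPU Max 1550"

x = rand(Float32, 16)

@testset "nrm2 (skipped on PVC)" begin
    # With `skip=true` the expression is not evaluated and the test is
    # recorded as broken/skipped; otherwise it runs as a normal test.
    @test norm(x) ≈ sqrt(sum(abs2, x)) skip=is_pvc(device_name)
end
```

The `skip` keyword of `@test` requires Julia 1.7 or later. Skipped tests still show up in the test summary, so they are harder to forget than tests that are commented out.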

~/projects/JULIA/oneAPI.jl$ $JULIA --project -L test/setup.jl test/onemkl.jl
reflect: Test Failed at /home/kali/projects/JULIA/oneAPI.jl/test/onemkl.jl:45
  Expression: testf(reflect!, rand(T, m), rand(T, m), rand(real(T)), rand(T))

Stacktrace:
 [1] macro expansion
   @ ~/kali_tools/julia-1.10.4/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
 [2] macro expansion
   @ ~/projects/JULIA/oneAPI.jl/test/onemkl.jl:45 [inlined]
 [3] macro expansion
   @ ~/kali_tools/julia-1.10.4/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
 [4] macro expansion
   @ ~/projects/JULIA/oneAPI.jl/test/onemkl.jl:44 [inlined]
 [5] macro expansion
   @ ~/kali_tools/julia-1.10.4/share/julia/stdlib/v1.10/Test/src/Test.jl:1669 [inlined]
 [6] macro expansion
   @ ~/projects/JULIA/oneAPI.jl/test/onemkl.jl:17 [inlined]
 [7] macro expansion
   @ ~/kali_tools/julia-1.10.4/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
 [8] top-level scope
   @ ~/projects/JULIA/oneAPI.jl/test/onemkl.jl:17
scal: Test Failed at /home/kali/projects/JULIA/oneAPI.jl/test/onemkl.jl:51
  Expression: testf(rmul!, rand(T, m), alpha[1])

Stacktrace:
 [1] macro expansion
   @ ~/kali_tools/julia-1.10.4/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
 [2] macro expansion
   @ ~/projects/JULIA/oneAPI.jl/test/onemkl.jl:51 [inlined]
 [3] macro expansion
   @ ~/kali_tools/julia-1.10.4/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
 [4] macro expansion
   @ ~/projects/JULIA/oneAPI.jl/test/onemkl.jl:50 [inlined]
 [5] macro expansion
   @ ~/kali_tools/julia-1.10.4/share/julia/stdlib/v1.10/Test/src/Test.jl:1669 [inlined]
 [6] macro expansion
   @ ~/projects/JULIA/oneAPI.jl/test/onemkl.jl:17 [inlined]
 [7] macro expansion
   @ ~/kali_tools/julia-1.10.4/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
 [8] top-level scope
   @ ~/projects/JULIA/oneAPI.jl/test/onemkl.jl:17

[1290059] signal (11.1): Segmentation fault
in expression starting at /home/kali/projects/JULIA/oneAPI.jl/test/onemkl.jl:1
unknown function (ip: 0x7f87b18019b4)
unknown function (ip: 0x7f87b1801359)
unknown function (ip: 0x7f87b1714e2f)
unknown function (ip: 0x7f87b185b5ab)
unknown function (ip: 0x7f87b1826430)
unknown function (ip: 0x7f87b14b9972)
unknown function (ip: 0x7f87b13a122a)
_ZN18ur_queue_handle_t_18executeCommandListENSt3__119__hash_map_iteratorINS0_15__hash_iteratorIPNS0_11__hash_nodeINS0_17__hash_value_typeIP25_ze_command_list_handle_t22ur_command_list_info_tEEPvEEEEEEbb at /usr/DPA/tools/oneAPI/2024.2.0/compiler/2024.2/lib/libpi_level_zero.so (unknown line)
urEnqueueKernelLaunch at /usr/DPA/tools/oneAPI/2024.2.0/compiler/2024.2/lib/libpi_level_zero.so (unknown line)
piEnqueueKernelLaunch at /usr/DPA/tools/oneAPI/2024.2.0/compiler/2024.2/lib/libpi_level_zero.so (unknown line)
_ZNK4sycl3_V16detail6plugin12call_nocheckILNS1_9PiApiKindE78EJP9_pi_queueP10_pi_kernelmPmS9_S9_mPP9_pi_eventSC_EEE10_pi_resultDpT0_ at /usr/DPA/tools/oneAPI/2024.2.0/compiler/2024.2/lib/libsycl.so.7 (unknown line)
_ZN4sycl3_V16detail16enqueueImpKernelERKSt10shared_ptrINS1_10queue_implEERNS1_8NDRDescTERSt6vectorINS1_7ArgDescESaISA_EERKS2_INS1_18kernel_bundle_implEERKS2_INS1_11kernel_implEERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERS9_IP9_pi_eventSaISV_EERKS2_INS1_10event_implEERKSt8functionIFPvPNS1_16AccessorImplHostEEE23_pi_kernel_cache_configb at /usr/DPA/tools/oneAPI/2024.2.0/compiler/2024.2/lib/libsycl.so.7 (unknown line)
_ZN4sycl3_V16detail13ExecCGCommand15enqueueImpQueueEv at /usr/DPA/tools/oneAPI/2024.2.0/compiler/2024.2/lib/libsycl.so.7 (unknown line)
_ZN4sycl3_V16detail7Command7enqueueERNS1_14EnqueueResultTENS1_9BlockingTERSt6vectorIPS2_SaIS7_EE at /usr/DPA/tools/oneAPI/2024.2.0/compiler/2024.2/lib/libsycl.so.7 (unknown line)
_ZN4sycl3_V16detail9Scheduler14GraphProcessor14enqueueCommandEPNS1_7CommandERSt11shared_lockISt18shared_timed_mutexERNS1_14EnqueueResultTERSt6vectorIS5_SaIS5_EES5_NS1_9BlockingTE at /usr/DPA/tools/oneAPI/2024.2.0/compiler/2024.2/lib/libsycl.so.7 (unknown line)
_ZN4sycl3_V16detail9Scheduler19enqueueCommandForCGESt10shared_ptrINS1_10event_implEERSt6vectorIPNS1_7CommandESaIS8_EENS1_9BlockingTE at /usr/DPA/tools/oneAPI/2024.2.0/compiler/2024.2/lib/libsycl.so.7 (unknown line)
_ZN4sycl3_V16detail9Scheduler5addCGESt10unique_ptrINS1_2CGESt14default_deleteIS4_EERKSt10shared_ptrINS1_10queue_implEEP22_pi_ext_command_bufferRKSt6vectorIjSaIjEE at /usr/DPA/tools/oneAPI/2024.2.0/compiler/2024.2/lib/libsycl.so.7 (unknown line)
_ZN4sycl3_V17handler8finalizeEv at /usr/DPA/tools/oneAPI/2024.2.0/compiler/2024.2/lib/libsycl.so.7 (unknown line)
_ZN4sycl3_V16detail10queue_impl15finalizeHandlerINS0_7handlerEEEvRT_RNS0_5eventE at /usr/DPA/tools/oneAPI/2024.2.0/compiler/2024.2/lib/libsycl.so.7 (unknown line)
_ZN4sycl3_V16detail10queue_impl11submit_implERKSt8functionIFvRNS0_7handlerEEERKSt10shared_ptrIS2_ESD_SD_RKNS1_13code_locationEPKS3_IFvbbRNS0_5eventEEE at /usr/DPA/tools/oneAPI/2024.2.0/compiler/2024.2/lib/libsycl.so.7 (unknown line)
_ZN4sycl3_V16detail10queue_impl6submitERKSt8functionIFvRNS0_7handlerEEERKSt10shared_ptrIS2_ERKNS1_13code_locationEPKS3_IFvbbRNS0_5eventEEE at /usr/DPA/tools/oneAPI/2024.2.0/compiler/2024.2/lib/libsycl.so.7 (unknown line)
_ZN4sycl3_V15queue11submit_implESt8functionIFvRNS0_7handlerEEERKNS0_6detail13code_locationE at /usr/DPA/tools/oneAPI/2024.2.0/compiler/2024.2/lib/libsycl.so.7 (unknown line)
_ZN6oneapi3mkl3gpu19snrm2_sycl_internalEPN4sycl3_V15queueElPKflPfRKSt6vectorINS3_5eventESaISA_EE at /usr/DPA/tools/oneAPI/2024.2.0/mkl/2024.2/lib/libmkl_sycl_blas.so.4 (unknown line)
_ZN6oneapi3mkl3gpu10snrm2_syclEPN4sycl3_V15queueElPKflPfRKSt6vectorINS3_5eventESaISA_EE at /usr/DPA/tools/oneAPI/2024.2.0/mkl/2024.2/lib/libmkl_sycl_blas.so.4 (unknown line)
_ZN6oneapi3mkl4blas5snrm2ERN4sycl3_V15queueElPKflPfRKSt6vectorINS3_5eventESaISA_EE at /usr/DPA/tools/oneAPI/2024.2.0/mkl/2024.2/lib/libmkl_sycl_blas.so.4 (unknown line)
_ZN6oneapi3mkl4blas12column_major4nrm2ERN4sycl3_V15queueElPKflPfRKSt6vectorINS4_5eventESaISB_EE at /usr/DPA/tools/oneAPI/2024.2.0/mkl/2024.2/lib/libmkl_sycl_blas.so.4 (unknown line)
onemklSnrm2 at /workspace/srcdir/oneAPI.jl/deps/src/onemkl.cpp:1286
onemklSnrm2 at /home/kali/projects/JULIA/oneAPI.jl/lib/support/liboneapi_support.jl:1553
unknown function (ip: 0x7f8b815b977f)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
nrm2 at /home/kali/projects/JULIA/oneAPI.jl/lib/mkl/wrappers_blas.jl:591
norm at /home/kali/projects/JULIA/oneAPI.jl/lib/mkl/linalg.jl:21
unknown function (ip: 0x7f8b815b94d5)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/builtins.c:768
#compare#10 at /home/kali/.julia/packages/GPUArrays/bbZD0/test/testsuite.jl:44
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/builtins.c:768
compare at /home/kali/.julia/packages/GPUArrays/bbZD0/test/testsuite.jl:38
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/builtins.c:768
#testf#1 at /home/kali/projects/JULIA/oneAPI.jl/test/setup.jl:11
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/builtins.c:768
testf at /home/kali/projects/JULIA/oneAPI.jl/test/setup.jl:11
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:617
eval_body at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:544
eval_body at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:544
eval_body at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:544
eval_body at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:544
eval_body at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:544
eval_body at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:544
eval_body at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:544
jl_interpret_toplevel_thunk at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
include_string at ./loading.jl:2076
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
_include at ./loading.jl:2136
include at ./Base.jl:495
jfptr_include_46393.1 at /home/kali/kali_tools/julia-1.10.4/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
exec_options at ./client.jl:318
_start at ./client.jl:552
jfptr__start_82729.1 at /home/kali/kali_tools/julia-1.10.4/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
true_main at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/jlapi.c:731
main at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/cli/loader_exe.c:58
unknown function (ip: 0x7f8b99052d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 7462318 (Pool: 7457917; Big: 4401); GC: 11
Segmentation fault (core dumped)

amontoison commented 2 months ago

@kballeda We don't have any CI builds for the PVC architecture. Having one would also be useful for testing routines that rely on double precision.

I can only test locally on my laptop, which has Intel HD Graphics 3000, when I update oneAPI.jl. If you have a remote cluster that you can let us connect to, I can try the latest modifications there before merging them here in oneAPI.jl.

kballeda commented 2 months ago

@maleadt It would be great if we could use the DevCloud instance provided last week to debug this.

maleadt commented 2 months ago

PVC support is blocked on https://github.com/JuliaGPU/oneAPI.jl/issues/439 first, which I've been debugging for days.