Open ymtoo opened 3 months ago
This is tricky to fix. IIRC one needs to perform a CAS (compare-and-swap) loop for atomics smaller than 4 bytes.
Atomix should use https://github.com/JuliaConcurrent/Atomix.jl/blob/e60c518e3ffd2c9d4e96104f16f2a970a69e4289/lib/AtomixCUDA/src/AtomixCUDA.jl#L38
which does claim to support Float16: https://github.com/JuliaGPU/CUDA.jl/blob/14de0097ff7c26932cc4a175840961cc7d3f396e/src/device/intrinsics/atomics.jl#L195
What is the output of `]status -m`?
It might be that we end up in UnsafeAtomicsLLVM.jl (https://github.com/JuliaConcurrent/UnsafeAtomicsLLVM.jl) instead of UnsafeAtomicsCUDA.jl.
x-ref: JuliaGPU/CUDA.jl#1790
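The CAS loop mentioned above for atomics smaller than 4 bytes can be sketched on the CPU as follows. This is only an illustration of the technique, not the code that Atomix.jl or CUDA.jl actually use: `atomic_add_f16!` is a hypothetical helper, and packing two Float16 values into one `Threads.Atomic{UInt32}` is an assumption standing in for the 32-bit word that contains the 16-bit element on the GPU.

```julia
# Hypothetical helper (not part of Atomix.jl or CUDA.jl): atomically add `val`
# to one Float16 half of a 32-bit word. `hi` selects the upper 16 bits.
function atomic_add_f16!(word::Threads.Atomic{UInt32}, val::Float16, hi::Bool)
    shift = hi ? 16 : 0
    mask = UInt32(0xffff) << shift
    old = word[]
    while true
        # Extract the current Float16 from the selected half of the word.
        cur = reinterpret(Float16, UInt16((old & mask) >> shift))
        # Splice the updated half back in, leaving the other half untouched.
        new_bits = UInt32(reinterpret(UInt16, cur + val)) << shift
        new = (old & ~mask) | new_bits
        # atomic_cas! returns the value stored in `word` before the call.
        seen = Threads.atomic_cas!(word, old, new)
        seen == old && return cur   # swap succeeded; return the previous value
        old = seen                  # another thread changed the word; retry
    end
end
```

The loop retries whenever any part of the 32-bit word changed, including the neighbouring half, which is why contended 16-bit atomics built this way can be slower than native 32-bit ones.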
(jl_hHeJiL) pkg> status -m
Status `/tmp/jl_hHeJiL/Manifest.toml`
[621f4979] AbstractFFTs v1.5.0
[79e6a3ab] Adapt v4.0.4
[a9b6321e] Atomix v0.1.0
[ab4f0b2a] BFloat16s v0.5.0
[fa961155] CEnum v0.5.0
[052768ef] CUDA v5.4.2
[1af6417a] CUDA_Runtime_Discovery v0.3.4
[3da002f7] ColorTypes v0.11.5
[5ae59095] Colors v0.12.11
[34da2185] Compat v4.15.0
[a8cc5b0e] Crayons v4.1.1
[9a962f9c] DataAPI v1.16.0
[a93c6f00] DataFrames v1.6.1
[864edb3b] DataStructures v0.18.20
[e2d170a0] DataValueInterfaces v1.0.0
[e2ba6199] ExprTools v0.1.10
[53c48c17] FixedPointNumbers v0.8.5
[0c68f7d7] GPUArrays v10.2.1
[46192b85] GPUArraysCore v0.1.6
⌃ [61eb1bfa] GPUCompiler v0.26.5
[842dd82b] InlineStrings v1.4.1
[41ab1584] InvertedIndices v1.3.0
[82899510] IteratorInterfaceExtensions v1.0.0
[692b3bcd] JLLWrappers v1.5.0
[63c18a36] KernelAbstractions v0.9.21
⌅ [929cbde3] LLVM v7.2.1
[8b046642] LLVMLoopInfo v1.0.0
[b964fa9f] LaTeXStrings v1.3.1
[1914dd2f] MacroTools v0.5.13
[e1d29d7a] Missings v1.2.0
[5da4648a] NVTX v0.3.4
[bac558e1] OrderedCollections v1.6.3
[69de0a69] Parsers v2.8.1
[2dfb63ee] PooledArrays v1.4.3
[aea7be01] PrecompileTools v1.2.1
[21216c6a] Preferences v1.4.3
[08abe8d2] PrettyTables v2.3.2
[74087812] Random123 v1.7.0
[e6cf234a] RandomNumbers v1.5.3
[189a3867] Reexport v1.2.2
[ae029012] Requires v1.3.0
[6c6a2e73] Scratch v1.2.1
[91c51154] SentinelArrays v1.4.3
[a2af1166] SortingAlgorithms v1.2.1
[90137ffa] StaticArrays v1.9.6
[1e83bf80] StaticArraysCore v1.4.3
[892a3eda] StringManipulation v0.3.4
[3783bdb8] TableTraits v1.0.1
[bd369af6] Tables v1.11.1
[a759f4b9] TimerOutputs v0.5.24
[013be700] UnsafeAtomics v0.2.1
[d80eeb9a] UnsafeAtomicsLLVM v0.1.5
[4ee394cb] CUDA_Driver_jll v0.9.0+0
[76a88914] CUDA_Runtime_jll v0.14.0+1
[9c1d0b0a] JuliaNVTXCallbacks_jll v0.2.1+0
⌅ [dad2f222] LLVMExtra_jll v0.0.29+0
[e98f9f5b] NVTX_jll v3.1.0+2
[0dad84c5] ArgTools v1.1.1
[56f22d72] Artifacts
[2a0f44e3] Base64
[ade2ca70] Dates
[f43a241f] Downloads v1.6.0
[7b1f6079] FileWatching
[9fa8497b] Future
[b77e0a4c] InteractiveUtils
[4af54fe1] LazyArtifacts
[b27032c2] LibCURL v0.6.4
[76f85450] LibGit2
[8f399da3] Libdl
[37e2e46d] LinearAlgebra
[56ddb016] Logging
[d6f4376e] Markdown
[ca575930] NetworkOptions v1.2.0
[44cfe95a] Pkg v1.10.0
[de0858da] Printf
[3fa0cd96] REPL
[9a3f8284] Random
[ea8e919c] SHA v0.7.0
[9e88b42a] Serialization
[6462fe0b] Sockets
[2f01184e] SparseArrays v1.10.0
[10745b16] Statistics v1.10.0
[fa267f1f] TOML v1.0.3
[a4e569a6] Tar v1.10.0
[8dfed614] Test
[cf7118a7] UUIDs
[4ec0a83e] Unicode
[e66e0078] CompilerSupportLibraries_jll v1.1.1+0
[deac9b47] LibCURL_jll v8.4.0+0
[e37daf67] LibGit2_jll v1.6.4+0
[29816b5a] LibSSH2_jll v1.11.0+1
[c8ffd9c3] MbedTLS_jll v2.28.2+1
[14a3606d] MozillaCACerts_jll v2023.1.10
[4536629a] OpenBLAS_jll v0.3.23+4
[bea87d4a] SuiteSparse_jll v7.2.1+1
[83775a58] Zlib_jll v1.2.13+1
[8e850b90] libblastrampoline_jll v5.8.0+1
[8e850ede] nghttp2_jll v1.52.0+1
[3f19e933] p7zip_jll v17.4.0+2
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated -m`
What happens when you load AtomixCUDA?
The error still occurs after adding and loading AtomixCUDA.
(jl_OV6Zim) pkg> st
Status `/tmp/jl_OV6Zim/Project.toml`
[a9b6321e] Atomix v0.1.0
[6171a885] AtomixCUDA v0.1.0-DEV `https://github.com/JuliaConcurrent/Atomix.jl#main:lib/AtomixCUDA`
[052768ef] CUDA v5.4.2
[63c18a36] KernelAbstractions v0.9.22
julia> using AtomixCUDA
I won't be able to look at this in detail until August.
For now I would recommend just writing a CUDA.jl kernel and using CUDA.@atomic.
With an FP16 input, the example throws an error.
It works fine on FP32 inputs.
Julia and package version: