Open maleadt opened 2 years ago
About M1 hardware: most of Metal.jl already works on Intel hardware, ~and if you add the correct versions mapping the kernel to the macOS version it will run on Intel as well. I've toyed with this a few weeks ago and it was launching correctly.~ EDIT: I've just noticed that you already fixed that. I just tried the latest branch and it works fine as long as you don't generate kernels.
The main difference is that on M1 there's only shared memory, so you don't need to synchronise buffers with `MtStorageModeShared`, while on Intel you need to if you have a discrete GPU. I don't know if the code to synchronise upon copy/MtlArray creation is still there or if you dropped it.
Also, on Intel it will be wasteful to default to `MtStorageModeShared`.
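The storage-mode trade-off described above can be sketched in plain Julia. This is only an illustration under stated assumptions, not Metal.jl's actual logic: `DeviceInfo`, `StorageMode`, `choose_storage_mode` and `needs_sync` are hypothetical names, and the enum values merely mirror Metal's shared, managed and private storage modes.

```julia
# Hypothetical stand-ins; none of these names come from Metal.jl.
@enum StorageMode Shared Managed Private

struct DeviceInfo
    name::String
    unified_memory::Bool  # true when CPU and GPU share one memory pool
end

# Assumption: unified-memory devices can default to Shared, while
# discrete GPUs prefer Managed for buffers the CPU also touches.
choose_storage_mode(dev::DeviceInfo) = dev.unified_memory ? Shared : Managed

# In this sketch only Managed buffers need an explicit CPU/GPU synchronisation step.
needs_sync(mode::StorageMode) = mode == Managed

for dev in (DeviceInfo("integrated GPU", true),
            DeviceInfo("discrete GPU", false))
    mode = choose_storage_mode(dev)
    println(dev.name, ": ", mode, ", sync needed: ", needs_sync(mode))
end
```

A real implementation would read the flag from the device object instead of a hand-written struct.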
I'd be happy to support Intel or AMD hardware, but I just don't have the hardware (for CI and development) or time. So nothing against it, feel free to pick up that part.
FYI: On an Intel macOS Monterey (12.X), by changing `MT_API_AVAILABLE` to allow `mt_macos(12.0)` for `mtBufferGPUAddress`, I can run this sample code with no crashes, and change the GPU as well.
julia> using Metal
julia> Metal.versioninfo()
macOS 12.3.1, Darwin 21.4.0
Toolchain:
- Julia: 1.8.0-rc1
- LLVM: 13.0.1
2 devices:
- Intel(R) UHD Graphics 630 (6.488 MiB allocated)
- AMD Radeon Pro 555X (0 bytes allocated)
julia> a = MtlArray([1])
1-element MtlArray{Int64, 1}:
1
julia> a .+ 1
1-element MtlArray{Int64, 1}:
2
julia> device(a)
MtlDevice:
name: Intel(R) UHD Graphics 630
lowpower: false
headless: true
removable: false
unified memory: true
registry id: 4294969016
transfer rate: 0
julia> task_local_storage()[:MtlDevice] = MtlDevice(2)
MtlDevice:
name: AMD Radeon Pro 555X
lowpower: false
headless: false
removable: false
unified memory: false
registry id: 4294969102
transfer rate: 0
julia> a = MtlArray([1])
1-element MtlArray{Int64, 1}:
1
julia> a .+ 1
1-element MtlArray{Int64, 1}:
0
julia> device(a.+1)
MtlDevice:
name: AMD Radeon Pro 555X
lowpower: false
headless: false
removable: false
unified memory: false
registry id: 4294969102
transfer rate: 0
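Note that in the session above `a .+ 1` printed `2` on the Intel GPU but `0` on the AMD GPU, so running without a crash does not imply a correct result. A small cross-check against the CPU makes this kind of silent mismatch visible. The helper name `matches_cpu` is hypothetical; it only assumes that the device array type supports broadcasting and conversion back with `Array`.

```julia
# Hypothetical helper: apply `f` elementwise on the CPU and on a device
# array built by `to_device`, then compare the two results on the host.
function matches_cpu(f, to_device, host::AbstractArray)
    expected = f.(host)                    # reference result on the CPU
    actual = Array(f.(to_device(host)))    # device result copied back
    return actual == expected
end

# CPU-only smoke test: `identity` keeps the data in a plain Array.
println(matches_cpu(x -> x + 1, identity, [1]))

# With Metal.jl loaded, the same check would be:
#   matches_cpu(x -> x + 1, MtlArray, [1])
```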
> On an Intel macOS Monterey (12.X), by changing `MT_API_AVAILABLE` to allow `mt_macos(12.0)` for `mtBufferGPUAddress`
I don't think you even need to do that; it compiles fine on Monterey here (with the availability macro just generating a warning). If I'm understanding ObjC correctly here, this means we're successfully accessing an undocumented property. I also noticed it works correctly, so on #master we are using that, see https://github.com/JuliaGPU/Metal.jl/pull/23#issuecomment-1154847790=, and I added a note to the README.
We now have cmt built on Yggdrasil, and the LLVM back-end supports LLVM 14 (Julia 1.9), so I've updated the README and the issue here.
I tried Metal.jl on an Intel iMac with an AMD Radeon Pro 5700 XT
% ./usr/bin/julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0-rc1 (2022-05-27)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |
julia> import Pkg; Pkg.add("Metal")
Updating registry at `~/.julia/registries/General.toml`
Resolving package versions...
Installed GPUArrays ──────────── v8.4.0
Installed Metal_LLVM_Tools_jll ─ v0.3.0+1
Installed cmt_jll ────────────── v0.1.0+0
Installed GPUArraysCore ──────── v0.1.0
Installed CEnum ──────────────── v0.4.2
Installed LLVMExtra_jll ──────── v0.0.16+0
Installed GPUCompiler ────────── v0.16.1
Installed Metal ──────────────── v0.1.0
Installed LLVM ───────────────── v4.14.0
Downloaded artifact: Metal_LLVM_Tools
Downloaded artifact: LLVMExtra
Downloaded artifact: cmt
Updating `~/.julia/environments/v1.8/Project.toml`
[dde4c033] + Metal v0.1.0
Updating `~/.julia/environments/v1.8/Manifest.toml`
[79e6a3ab] + Adapt v3.3.3
[fa961155] + CEnum v0.4.2
[e2ba6199] + ExprTools v0.1.8
[0c68f7d7] + GPUArrays v8.4.0
[46192b85] + GPUArraysCore v0.1.0
[61eb1bfa] + GPUCompiler v0.16.1
[692b3bcd] + JLLWrappers v1.4.1
[929cbde3] + LLVM v4.14.0
[dde4c033] + Metal v0.1.0
[21216c6a] + Preferences v1.3.0
[189a3867] + Reexport v1.2.2
[a759f4b9] + TimerOutputs v0.5.20
[dad2f222] + LLVMExtra_jll v0.0.16+0
[0418c028] + Metal_LLVM_Tools_jll v0.3.0+1
[65323cdd] + cmt_jll v0.1.0+0
[0dad84c5] + ArgTools v1.1.1
[56f22d72] + Artifacts
[2a0f44e3] + Base64
[ade2ca70] + Dates
[f43a241f] + Downloads v1.6.0
[7b1f6079] + FileWatching
[b77e0a4c] + InteractiveUtils
[4af54fe1] + LazyArtifacts
[b27032c2] + LibCURL v0.6.3
[76f85450] + LibGit2
[8f399da3] + Libdl
[37e2e46d] + LinearAlgebra
[56ddb016] + Logging
[d6f4376e] + Markdown
[ca575930] + NetworkOptions v1.2.0
[44cfe95a] + Pkg v1.8.0
[de0858da] + Printf
[3fa0cd96] + REPL
[9a3f8284] + Random
[ea8e919c] + SHA v0.7.0
[9e88b42a] + Serialization
[6462fe0b] + Sockets
[2f01184e] + SparseArrays
[10745b16] + Statistics
[fa267f1f] + TOML v1.0.0
[a4e569a6] + Tar v1.10.0
[cf7118a7] + UUIDs
[4ec0a83e] + Unicode
[e66e0078] + CompilerSupportLibraries_jll v0.5.2+0
[deac9b47] + LibCURL_jll v7.81.0+0
[29816b5a] + LibSSH2_jll v1.10.2+0
[c8ffd9c3] + MbedTLS_jll v2.28.0+0
[14a3606d] + MozillaCACerts_jll v2022.2.1
[4536629a] + OpenBLAS_jll v0.3.20+0
[83775a58] + Zlib_jll v1.2.12+3
[8e850b90] + libblastrampoline_jll v5.1.0+0
[8e850ede] + nghttp2_jll v1.41.0+1
[3f19e933] + p7zip_jll v17.4.0+0
Precompiling project...
21 dependencies successfully precompiled in 12 seconds
julia> Metal.versioninfo()
ERROR: UndefVarError: Metal not defined
Stacktrace:
[1] top-level scope
@ REPL[2]:1
julia> using Metal
julia> Metal.versioninfo()
macOS 12.4.0, Darwin 21.5.0
Toolchain:
- Julia: 1.8.0-rc1
- LLVM: 13.0.1
1 device:
- AMD Radeon Pro 5700 XT (0 bytes allocated)
julia> a = MtlArray([1])
1-element MtlArray{Int64, 1}:
1
julia> a .+ 1
┌ Warning: Compilation of MetalLib to native code failed.
│ If you think this is a bug, please file an issue and attach /var/folders/3n/56fpv14n4wj0c1l1sb106pzw0000gn/T/jl_OUC1h1KIc6.metallib.
└ @ Metal ~/.julia/packages/Metal/fQowO/src/compiler/execution.jl:178
ERROR: MtlError: Compiler encountered an internal error (code 2, CompilerError)
Stacktrace:
[1] macro expansion
@ ~/.julia/packages/Metal/fQowO/lib/core/helpers.jl:68 [inlined]
[2] MtlComputePipelineState(d::MtlDevice, f::MtlFunction)
@ Metal.MTL ~/.julia/packages/Metal/fQowO/lib/core/compute_pipeline.jl:25
[3] mtlfunction_link(job::GPUCompiler.CompilerJob, compiled::NamedTuple{(:image, :entry), Tuple{Vector{UInt8}, String}})
@ Metal ~/.julia/packages/Metal/fQowO/src/compiler/execution.jl:172
[4] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(Metal.mtlfunction_compile), linker::typeof(Metal.mtlfunction_link))
@ GPUCompiler ~/.julia/packages/GPUCompiler/iaKrd/src/cache.jl:95
[5] mtlfunction(f::GPUArrays.var"#broadcast_kernel#15", tt::Type{Tuple{Metal.mtlKernelContext, MtlDeviceVector{Int64, 1}, Base.Broadcast.Broadcasted{Metal.MtlArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(+), Tuple{Base.Broadcast.Extruded{MtlDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}, Int64}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Metal ~/.julia/packages/Metal/fQowO/src/compiler/execution.jl:143
[6] mtlfunction
@ ~/.julia/packages/Metal/fQowO/src/compiler/execution.jl:136 [inlined]
[7] macro expansion
@ ~/.julia/packages/Metal/fQowO/src/compiler/execution.jl:64 [inlined]
[8] #launch_heuristic#53
@ ~/.julia/packages/Metal/fQowO/src/gpuarrays.jl:14 [inlined]
[9] _copyto!
@ ~/.julia/packages/GPUArrays/EVTem/src/host/broadcast.jl:73 [inlined]
[10] copyto!
@ ~/.julia/packages/GPUArrays/EVTem/src/host/broadcast.jl:56 [inlined]
[11] copy
@ ~/.julia/packages/GPUArrays/EVTem/src/host/broadcast.jl:47 [inlined]
[12] materialize(bc::Base.Broadcast.Broadcasted{Metal.MtlArrayStyle{1}, Nothing, typeof(+), Tuple{MtlArray{Int64, 1}, Int64}})
@ Base.Broadcast ./broadcast.jl:860
[13] top-level scope
@ REPL[6]:1
[14] top-level scope
@ ~/.julia/packages/Metal/fQowO/src/initialization.jl:25
julia> device(a)
MtlDevice:
name: AMD Radeon Pro 5700 XT
lowpower: false
headless: false
removable: false
unified memory: false
registry id: 4294968934
transfer rate: 0
julia> task_local_storage()[:MtlDevice] = MtlDevice(1)
MtlDevice:
name: AMD Radeon Pro 5700 XT
lowpower: false
headless: false
removable: false
unified memory: false
registry id: 4294968934
transfer rate: 0
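Both sessions above switch devices by assigning into `task_local_storage()` directly, which leaves the new device active for the rest of the task. Julia's Base also provides `task_local_storage(body, key, value)`, which sets a key only for the duration of `body` and restores the previous state afterwards. The wrapper below is a sketch: `with_device` is a hypothetical name, and the `:MtlDevice` key is simply the one used in the sessions above.

```julia
# Hypothetical helper: run `f` with `dev` stored under the :MtlDevice
# task-local key, restoring the previous state afterwards.
with_device(f, dev) = task_local_storage(f, :MtlDevice, dev)

# CPU-only demonstration with a placeholder value instead of a real MtlDevice.
seen = with_device(:placeholder_device) do
    task_local_storage(:MtlDevice)   # value visible inside the block
end
println(seen)

# Outside the block the key is back to its earlier state.
println(haskey(task_local_storage(), :MtlDevice))
```

With Metal.jl loaded, the placeholder would be replaced by something like `MtlDevice(2)`, as in the first session.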
Given how fast these toolchains are moving, I would recommend making 1.9 the minimum Julia version, and quickly adopting 1.10 as the minimum once it is out.
Metal.jl currently requires:
If people are interested in working on this, some of these can be relaxed:
`metal_release_13` and `metal_release_14` branches, they'd need to be applied on top of `llvm_release_12` for 1.7 compatibility