JuliaGPU / Metal.jl

Metal programming in Julia
MIT License

Relax package requirements #22

Open maleadt opened 2 years ago

maleadt commented 2 years ago

Metal.jl currently requires:

If people are interested in working on this, some of these can be relaxed:

PhilipVinc commented 2 years ago

About M1 hardware: most of Metal.jl already works on Intel hardware, ~and if you add the correct mapping from the kernel version to the macOS version, it will run on Intel as well. I toyed with this a few weeks ago and it was launching correctly.~ EDIT: I just noticed that you already fixed that. I tried the latest branch and it works fine as long as you don't generate kernels.

The main difference is that M1 has only shared memory, so buffers with MtStorageModeShared don't need synchronising, whereas on Intel they do if you have a discrete GPU. I don't know whether the code that synchronises on copy/MtlArray creation is still there or whether you dropped it.

Also, on Intel it will be wasteful to default to MtStorageModeShared.
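
To make the storage-mode point concrete, here is a minimal sketch in plain Julia. It does not use Metal.jl's API: `StorageMode`, `DeviceInfo`, `default_storage_mode` and `needs_sync` are hypothetical names invented for illustration, and the policy (shared storage on unified-memory devices, managed storage otherwise) is an assumption about what a sensible default might look like, not what Metal.jl does. The `unified_memory` field mirrors the "unified memory" flag that `device(a)` prints elsewhere in this thread.

```julia
# Hypothetical stand-ins for illustration; these are NOT Metal.jl's actual types.
@enum StorageMode Shared Managed

struct DeviceInfo
    name::String
    unified_memory::Bool
end

# Assumption: prefer shared storage when CPU and GPU share memory,
# and managed storage (separate CPU and GPU copies) otherwise.
default_storage_mode(dev::DeviceInfo) = dev.unified_memory ? Shared : Managed

# Managed buffers keep two copies that must be synchronised explicitly;
# shared buffers have a single copy visible to both sides.
needs_sync(mode::StorageMode) = mode == Managed

# Device names and flags taken from the versioninfo/device output in this thread.
devices = [
    DeviceInfo("Intel(R) UHD Graphics 630", true),
    DeviceInfo("AMD Radeon Pro 555X", false),
]

for dev in devices
    mode = default_storage_mode(dev)
    println(dev.name, ": ", mode, ", explicit sync needed: ", needs_sync(mode))
end
```

Under this assumed policy, only the discrete GPU ends up with buffers that need an explicit synchronisation step after the CPU or GPU writes to them.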

maleadt commented 2 years ago

I'd be happy to support Intel or AMD hardware, but I just don't have the hardware (for CI and development) or time. So nothing against it, feel free to pick up that part.

PhilipVinc commented 2 years ago

FYI: On an Intel macOS Monterey (12.X), by changing MT_API_AVAILABLE to allow mt_macos(12.0) for mtBufferGPUAddress, I can run this sample code with no crashes, and change the GPU as well.

julia> using Metal

julia> Metal.versioninfo()
macOS 12.3.1, Darwin 21.4.0

Toolchain:
- Julia: 1.8.0-rc1
- LLVM: 13.0.1

2 devices:
- Intel(R) UHD Graphics 630 (6.488 MiB allocated)
- AMD Radeon Pro 555X (0 bytes allocated)

julia> a = MtlArray([1])
1-element MtlArray{Int64, 1}:
 1

julia> a .+ 1
1-element MtlArray{Int64, 1}:
 2

julia> device(a)
MtlDevice:
 name:             Intel(R) UHD Graphics 630
 lowpower:         false
 headless:         true
 removable:        false
 unified memory:   true
 registry id:      4294969016
 transfer rate:    0

julia> task_local_storage()[:MtlDevice] = MtlDevice(2)
MtlDevice:
 name:             AMD Radeon Pro 555X
 lowpower:         false
 headless:         false
 removable:        false
 unified memory:   false
 registry id:      4294969102
 transfer rate:    0

julia> a = MtlArray([1])
1-element MtlArray{Int64, 1}:
 1

julia> a .+ 1
1-element MtlArray{Int64, 1}:
 0

julia> device(a.+1)
MtlDevice:
 name:             AMD Radeon Pro 555X
 lowpower:         false
 headless:         false
 removable:        false
 unified memory:   false
 registry id:      4294969102
 transfer rate:    0

maleadt commented 2 years ago

> On an Intel macOS Monterey (12.X), by changing MT_API_AVAILABLE to allow mt_macos(12.0) for mtBufferGPUAddress

I don't think you even need to do that; it compiles fine on Monterey here (with the availability macro just generating a warning). If I'm understanding ObjC correctly, this means we're successfully accessing an undocumented property. It also works correctly, so we are using that on master (see https://github.com/JuliaGPU/Metal.jl/pull/23#issuecomment-1154847790=), and I added a note to the README.

maleadt commented 2 years ago

We now have cmt built on Yggdrasil, and the LLVM back-end supports LLVM 14 (Julia 1.9), so I've updated the README and the issue here.

dbl001 commented 2 years ago

I tried Metal.jl on an Intel iMac with an AMD Radeon Pro 5700 XT:

% ./usr/bin/julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0-rc1 (2022-05-27)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |

julia> import Pkg; Pkg.add("Metal")
    Updating registry at `~/.julia/registries/General.toml`
   Resolving package versions...
   Installed GPUArrays ──────────── v8.4.0
   Installed Metal_LLVM_Tools_jll ─ v0.3.0+1
   Installed cmt_jll ────────────── v0.1.0+0
   Installed GPUArraysCore ──────── v0.1.0
   Installed CEnum ──────────────── v0.4.2
   Installed LLVMExtra_jll ──────── v0.0.16+0
   Installed GPUCompiler ────────── v0.16.1
   Installed Metal ──────────────── v0.1.0
   Installed LLVM ───────────────── v4.14.0
  Downloaded artifact: Metal_LLVM_Tools
  Downloaded artifact: LLVMExtra
  Downloaded artifact: cmt
    Updating `~/.julia/environments/v1.8/Project.toml`
  [dde4c033] + Metal v0.1.0
    Updating `~/.julia/environments/v1.8/Manifest.toml`
  [79e6a3ab] + Adapt v3.3.3
  [fa961155] + CEnum v0.4.2
  [e2ba6199] + ExprTools v0.1.8
  [0c68f7d7] + GPUArrays v8.4.0
  [46192b85] + GPUArraysCore v0.1.0
  [61eb1bfa] + GPUCompiler v0.16.1
  [692b3bcd] + JLLWrappers v1.4.1
  [929cbde3] + LLVM v4.14.0
  [dde4c033] + Metal v0.1.0
  [21216c6a] + Preferences v1.3.0
  [189a3867] + Reexport v1.2.2
  [a759f4b9] + TimerOutputs v0.5.20
  [dad2f222] + LLVMExtra_jll v0.0.16+0
  [0418c028] + Metal_LLVM_Tools_jll v0.3.0+1
  [65323cdd] + cmt_jll v0.1.0+0
  [0dad84c5] + ArgTools v1.1.1
  [56f22d72] + Artifacts
  [2a0f44e3] + Base64
  [ade2ca70] + Dates
  [f43a241f] + Downloads v1.6.0
  [7b1f6079] + FileWatching
  [b77e0a4c] + InteractiveUtils
  [4af54fe1] + LazyArtifacts
  [b27032c2] + LibCURL v0.6.3
  [76f85450] + LibGit2
  [8f399da3] + Libdl
  [37e2e46d] + LinearAlgebra
  [56ddb016] + Logging
  [d6f4376e] + Markdown
  [ca575930] + NetworkOptions v1.2.0
  [44cfe95a] + Pkg v1.8.0
  [de0858da] + Printf
  [3fa0cd96] + REPL
  [9a3f8284] + Random
  [ea8e919c] + SHA v0.7.0
  [9e88b42a] + Serialization
  [6462fe0b] + Sockets
  [2f01184e] + SparseArrays
  [10745b16] + Statistics
  [fa267f1f] + TOML v1.0.0
  [a4e569a6] + Tar v1.10.0
  [cf7118a7] + UUIDs
  [4ec0a83e] + Unicode
  [e66e0078] + CompilerSupportLibraries_jll v0.5.2+0
  [deac9b47] + LibCURL_jll v7.81.0+0
  [29816b5a] + LibSSH2_jll v1.10.2+0
  [c8ffd9c3] + MbedTLS_jll v2.28.0+0
  [14a3606d] + MozillaCACerts_jll v2022.2.1
  [4536629a] + OpenBLAS_jll v0.3.20+0
  [83775a58] + Zlib_jll v1.2.12+3
  [8e850b90] + libblastrampoline_jll v5.1.0+0
  [8e850ede] + nghttp2_jll v1.41.0+1
  [3f19e933] + p7zip_jll v17.4.0+0
Precompiling project...
  21 dependencies successfully precompiled in 12 seconds

julia> Metal.versioninfo()
ERROR: UndefVarError: Metal not defined
Stacktrace:
 [1] top-level scope
   @ REPL[2]:1

julia> using Metal

julia> Metal.versioninfo()
macOS 12.4.0, Darwin 21.5.0

Toolchain:
- Julia: 1.8.0-rc1
- LLVM: 13.0.1

1 device:
- AMD Radeon Pro 5700 XT (0 bytes allocated)

julia> a = MtlArray([1])
1-element MtlArray{Int64, 1}:
 1

julia> a .+ 1
┌ Warning: Compilation of MetalLib to native code failed.
│ If you think this is a bug, please file an issue and attach /var/folders/3n/56fpv14n4wj0c1l1sb106pzw0000gn/T/jl_OUC1h1KIc6.metallib.
└ @ Metal ~/.julia/packages/Metal/fQowO/src/compiler/execution.jl:178
ERROR: MtlError: Compiler encountered an internal error (code 2, CompilerError)
Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/Metal/fQowO/lib/core/helpers.jl:68 [inlined]
  [2] MtlComputePipelineState(d::MtlDevice, f::MtlFunction)
    @ Metal.MTL ~/.julia/packages/Metal/fQowO/lib/core/compute_pipeline.jl:25
  [3] mtlfunction_link(job::GPUCompiler.CompilerJob, compiled::NamedTuple{(:image, :entry), Tuple{Vector{UInt8}, String}})
    @ Metal ~/.julia/packages/Metal/fQowO/src/compiler/execution.jl:172
  [4] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(Metal.mtlfunction_compile), linker::typeof(Metal.mtlfunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/iaKrd/src/cache.jl:95
  [5] mtlfunction(f::GPUArrays.var"#broadcast_kernel#15", tt::Type{Tuple{Metal.mtlKernelContext, MtlDeviceVector{Int64, 1}, Base.Broadcast.Broadcasted{Metal.MtlArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(+), Tuple{Base.Broadcast.Extruded{MtlDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}, Int64}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Metal ~/.julia/packages/Metal/fQowO/src/compiler/execution.jl:143
  [6] mtlfunction
    @ ~/.julia/packages/Metal/fQowO/src/compiler/execution.jl:136 [inlined]
  [7] macro expansion
    @ ~/.julia/packages/Metal/fQowO/src/compiler/execution.jl:64 [inlined]
  [8] #launch_heuristic#53
    @ ~/.julia/packages/Metal/fQowO/src/gpuarrays.jl:14 [inlined]
  [9] _copyto!
    @ ~/.julia/packages/GPUArrays/EVTem/src/host/broadcast.jl:73 [inlined]
 [10] copyto!
    @ ~/.julia/packages/GPUArrays/EVTem/src/host/broadcast.jl:56 [inlined]
 [11] copy
    @ ~/.julia/packages/GPUArrays/EVTem/src/host/broadcast.jl:47 [inlined]
 [12] materialize(bc::Base.Broadcast.Broadcasted{Metal.MtlArrayStyle{1}, Nothing, typeof(+), Tuple{MtlArray{Int64, 1}, Int64}})
    @ Base.Broadcast ./broadcast.jl:860
 [13] top-level scope
    @ REPL[6]:1
 [14] top-level scope
    @ ~/.julia/packages/Metal/fQowO/src/initialization.jl:25

julia> device(a)
MtlDevice:
 name:             AMD Radeon Pro 5700 XT
 lowpower:         false
 headless:         false
 removable:        false
 unified memory:   false
 registry id:      4294968934
 transfer rate:    0

julia> task_local_storage()[:MtlDevice] = MtlDevice(1)
MtlDevice:
 name:             AMD Radeon Pro 5700 XT
 lowpower:         false
 headless:         false
 removable:        false
 unified memory:   false
 registry id:      4294968934
 transfer rate:    0

jl_OUC1h1KIc6.metallib.gz

ViralBShah commented 1 year ago

Given how fast these toolchains are moving, I would recommend making 1.9 the min Julia version, and quickly adopting 1.10 as the min when it is out.