`trimatmul` Not Supported

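It seems that GPUArrays does not know how to handle a triangular matmul. As the stacktrace below shows, `A * x` falls through to LinearAlgebra's `generic_trimatmul!`, which calls the CPU BLAS routine `BLAS.trmv!`. That call then fails because it cannot take a CPU address of a GPU array.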

We should either add a `generic_trimatmul!` implementation to GPUArrays or teach it to fall back to a regular matmul in these cases.
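A minimal sketch of the fallback option, assuming the simplest approach: materialize the triangle as a dense matrix (zeroing the other half, and setting the diagonal to one for the unit variants), then call the ordinary `*`. The name `trimul_via_matmul` is hypothetical and is not part of GPUArrays or LinearAlgebra. The sketch is written and checked against plain CPU `Array`s only.

```julia
using LinearAlgebra

# Triangular wrappers this hypothetical helper accepts.
const AnyTriangular = Union{UpperTriangular, LowerTriangular,
                            UnitUpperTriangular, UnitLowerTriangular}

# Hypothetical fallback (not GPUArrays' API): compute A * B for a triangular
# wrapper by building a dense copy of the triangle and using the regular matmul.
function trimul_via_matmul(A::AnyTriangular, B::AbstractVecOrMat)
    P = parent(A)
    # Keep only the stored triangle; the other half of P may hold arbitrary data.
    D = A isa Union{UpperTriangular, UnitUpperTriangular} ? triu(P) : tril(P)
    # Unit-triangular wrappers ignore the stored diagonal and treat it as ones.
    if A isa Union{UnitUpperTriangular, UnitLowerTriangular}
        D[diagind(D)] .= one(eltype(D))
    end
    return D * B  # plain dense matmul on the materialized matrix
end
```

The trade-off is an extra n×n allocation per call. On GPU arrays, the same code should only need `triu`/`tril`, a broadcast over the diagonal, and dense `*`, which GPUArrays appears to provide, though that path is not verified here.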

Falling back to matmul is likely faster in most cases than writing a dedicated `generic_trimatmul!` kernel, since matmul is often backed by an optimized matrix-matrix multiplication implementation that outperforms a generic kernel.
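A rough operation count shows the trade-off. For an n×n matrix-vector product, the dense fallback performs about twice the multiply-adds of a triangular product. It should therefore win only when the dense kernel is more than about twice as fast. This back-of-the-envelope estimate ignores memory traffic and the cost of materializing the dense copy.

```math
\underbrace{n^2}_{\text{dense } Ax} \quad\text{vs.}\quad \underbrace{\frac{n(n+1)}{2}}_{\text{triangular } Ax},
\qquad n = 1024:\; 1\,048\,576 \;\text{vs.}\; 524\,800 \;\text{multiply-adds} \;(\approx 2\times)
```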

```julia-repl
julia> using LinearAlgebra, Metal

julia> A = UpperTriangular(MtlMatrix(rand(Float32, 1024, 1024)))
julia> x = mtl(rand(1024))
julia> A * x
ERROR: ArgumentError: cannot take the CPU address of a MtlMatrix{Float32, Private}
Stacktrace:
 [1] unsafe_convert(::Type{Ptr{Float32}}, x::MtlMatrix{Float32, Private})
   @ Metal ~/Developer/Metal.jl/src/array.jl:197
 [2] trmv!(uplo::Char, trans::Char, diag::Char, A::MtlMatrix{Float32, Private}, x::MtlVector{Float32, Private})
   @ LinearAlgebra.BLAS ~/.julia/juliaup/julia-1.10.2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/blas.jl:1315
 [3] generic_trimatmul!(c::MtlVector{…}, uploc::Char, isunitc::Char, tfun::Function, A::MtlMatrix{…}, b::MtlVector{…})
   @ LinearAlgebra ~/.julia/juliaup/julia-1.10.2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/triangular.jl:823
 [4] _trimul!(C::MtlVector{Float32, Private}, A::UpperTriangular{Float32, MtlMatrix{…}}, B::MtlVector{Float32, Private})
   @ LinearAlgebra ~/.julia/juliaup/julia-1.10.2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/triangular.jl:705
 [5] mul!(C::MtlVector{Float32, Private}, A::UpperTriangular{Float32, MtlMatrix{…}}, B::MtlVector{Float32, Private})
   @ LinearAlgebra ~/.julia/juliaup/julia-1.10.2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/triangular.jl:690
 [6] *(A::UpperTriangular{Float32, MtlMatrix{Float32, Private}}, B::MtlVector{Float32, Private})
   @ LinearAlgebra ~/.julia/juliaup/julia-1.10.2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/triangular.jl:1471
 [7] top-level scope
   @ REPL[18]:1
Some type information was truncated. Use `show(err)` to see complete types.
```

JuliaGPU / GPUArrays.jl #534