JuliaGPU / CUDA.jl

CUDA programming in Julia.
https://juliagpu.org/cuda/
Other
1.16k stars 206 forks source link

GEMM of non-contiguous inputs should dispatch to fallback implementation #2412

Closed albertomercurio closed 2 weeks ago

albertomercurio commented 3 weeks ago

Describe the bug

The in-place multiplication mul!(A, B, C) fails when A is a vector view of a matrix.

To reproduce

The Minimal Working Example (MWE) for this bug:

using LinearAlgebra
using CUDA
CUDA.allowscalar(false)

# Working case
prova1 = CUDA.rand(Int, 100) # Vector
prova2 = CUDA.rand(Int, 200, 200) # Matrix
prova3 = CUDA.rand(Int, 200) # Vector
mul!(@view(prova1[1:10]), transpose(@view(prova2[1:10, 1:10])), @view(prova3[1:10]))

# Not working case
prova1 = CUDA.rand(Int, 100, 100) # Matrix
prova2 = CUDA.rand(Int, 200, 200) # Matrix
prova3 = CUDA.rand(Int, 200) # Vector
mul!(@view(prova1[1, 1:10]), transpose(@view(prova2[1:10, 1:10])), @view(prova3[1:10]))

Surprisingly, it works for matrix views of matrices

# Working case
prova1 = CUDA.rand(Int, 100) # Vector
prova2 = CUDA.rand(Int, 200, 200) # Matrix
prova3 = CUDA.rand(Int, 200, 200) # Matrix
mul!(@view(prova1[1:10]), transpose(@view(prova2[1:10, 1:10])), @view(prova3[1, 1:10]))
Manifest.toml

``` # This file is machine-generated - editing it directly is not advised julia_version = "1.10.3" manifest_format = "2.0" project_hash = "4d6f52122ea9741175c8c71d00021be0a921f3ad" [[deps.AbstractFFTs]] deps = ["LinearAlgebra"] git-tree-sha1 = "d92ad398961a3ed262d8bf04a1a2b8340f915fef" uuid = "621f4979-c628-5d54-868e-fcf4e3e8185c" version = "1.5.0" [deps.AbstractFFTs.extensions] AbstractFFTsChainRulesCoreExt = "ChainRulesCore" AbstractFFTsTestExt = "Test" [deps.AbstractFFTs.weakdeps] ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4" Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40" [[deps.Adapt]] deps = ["LinearAlgebra", "Requires"] git-tree-sha1 = "6a55b747d1812e699320963ffde36f1ebdda4099" uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e" version = "4.0.4" weakdeps = ["StaticArrays"] [deps.Adapt.extensions] AdaptStaticArraysExt = "StaticArrays" [[deps.ArgTools]] uuid = "0dad84c5-d112-42e6-8d28-ef12dabb789f" version = "1.1.1" [[deps.Artifacts]] uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33" [[deps.Atomix]] deps = ["UnsafeAtomics"] git-tree-sha1 = "c06a868224ecba914baa6942988e2f2aade419be" uuid = "a9b6321e-bd34-4604-b9c9-b65b8de01458" version = "0.1.0" [[deps.BFloat16s]] deps = ["LinearAlgebra", "Printf", "Random", "Test"] git-tree-sha1 = "2c7cc21e8678eff479978a0a2ef5ce2f51b63dff" uuid = "ab4f0b2a-ad5b-11e8-123f-65d77653426b" version = "0.5.0" [[deps.Base64]] uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f" [[deps.CEnum]] git-tree-sha1 = "389ad5c84de1ae7cf0e28e381131c98ea87d54fc" uuid = "fa961155-64e5-5f13-b03f-caf6b980ea82" version = "0.5.0" [[deps.CUDA]] deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CUDA_Driver_jll", "CUDA_Runtime_Discovery", "CUDA_Runtime_jll", "Crayons", "DataFrames", "ExprTools", "GPUArrays", "GPUCompiler", "KernelAbstractions", "LLVM", "LLVMLoopInfo", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "NVTX", "Preferences", "PrettyTables", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "StaticArrays", "Statistics"] git-tree-sha1 = "6e945e876652f2003e6ca74e19a3c45017d3e9f6" uuid = "052768ef-5323-5732-b1bb-66c8b64840ba" version = "5.4.2" [deps.CUDA.extensions] ChainRulesCoreExt = "ChainRulesCore" EnzymeCoreExt = "EnzymeCore" SpecialFunctionsExt = "SpecialFunctions" [deps.CUDA.weakdeps] ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4" EnzymeCore = "f151be2c-9106-41f4-ab19-57ee4f262869" SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b" [[deps.CUDA_Driver_jll]] deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "Pkg"] git-tree-sha1 = "c48f9da18efd43b6b7adb7ee1f93fe5f2926c339" uuid = "4ee394cb-3365-5eb0-8335-949819d2adfc" version = "0.9.0+0" [[deps.CUDA_Runtime_Discovery]] deps = ["Libdl"] git-tree-sha1 = "5db9da5fdeaa708c22ba86b82c49528f402497f2" uuid = "1af6417a-86b4-443c-805f-a4643ffb695f" version = "0.3.3" [[deps.CUDA_Runtime_jll]] deps = ["Artifacts", "CUDA_Driver_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"] git-tree-sha1 = "bcba305388e16aa5c879e896726db9e71b4942c6" uuid = "76a88914-d11a-5bdc-97e0-2f5a05c973a2" version = "0.14.0+1" [[deps.ColorTypes]] deps = ["FixedPointNumbers", "Random"] git-tree-sha1 = "b10d0b65641d57b8b4d5e234446582de5047050d" uuid = "3da002f7-5984-5a60-b8a6-cbb66c0b333f" version = "0.11.5" [[deps.Colors]] deps = ["ColorTypes", "FixedPointNumbers", "Reexport"] git-tree-sha1 = "362a287c3aa50601b0bc359053d5c2468f0e7ce0" uuid = "5ae59095-9a9b-59fe-a467-6f913c188581" version = "0.12.11" [[deps.Compat]] deps = ["TOML", "UUIDs"] git-tree-sha1 = "b1c55339b7c6c350ee89f2c1604299660525b248" uuid = "34da2185-b29b-5c13-b0c7-acf172513d20" version = "4.15.0" weakdeps = ["Dates", "LinearAlgebra"] [deps.Compat.extensions] CompatLinearAlgebraExt = "LinearAlgebra" [[deps.CompilerSupportLibraries_jll]] deps = ["Artifacts", "Libdl"] uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae" version = "1.1.1+0" [[deps.Crayons]] git-tree-sha1 = "249fe38abf76d48563e2f4556bebd215aa317e15" uuid = "a8cc5b0e-0ffa-5ad4-8c14-923d3ee1735f" version = "4.1.1" [[deps.DataAPI]] git-tree-sha1 = "abe83f3a2f1b857aac70ef8b269080af17764bbe" uuid = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a" version = "1.16.0" [[deps.DataFrames]] deps = ["Compat", "DataAPI", "DataStructures", "Future", "InlineStrings", "InvertedIndices", "IteratorInterfaceExtensions", "LinearAlgebra", "Markdown", "Missings", "PooledArrays", "PrecompileTools", "PrettyTables", "Printf", "REPL", "Random", "Reexport", "SentinelArrays", "SortingAlgorithms", "Statistics", "TableTraits", "Tables", "Unicode"] git-tree-sha1 = "04c738083f29f86e62c8afc341f0967d8717bdb8" uuid = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0" version = "1.6.1" [[deps.DataStructures]] deps = ["Compat", "InteractiveUtils", "OrderedCollections"] git-tree-sha1 = "1d0a14036acb104d9e89698bd408f63ab58cdc82" uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8" version = "0.18.20" [[deps.DataValueInterfaces]] git-tree-sha1 = "bfc1187b79289637fa0ef6d4436ebdfe6905cbd6" uuid = "e2d170a0-9d28-54be-80f0-106bbe20a464" version = "1.0.0" [[deps.Dates]] deps = ["Printf"] uuid = "ade2ca70-3891-5945-98fb-dc099432e06a" [[deps.Downloads]] deps = ["ArgTools", "FileWatching", "LibCURL", "NetworkOptions"] uuid = "f43a241f-c20a-4ad4-852c-f6b1247861c6" version = "1.6.0" [[deps.ExprTools]] git-tree-sha1 = "27415f162e6028e81c72b82ef756bf321213b6ec" uuid = "e2ba6199-217a-4e67-a87a-7c52f15ade04" version = "0.1.10" [[deps.FileWatching]] uuid = "7b1f6079-737a-58dc-b8bc-7a2ca5c1b5ee" [[deps.FixedPointNumbers]] deps = ["Statistics"] git-tree-sha1 = "05882d6995ae5c12bb5f36dd2ed3f61c98cbb172" uuid = "53c48c17-4a7d-5ca2-90c5-79b7896eea93" version = "0.8.5" [[deps.Future]] deps = ["Random"] uuid = "9fa8497b-333b-5362-9e8d-4d0656e87820" [[deps.GPUArrays]] deps = ["Adapt", "GPUArraysCore", "LLVM", "LinearAlgebra", "Printf", "Random", "Reexport", "Serialization", "Statistics"] git-tree-sha1 = "38cb19b8a3e600e509dc36a6396ac74266d108c1" uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7" version = "10.1.1" [[deps.GPUArraysCore]] deps = ["Adapt"] git-tree-sha1 = "ec632f177c0d990e64d955ccc1b8c04c485a0950" uuid = "46192b85-c4d5-4398-a991-12ede77f4527" version = "0.1.6" [[deps.GPUCompiler]] deps = ["ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "Scratch", "TimerOutputs", "UUIDs"] git-tree-sha1 = "518ebd058c9895de468a8c255797b0c53fdb44dd" uuid = "61eb1bfa-7361-4325-ad38-22787b887f55" version = "0.26.5" [[deps.InlineStrings]] deps = ["Parsers"] git-tree-sha1 = "9cc2baf75c6d09f9da536ddf58eb2f29dedaf461" uuid = "842dd82b-1e85-43dc-bf29-5d0ee9dffc48" version = "1.4.0" [[deps.InteractiveUtils]] deps = ["Markdown"] uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240" [[deps.InvertedIndices]] git-tree-sha1 = "0dc7b50b8d436461be01300fd8cd45aa0274b038" uuid = "41ab1584-1d38-5bbf-9106-f11c6c58b48f" version = "1.3.0" [[deps.IteratorInterfaceExtensions]] git-tree-sha1 = "a3f24677c21f5bbe9d2a714f95dcd58337fb2856" uuid = "82899510-4779-5014-852e-03e436cf321d" version = "1.0.0" [[deps.JLLWrappers]] deps = ["Artifacts", "Preferences"] git-tree-sha1 = "7e5d6779a1e09a36db2a7b6cff50942a0a7d0fca" uuid = "692b3bcd-3c85-4b1f-b108-f13ce0eb3210" version = "1.5.0" [[deps.JuliaNVTXCallbacks_jll]] deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"] git-tree-sha1 = "af433a10f3942e882d3c671aacb203e006a5808f" uuid = "9c1d0b0a-7046-5b2e-a33f-ea22f176ac7e" version = "0.2.1+0" [[deps.KernelAbstractions]] deps = ["Adapt", "Atomix", "InteractiveUtils", "LinearAlgebra", "MacroTools", "PrecompileTools", "Requires", "SparseArrays", "StaticArrays", "UUIDs", "UnsafeAtomics", "UnsafeAtomicsLLVM"] git-tree-sha1 = "8e5a339882cc401688d79b811d923a38ba77d50a" uuid = "63c18a36-062a-441e-b654-da1e3ab1ce7c" version = "0.9.20" [deps.KernelAbstractions.extensions] EnzymeExt = "EnzymeCore" [deps.KernelAbstractions.weakdeps] EnzymeCore = "f151be2c-9106-41f4-ab19-57ee4f262869" [[deps.LLVM]] deps = ["CEnum", "LLVMExtra_jll", "Libdl", "Preferences", "Printf", "Requires", "Unicode"] git-tree-sha1 = "389aea28d882a40b5e1747069af71bdbd47a1cae" uuid = "929cbde3-209d-540e-8aea-75f648917ca0" version = "7.2.1" weakdeps = ["BFloat16s"] [deps.LLVM.extensions] BFloat16sExt = "BFloat16s" [[deps.LLVMExtra_jll]] deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"] git-tree-sha1 = "88b916503aac4fb7f701bb625cd84ca5dd1677bc" uuid = "dad2f222-ce93-54a1-a47d-0025e8a3acab" version = "0.0.29+0" [[deps.LLVMLoopInfo]] git-tree-sha1 = "2e5c102cfc41f48ae4740c7eca7743cc7e7b75ea" uuid = "8b046642-f1f6-4319-8d3c-209ddc03c586" version = "1.0.0" [[deps.LaTeXStrings]] git-tree-sha1 = "50901ebc375ed41dbf8058da26f9de442febbbec" uuid = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f" version = "1.3.1" [[deps.LazyArtifacts]] deps = ["Artifacts", "Pkg"] uuid = "4af54fe1-eca0-43a8-85a7-787d91b784e3" [[deps.LibCURL]] deps = ["LibCURL_jll", "MozillaCACerts_jll"] uuid = "b27032c2-a3e7-50c8-80cd-2d36dbcbfd21" version = "0.6.4" [[deps.LibCURL_jll]] deps = ["Artifacts", "LibSSH2_jll", "Libdl", "MbedTLS_jll", "Zlib_jll", "nghttp2_jll"] uuid = "deac9b47-8bc7-5906-a0fe-35ac56dc84c0" version = "8.4.0+0" [[deps.LibGit2]] deps = ["Base64", "LibGit2_jll", "NetworkOptions", "Printf", "SHA"] uuid = "76f85450-5226-5b5a-8eaa-529ad045b433" [[deps.LibGit2_jll]] deps = ["Artifacts", "LibSSH2_jll", "Libdl", "MbedTLS_jll"] uuid = "e37daf67-58a4-590a-8e99-b0245dd2ffc5" version = "1.6.4+0" [[deps.LibSSH2_jll]] deps = ["Artifacts", "Libdl", "MbedTLS_jll"] uuid = "29816b5a-b9ab-546f-933c-edad1886dfa8" version = "1.11.0+1" [[deps.Libdl]] uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb" [[deps.LinearAlgebra]] deps = ["Libdl", "OpenBLAS_jll", "libblastrampoline_jll"] uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e" [[deps.Logging]] uuid = "56ddb016-857b-54e1-b83d-db4d58db5568" [[deps.MacroTools]] deps = ["Markdown", "Random"] git-tree-sha1 = "2fa9ee3e63fd3a4f7a9a4f4744a52f4856de82df" uuid = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09" version = "0.5.13" [[deps.Markdown]] deps = ["Base64"] uuid = "d6f4376e-aef5-505a-96c1-9c027394607a" [[deps.MbedTLS_jll]] deps = ["Artifacts", "Libdl"] uuid = "c8ffd9c3-330d-5841-b78e-0817d7145fa1" version = "2.28.2+1" [[deps.Missings]] deps = ["DataAPI"] git-tree-sha1 = "ec4f7fbeab05d7747bdf98eb74d130a2a2ed298d" uuid = "e1d29d7a-bbdc-5cf2-9ac0-f12de2c33e28" version = "1.2.0" [[deps.MozillaCACerts_jll]] uuid = "14a3606d-f60d-562e-9121-12d972cd8159" version = "2023.1.10" [[deps.NVTX]] deps = ["Colors", "JuliaNVTXCallbacks_jll", "Libdl", "NVTX_jll"] git-tree-sha1 = "53046f0483375e3ed78e49190f1154fa0a4083a1" uuid = "5da4648a-3479-48b8-97b9-01cb529c0a1f" version = "0.3.4" [[deps.NVTX_jll]] deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"] git-tree-sha1 = "ce3269ed42816bf18d500c9f63418d4b0d9f5a3b" uuid = "e98f9f5b-d649-5603-91fd-7774390e6439" version = "3.1.0+2" [[deps.NetworkOptions]] uuid = "ca575930-c2e3-43a9-ace4-1e988b2c1908" version = "1.2.0" [[deps.OpenBLAS_jll]] deps = ["Artifacts", "CompilerSupportLibraries_jll", "Libdl"] uuid = "4536629a-c528-5b80-bd46-f80d51c5b363" version = "0.3.23+4" [[deps.OrderedCollections]] git-tree-sha1 = "dfdf5519f235516220579f949664f1bf44e741c5" uuid = "bac558e1-5e72-5ebc-8fee-abe8a469f55d" version = "1.6.3" [[deps.Parsers]] deps = ["Dates", "PrecompileTools", "UUIDs"] git-tree-sha1 = "8489905bcdbcfac64d1daa51ca07c0d8f0283821" uuid = "69de0a69-1ddd-5017-9359-2bf0b02dc9f0" version = "2.8.1" [[deps.Pkg]] deps = ["Artifacts", "Dates", "Downloads", "FileWatching", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "Serialization", "TOML", "Tar", "UUIDs", "p7zip_jll"] uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f" version = "1.10.0" [[deps.PooledArrays]] deps = ["DataAPI", "Future"] git-tree-sha1 = "36d8b4b899628fb92c2749eb488d884a926614d3" uuid = "2dfb63ee-cc39-5dd5-95bd-886bf059d720" version = "1.4.3" [[deps.PrecompileTools]] deps = ["Preferences"] git-tree-sha1 = "5aa36f7049a63a1528fe8f7c3f2113413ffd4e1f" uuid = "aea7be01-6a6a-4083-8856-8a6e6704d82a" version = "1.2.1" [[deps.Preferences]] deps = ["TOML"] git-tree-sha1 = "9306f6085165d270f7e3db02af26a400d580f5c6" uuid = "21216c6a-2e73-6563-6e65-726566657250" version = "1.4.3" [[deps.PrettyTables]] deps = ["Crayons", "LaTeXStrings", "Markdown", "PrecompileTools", "Printf", "Reexport", "StringManipulation", "Tables"] git-tree-sha1 = "66b20dd35966a748321d3b2537c4584cf40387c7" uuid = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d" version = "2.3.2" [[deps.Printf]] deps = ["Unicode"] uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7" [[deps.REPL]] deps = ["InteractiveUtils", "Markdown", "Sockets", "Unicode"] uuid = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb" [[deps.Random]] deps = ["SHA"] uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c" [[deps.Random123]] deps = ["Random", "RandomNumbers"] git-tree-sha1 = "4743b43e5a9c4a2ede372de7061eed81795b12e7" uuid = "74087812-796a-5b5d-8853-05524746bad3" version = "1.7.0" [[deps.RandomNumbers]] deps = ["Random", "Requires"] git-tree-sha1 = "043da614cc7e95c703498a491e2c21f58a2b8111" uuid = "e6cf234a-135c-5ec9-84dd-332b85af5143" version = "1.5.3" [[deps.Reexport]] git-tree-sha1 = "45e428421666073eab6f2da5c9d310d99bb12f9b" uuid = "189a3867-3050-52da-a836-e630ba90ab69" version = "1.2.2" [[deps.Requires]] deps = ["UUIDs"] git-tree-sha1 = "838a3a4188e2ded87a4f9f184b4b0d78a1e91cb7" uuid = "ae029012-a4dd-5104-9daa-d747884805df" version = "1.3.0" [[deps.SHA]] uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce" version = "0.7.0" [[deps.Scratch]] deps = ["Dates"] git-tree-sha1 = "3bac05bc7e74a75fd9cba4295cde4045d9fe2386" uuid = "6c6a2e73-6563-6170-7368-637461726353" version = "1.2.1" [[deps.SentinelArrays]] deps = ["Dates", "Random"] git-tree-sha1 = "90b4f68892337554d31cdcdbe19e48989f26c7e6" uuid = "91c51154-3ec4-41a3-a24f-3f23e20d615c" version = "1.4.3" [[deps.Serialization]] uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b" [[deps.Sockets]] uuid = "6462fe0b-24de-5631-8697-dd941f90decc" [[deps.SortingAlgorithms]] deps = ["DataStructures"] git-tree-sha1 = "66e0a8e672a0bdfca2c3f5937efb8538b9ddc085" uuid = "a2af1166-a08f-5f64-846c-94a0d3cef48c" version = "1.2.1" [[deps.SparseArrays]] deps = ["Libdl", "LinearAlgebra", "Random", "Serialization", "SuiteSparse_jll"] uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf" version = "1.10.0" [[deps.StaticArrays]] deps = ["LinearAlgebra", "PrecompileTools", "Random", "StaticArraysCore"] git-tree-sha1 = "6e00379a24597be4ae1ee6b2d882e15392040132" uuid = "90137ffa-7385-5640-81b9-e52037218182" version = "1.9.5" [deps.StaticArrays.extensions] StaticArraysChainRulesCoreExt = "ChainRulesCore" StaticArraysStatisticsExt = "Statistics" [deps.StaticArrays.weakdeps] ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4" Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2" [[deps.StaticArraysCore]] git-tree-sha1 = "192954ef1208c7019899fbf8049e717f92959682" uuid = "1e83bf80-4336-4d27-bf5d-d5a4f845583c" version = "1.4.3" [[deps.Statistics]] deps = ["LinearAlgebra", "SparseArrays"] uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2" version = "1.10.0" [[deps.StringManipulation]] deps = ["PrecompileTools"] git-tree-sha1 = "a04cabe79c5f01f4d723cc6704070ada0b9d46d5" uuid = "892a3eda-7b42-436c-8928-eab12a02cf0e" version = "0.3.4" [[deps.SuiteSparse_jll]] deps = ["Artifacts", "Libdl", "libblastrampoline_jll"] uuid = "bea87d4a-7f5b-5778-9afe-8cc45184846c" version = "7.2.1+1" [[deps.TOML]] deps = ["Dates"] uuid = "fa267f1f-6049-4f14-aa54-33bafae1ed76" version = "1.0.3" [[deps.TableTraits]] deps = ["IteratorInterfaceExtensions"] git-tree-sha1 = "c06b2f539df1c6efa794486abfb6ed2022561a39" uuid = "3783bdb8-4a98-5b6b-af9a-565f29a5fe9c" version = "1.0.1" [[deps.Tables]] deps = ["DataAPI", "DataValueInterfaces", "IteratorInterfaceExtensions", "LinearAlgebra", "OrderedCollections", "TableTraits"] git-tree-sha1 = "cb76cf677714c095e535e3501ac7954732aeea2d" uuid = "bd369af6-aec1-5ad0-b16a-f7cc5008161c" version = "1.11.1" [[deps.Tar]] deps = ["ArgTools", "SHA"] uuid = "a4e569a6-e804-4fa4-b0f3-eef7a1d5b13e" version = "1.10.0" [[deps.Test]] deps = ["InteractiveUtils", "Logging", "Random", "Serialization"] uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40" [[deps.TimerOutputs]] deps = ["ExprTools", "Printf"] git-tree-sha1 = "5a13ae8a41237cff5ecf34f73eb1b8f42fff6531" uuid = "a759f4b9-e2f1-59dc-863e-4aeb61b1ea8f" version = "0.5.24" [[deps.UUIDs]] deps = ["Random", "SHA"] uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4" [[deps.Unicode]] uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5" [[deps.UnsafeAtomics]] git-tree-sha1 = "6331ac3440856ea1988316b46045303bef658278" uuid = "013be700-e6cd-48c3-b4a1-df204f14c38f" version = "0.2.1" [[deps.UnsafeAtomicsLLVM]] deps = ["LLVM", "UnsafeAtomics"] git-tree-sha1 = "d9f5962fecd5ccece07db1ff006fb0b5271bdfdd" uuid = "d80eeb9a-aca5-4d75-85e5-170c8b632249" version = "0.1.4" [[deps.Zlib_jll]] deps = ["Libdl"] uuid = "83775a58-1f1d-513f-b197-d71354ab007a" version = "1.2.13+1" [[deps.libblastrampoline_jll]] deps = ["Artifacts", "Libdl"] uuid = "8e850b90-86db-534c-a0d3-1478176c7d93" version = "5.8.0+1" [[deps.nghttp2_jll]] deps = ["Artifacts", "Libdl"] uuid = "8e850ede-7688-5339-a07c-302acd2aaf8d" version = "1.52.0+1" [[deps.p7zip_jll]] deps = ["Artifacts", "Libdl"] uuid = "3f19e933-33d8-53b3-aaab-bd5110c3b7a0" version = "17.4.0+2" ```

Expected behavior

No scalar indexing error.

Version info

Details on Julia:

Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × 13th Gen Intel(R) Core(TM) i9-13900KF
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, goldmont)
Threads: 32 default, 0 interactive, 16 GC (on 32 virtual cores)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 32

Details on CUDA:

CUDA runtime 12.5, artifact installation
CUDA driver 12.3
NVIDIA driver 545.29.6

CUDA libraries: 
- CUBLAS: 12.5.2
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.2
- CUSPARSE: 12.4.1
- CUPTI: 23.0.0
- NVML: 12.0.0+545.29.6

Julia packages: 
- CUDA: 5.4.2
- CUDA_Driver_jll: 0.9.0+0
- CUDA_Runtime_jll: 0.14.0+1

Toolchain:
- Julia: 1.10.3
- LLVM: 15.0.7

1 device:
  0: NVIDIA GeForce RTX 4090 (sm_89, 20.577 GiB / 23.988 GiB available)
maleadt commented 3 weeks ago

To debug this, I would step through the execution and figure out why this doesn't dispatch to either the generic GPUArrays' version, or the optimized CUBLAS ones, both of which are invoked from here: https://github.com/JuliaGPU/CUDA.jl/blob/e1e5be2b6bf17f03a367cebeb18c4645e593f80d/lib/cublas/linalg.jl#L285-L330

albertomercurio commented 2 weeks ago

Ok, the problem is that the view is not a CuVector, but it is an AbstractVector

@view(prova1[1, 1:10]) # 10-element view(::CuArray{Int64, 2, CUDA.DeviceMemory}, 1, 1:10) with eltype Int64
@view(prova1[1, 1:10]) isa AbstractVector # true
@view(prova1[1, 1:10]) isa CuVector # false

Is this because the indices [1, 1:10] creates a view of a non-strided vector?

Moreover, I noticed that the view of a CuVector remains a CuVector instead of a view(::CuArray{Int64, 1, CUDA.DeviceMemory}, 1:10) for example


my_vec = CUDA.rand(Int, 100)

@view(my_vec[1:10]) # 10-element CuArray{Int64, 1, CUDA.DeviceMemory}

Following this, the view(prova1[1:10, 1]) with inverse indices generates a CuVector, and the method works.

I also noticed that, making a view of the vector using a vector of indices, instead of a UnitRange, then I get a view.

maleadt commented 2 weeks ago

I don't think that's the problem. You have to check against CuStridedVecOrMat, which includes certain SubArrays. I'm not too familiar with how LinearAlgebra unwraps the wrappers here, but some is not correctly getting unwrapped here, resulting in dispatch falling back to the Base implementation. Maybe @dkarrasch has a quick thought?

dkarrasch commented 2 weeks ago

One would need the stacktrace of the non-working cases, to see the call path. Then we'd get an idea which dispatch is failing. And yes, CuStridedVecOrMat is the thing to check.

maleadt commented 2 weeks ago
julia> mul!(@view(prova1[1, 1:10]), transpose(@view(prova2[1:10, 1:10])), @view(prova3[1:10]))
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] errorscalar(op::String)
    @ GPUArraysCore ~/Julia/pkg/GPUArrays/lib/GPUArraysCore/src/GPUArraysCore.jl:155
  [3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
    @ GPUArraysCore ~/Julia/pkg/GPUArrays/lib/GPUArraysCore/src/GPUArraysCore.jl:128
  [4] assertscalar(op::String)
    @ GPUArraysCore ~/Julia/pkg/GPUArrays/lib/GPUArraysCore/src/GPUArraysCore.jl:116
  [5] getindex(A::CuArray{Int64, 2, CUDA.DeviceMemory}, I::Int64)
    @ GPUArrays ~/Julia/pkg/GPUArrays/src/host/indexing.jl:48
  [6] scalar_getindex
    @ ~/Julia/pkg/GPUArrays/src/host/indexing.jl:34 [inlined]
  [7] _getindex
    @ ~/Julia/pkg/GPUArrays/src/host/indexing.jl:17 [inlined]
  [8] getindex
    @ ~/Julia/pkg/GPUArrays/src/host/indexing.jl:15 [inlined]
  [9] getindex
    @ ./subarray.jl:290 [inlined]
 [10] _getindex
    @ ./abstractarray.jl:1336 [inlined]
 [11] getindex
    @ ./abstractarray.jl:1291 [inlined]
 [12] _generic_matvecmul!(C::SubArray{Int64, 1, CuArray{Int64, 2, CUDA.DeviceMemory}, Tuple{Int64, UnitRange{Int64}}, true}, tA::Char, A::SubArray{Int64, 2, CuArray{Int64, 2, CUDA.DeviceMemory}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, B::CuArray{Int64, 1, CUDA.DeviceMemory}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
    @ LinearAlgebra ~/Julia/depot/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:714
 [13] generic_matvecmul!
    @ ~/Julia/depot/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:687 [inlined]
 [14] mul!
    @ ~/Julia/depot/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:66 [inlined]
 [15] mul!(C::SubArray{Int64, 1, CuArray{Int64, 2, CUDA.DeviceMemory}, Tuple{Int64, UnitRange{Int64}}, true}, A::Transpose{Int64, SubArray{Int64, 2, CuArray{Int64, 2, CUDA.DeviceMemory}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}}, B::CuArray{Int64, 1, CUDA.DeviceMemory})
    @ LinearAlgebra ~/Julia/depot/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:237
 [16] top-level scope
    @ REPL[61]:1
dkarrasch commented 2 weeks ago

So, how does this work with Int arrays? I guess CuBlas is not an option, so which eltype restrictions (if any) are there over at GPUArrays.jl? Looks like step [13] (or [12] at the latest) should be caught by some GPU package.

maleadt commented 2 weeks ago

GPUArrays.jl' GEMM has no eltype restrictions: https://github.com/JuliaGPU/GPUArrays.jl/blob/3f8734280311e0954318ccc118b28a179bac6fff/src/host/linalg.jl#L342-L373

dkarrasch commented 2 weeks ago

Aha, and the ones in here

https://github.com/search?q=repo%3AJuliaGPU%2FCUDA.jl%20generic_matvecmul!&type=code

don't apply because the first argument is a view, and not a straight CuVector.

dkarrasch commented 2 weeks ago

That signature can be relaxed. At least, the wrappers (gemv!, symv! and hemv!) in cublas/wrappers.jl allow for destination objects of type StridedCuVector.

dkarrasch commented 2 weeks ago

GPUArrays.jl' GEMM has no eltype restrictions: https://github.com/JuliaGPU/GPUArrays.jl/blob/3f8734280311e0954318ccc118b28a179bac6fff/src/host/linalg.jl#L342-L373

BTW, that method signature is so generic, I wonder if it will be chosen here: https://github.com/JuliaGPU/CUDA.jl/blob/e1e5be2b6bf17f03a367cebeb18c4645e593f80d/lib/cublas/linalg.jl#L210.