JuliaGPU / oneAPI.jl

Julia support for the oneAPI programming toolkit.
https://juliagpu.org/oneapi/

Infinite partial_mapreduce_device recursion #426

Closed michel2323 closed 4 months ago

michel2323 commented 4 months ago

When running

julia --project -e 'using oneAPI; sum(oneAPI.zeros(30000)) ; sum(oneAPI.zeros(2182))'

the reduction for the 30000-element case runs fine. However, the 2182-element case crashes with an InvalidIRError.

I then added some debug output into https://github.com/JuliaGPU/oneAPI.jl/blob/7973a8980f30bd1b543d1df8a95d099cfbb78c71/src/mapreduce.jl#L164-L178 .

It seems the recursion for the 2182 case is infinite, although I would expect it to be shallower than for the 30000 case. I attached a log with some debug output. Maybe there is something obvious to you, @maleadt.

out.log

maleadt commented 4 months ago

Looks like Level Zero's zeKernelSuggestGroupSize doesn't like prime-sized global sizes:

julia> k = @oneapi launch=false identity(nothing)

julia> oneL0.suggest_groupsize(k.fun, 521)
oneAPI.oneL0.ZeDim3(1, 1, 1)

julia> oneL0.suggest_groupsize(k.fun, 7877)
oneAPI.oneL0.ZeDim3(1, 1, 1)

julia> oneL0.suggest_groupsize(k.fun, 7919)
oneAPI.oneL0.ZeDim3(1, 1, 1)

These are really bad launch configurations... Maybe I'm misinterpreting the API?

EDIT: it seems to suggest really bad configurations for non-prime inputs too:

julia> oneL0.suggest_groupsize(k.fun, 8000)
oneAPI.oneL0.ZeDim3(64, 1, 1)

julia> oneL0.suggest_groupsize(k.fun, 512)
oneAPI.oneL0.ZeDim3(512, 1, 1)
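
To illustrate why a suggested group size of 1 could make a multi-pass reduction recurse forever, here is a toy CPU model. It is only a sketch and not the oneAPI.jl implementation: it assumes each pass produces one partial result per group, so the next pass sees `cld(len, groupsize)` inputs. The names `count_passes`, `fixed_256` and `always_one` are hypothetical and exist only for this example.

```julia
# Toy CPU model of a multi-pass reduction; NOT the oneAPI.jl implementation.
# Assumption: each pass turns `len` inputs into one partial result per group,
# so the next pass has cld(len, groupsize) inputs.
function count_passes(len::Integer, suggest; max_passes::Integer = 32)
    passes = 0
    while len > 1
        # Bail out instead of looping forever when the length stops shrinking.
        passes == max_passes && return nothing
        groupsize = suggest(len)
        len = cld(len, groupsize)
        passes += 1
    end
    return passes
end

# Hypothetical stand-ins for the driver's suggestion.
fixed_256(len) = min(len, 256)   # a healthy configuration
always_one(len) = 1              # what a ZeDim3(1, 1, 1) reply amounts to

count_passes(30000, fixed_256)   # 2 passes: 30000 -> 118 -> 1
count_passes(2182, fixed_256)    # 2 passes: 2182 -> 9 -> 1
count_passes(521, always_one)    # nothing: the length never shrinks
```

Under this assumption, any pass that receives a group size of 1 hands the same number of elements to the next pass, which would match the non-terminating recursion reported above.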