Open sgaure opened 8 months ago
This is probably caused by 5b2fcb68800 and is not multi-threading related:
julia> using Base.ScopedValues
julia> const dynvec = ScopedValue([0])
julia> @noinline function dynfun()
           dvec = dynvec[]
           dvec[1] += 1
           return nothing
       end
julia> foo() = @with dynvec=>[0] for _ in 1:1_000_000; dynfun(); end
julia> @allocated foo()
16000208
The problem is this @noinline, which forces a wrapping Tuple{Vector{Int}} to be allocated as a temporary, even though it is immediately unwrapped at every call site:
https://github.com/JuliaLang/julia/blob/58291db09d18f59223edbdc15592ffcf0eb3dcfa/base/dict.jl#L1004
So is the problem that the API wrongly returns the tuple (leaf.val,) instead of Some{V}(leaf.val)?
Would Some{V}(leaf.val) bypass the need to allocate the temporary here?
I guess not. Apparently we do not have calling convention support for Union{Struct, Ghost}, even though we very easily could (we have many variations on it already) and probably should (it is the iteration protocol).
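To make the two return shapes concrete, here is a minimal sketch. The helpers `find_tuple`, `find_some`, `bump_tuple!` and `bump_some!` are hypothetical and are not the code in base/dict.jl; they only mimic a non-inlined lookup that returns either a one-field wrapper or `nothing`, with the caller unwrapping the result immediately. Both wrappers are one-field immutable structs around the same array reference, which is presumably why switching to `Some` would not by itself remove the temporary.

```julia
# Hypothetical lookups; both return a Union of a one-field wrapper and Nothing.
@noinline function find_tuple(d::Dict{Symbol,Vector{Int}}, key::Symbol)
    return haskey(d, key) ? (d[key],) : nothing
end

@noinline function find_some(d::Dict{Symbol,Vector{Int}}, key::Symbol)
    return haskey(d, key) ? Some(d[key]) : nothing
end

# Callers unwrap the result right away, as at the call sites described above.
function bump_tuple!(d::Dict{Symbol,Vector{Int}}, key::Symbol)
    r = find_tuple(d, key)
    r === nothing && return false
    r[1][1] += 1            # r[1] is the wrapped vector
    return true
end

function bump_some!(d::Dict{Symbol,Vector{Int}}, key::Symbol)
    r = find_some(d, key)
    r === nothing && return false
    something(r)[1] += 1    # something(r) unwraps the Some
    return true
end
```

Whether either variant allocates depends on the Julia version and on what the compiler inlines, so it is worth checking with `@allocated` on the build in question rather than assuming.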
I have looked into using scoped values for some temporary arrays to avoid allocations in parallel tasks. However, it seems that scoped values allocate when accessed, whereas with task-local storage (TLS) this can be avoided. This is unfortunate, since GC in parallel tasks can be a performance problem.
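The two approaches being compared can be sketched as follows. This is an illustration under assumptions, not a benchmark from the issue: the names `scratch`, `run_scoped`, `run_tls` and `run_hoisted` are made up, and the last variant (reading the scoped value once and passing the array down explicitly) is only a possible workaround that the thread does not confirm.

```julia
using Base.ScopedValues

# Hypothetical scratch buffer shared through a scoped value.
const scratch = ScopedValue(zeros(Int, 1))

@noinline function bump_scoped()
    buf = scratch[]                      # lookup on every call
    buf[1] += 1
    return nothing
end

@noinline function bump_tls()
    buf = task_local_storage(:scratch)::Vector{Int}
    buf[1] += 1
    return nothing
end

@noinline function bump_arg(buf::Vector{Int})
    buf[1] += 1
    return nothing
end

function run_scoped(n)
    @with scratch => zeros(Int, 1) begin
        for _ in 1:n
            bump_scoped()
        end
        scratch[][1]
    end
end

function run_tls(n)
    # task_local_storage(body, key, value) restores the old binding afterwards.
    task_local_storage(:scratch, zeros(Int, 1)) do
        for _ in 1:n
            bump_tls()
        end
        (task_local_storage(:scratch)::Vector{Int})[1]
    end
end

function run_hoisted(n)
    @with scratch => zeros(Int, 1) begin
        buf = scratch[]                  # single lookup, hoisted out of the loop
        for _ in 1:n
            bump_arg(buf)
        end
        buf[1]
    end
end
```

Comparing `@allocated run_scoped(10^6)`, `@allocated run_tls(10^6)` and `@allocated run_hoisted(10^6)` after a warm-up call should show whether the per-access allocation reported here is present on a given Julia build.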
output: