JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.73k stars 5.49k forks source link

Allocations of Julia Tasks' stacks are missed by `@time` and Allocation Profiler, and at least 9 other allocations missed by Allocation Profiler #48216

Open NHDaly opened 1 year ago

NHDaly commented 1 year ago

There are 9 objects being allocated when allocating a Julia Task that are missed by the Allocation Profiler (https://github.com/JuliaLang/julia/pull/42768).

Additionally, both the Allocation Profiler and @time are missing the allocation of the stack for a Julia Task.

You can see in the following example, that we are actually missing around 9 allocations, ~including the allocation of the Task stack,~ but they are tracked by @time:

julia> using Profile

julia> @time Threads.@spawn sleep(10)  # warmup
  0.025083 seconds (23.00 k allocations: 1.436 MiB, 67.28% compilation time)
Task (runnable) @0x000000013056ab60

julia> @time Threads.@spawn sleep(10)  # warmup (again?)
  0.000061 seconds (30 allocations: 2.102 KiB)
Task (runnable) @0x0000000110b73340

julia> @time Threads.@spawn sleep(10)
  0.000064 seconds (29 allocations: 2.070 KiB)
Task (runnable) @0x0000000130568fd0

julia> @time Threads.@spawn sleep(10)
  0.000072 seconds (29 allocations: 2.070 KiB)
Task (runnable) @0x0000000130569270

julia> @time Threads.@spawn sleep(10)
  0.000066 seconds (29 allocations: 2.070 KiB)
Task (runnable) @0x0000000130569510

julia> Profile.Allocs.clear(); begin
           Profile.Allocs.@profile sample_rate=1 Threads.@spawn sleep(10)
       end
Task (runnable) @0x0000000130569660

julia> [a.size for a in Profile.Allocs.fetch().allocs]
20-element Vector{Int64}:
  48
  56
 104
  80
  16
  32
  56
 200
  32
  56
 128
   7
  40
  64
  89
  64
  32
  16
  32
 328

I think the stack allocation happens here, but i'm not 100% sure: https://github.com/JuliaLang/julia/blob/b7201d6b460df74f024a6f19f837437a2c0613d1/src/gc-stacks.c#L150-L182

or maybe this is a better place to instrument - this is the actual site of the mmap: https://github.com/JuliaLang/julia/blob/b7201d6b460df74f024a6f19f837437a2c0613d1/src/gc-stacks.c#L53-L67

~Where does this get tracked and added to the allocs reported in @time?~

EDIT: I believe that the stack allocation is also not being captured by @time, since I think they're more like 4MiB, and I don't see any accounting for the allocation in the above code.

NHDaly commented 1 year ago

I'm not sure what the other 8 missing allocations are yet. ~This was the most obvious one, since there's around 2.00 KiB unaccounted for.~

EDIT: Oh wait, duh, nvm, the existing allocations already sum to 1.480 KiB, so the stack allocation isn't currently being reported in @time, either! 💡

NHDaly commented 1 year ago

CC: @gbaraldi, @IanButterworth

NHDaly commented 1 year ago

EDIT: Oh wait, duh, nvm, the existing allocations already sum to 1.480 KiB, so the stack allocation isn't currently being reported in @time, either! 💡

NHDaly commented 1 year ago

K, i updated the description per my new understanding.

gbaraldi commented 1 year ago

We might not want to report the stack allocation anyway, it's kind of a deep internals thing that might make a user go crazy. I do wonder where the other allocations are coming from. I guess an rr trace to see when the allocation counter goes up but we don't record the allocation.

NHDaly commented 1 year ago

Yeah, i'm not sure either.. I think that it might be useful though, since it is ultimately part of the user's program, and it's something that they have control over. E.g. if you find you're unexpectedly spawning like 10,000,000 tasks or something, and it's your main contributor to allocations?

But i can see the argument not to report them as well...

I guess an rr trace to see when the allocation counter goes up but we don't record the allocation.

Great idea, yeah! :) I don't think we have the cycles to push on this now, but i'll leave this idea here for the future, or for if you or anyone there has the cycles+interest to hunt these down. Thanks! 😊

NHDaly commented 1 year ago

If it was solely up to me, i think i'd lean towards reporting the allocations of the Task stacks as well.

I'd also be interested to check to see what Golang's allocations profiler does, and whether or not they report them.

NHDaly commented 1 year ago

CC: @vilterp as well

vchuravy commented 1 year ago

The task stack is anonymous memory, we ask the OS for 4MiB but the actually allocation does not happen. So this is potential memory use not actual memory use. The cost of memory will be incurred when the stack actually grows to that size.

So I don't think we should report it as part of @time

NHDaly commented 1 year ago

💡 oh cool, thanks! I didn't know that. That's nifty. Okay yeah i think that sounds reasonable then. 👍

thanks! Should I repurpose this issue for the remaining 9 other missing allocs, or close it and open a new one?

charleskawczynski commented 1 year ago

I'm curious if this is related to #50187

NHDaly commented 1 year ago

🤔 it doesn't look like it's exactly the same, since no tasks involved, but agreed that it's the same symptom... Yeah, maybe they're related! 💡 Thanks @charleskawczynski. I commented on the other thread too 👍