Closed jdoiro3 closed 1 year ago
What you are experiencing is not a problem of PID reuse. The cause is that on macOS the default multiprocessing start method is not fork but spawn. This means:
1) memray cannot track across forks because there are no proper forks.
2) the process doesn't appear with a reused PID because the OS is recycling PIDs, but because many copies are started with file_name=foo.
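A quick way to see which start method a platform defaults to is sketched below. It relies on the documented behavior that the first entry of `multiprocessing.get_all_start_methods()` is the default; the helper name `default_start_method` is only illustrative. Unlike `get_start_method()`, this call does not fix the start method, so a later `set_start_method("fork")` still works.

```python
import multiprocessing
import sys


def default_start_method():
    # The first entry of get_all_start_methods() is the platform default.
    # Reading it does not lock the start method in place.
    return multiprocessing.get_all_start_methods()[0]


if __name__ == "__main__":
    print("platform:", sys.platform)
    print("default start method:", default_start_method())
    print("supported start methods:", multiprocessing.get_all_start_methods())
```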
If you switch to fork it works correctly:
from concurrent import futures
import multiprocessing
import os

import memray


def foo(i):
    return i * i


def bar():
    # follow_fork=True makes memray keep tracking in forked children
    with memray.Tracker(file_name="foo", follow_fork=True):
        with futures.ProcessPoolExecutor(max_workers=os.cpu_count() + 1) as p:
            return sum(p.map(foo, list(range(os.cpu_count() + 1))))


if __name__ == "__main__":
    # Must be called before any worker process is created
    multiprocessing.set_start_method("fork")
    bar()
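If changing the start method for the whole program is undesirable, one alternative is to request the fork context for the executor only, through the `mp_context` parameter of `ProcessPoolExecutor` (Python 3.7+). The sketch below leaves Memray out so it runs with the standard library alone; the `Tracker` would wrap the executor exactly as in the example above. The names `square` and `run_with_fork_context` are illustrative.

```python
from concurrent import futures
import multiprocessing


def square(i):
    return i * i


def run_with_fork_context(n_tasks=4):
    # get_context("fork") raises ValueError on platforms without fork (e.g. Windows).
    ctx = multiprocessing.get_context("fork")
    # Only this executor uses fork; the global start method is left untouched.
    with futures.ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        return sum(pool.map(square, range(n_tasks)))


if __name__ == "__main__":
    print(run_with_fork_context())
```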
Check also: https://bloomberg.github.io/memray/supported_environments.html#known-issues-and-limitations
Closing as not a bug for now; feel free to reopen if you see something missing.
My example wasn't complete. In this example the memray bin file with PID 26774 gets overwritten. If I pass the max_workers to be the total number of tasks I need, this won't occur. Not necessarily a Memray bug, but it would be nice to be able to somehow define part of the forked process's bin file name. Not sure how that would work though.
> bin file with PID 26774 gets overwritten.

Nothing gets overwritten. What happens is that the pool spawns a bunch of workers and every worker is given different jobs during its lifetime. You are seeing PID 26774 twice because that worker gets two work items without creating new processes.
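The worker reuse described here can be observed without Memray. The following is a minimal sketch using only the standard library; the function names `report_pid` and `tasks_per_worker` are illustrative. With more tasks than workers, at least one PID shows up for several tasks, because the pool hands new work to existing processes instead of creating a process per task.

```python
from collections import Counter
from concurrent import futures
import os


def report_pid(task_id):
    # Each task reports the PID of the worker process that ran it.
    return task_id, os.getpid()


def tasks_per_worker(n_tasks=6, max_workers=2):
    with futures.ProcessPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(report_pid, range(n_tasks)))
    # Count how many tasks each worker PID handled.
    return Counter(pid for _, pid in results)


if __name__ == "__main__":
    for pid, count in tasks_per_worker().items():
        print(f"worker PID {pid} handled {count} task(s)")
```

Since one capture file is written per worker process, the number of per-PID files is bounded by `max_workers`, not by the number of tasks.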
@pablogsal - thanks for taking the time to explain this.
Is there an existing issue for this?
Current Behavior
When using Memray's Tracker with follow_fork=True, if a forked subprocess has the same PID as a previous one, the file will be overwritten. This means if I have 3 cores and 4 forked processes, and the last process to run reuses a previous PID, you'll end up with only 4 bin files, when there should be 5 (1 for the parent process and 4 for the forked processes).
Expected Behavior
I expect the number of forked processes plus the parent process to equal the number of generated bin files.
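To compare this expectation with what a run actually produced, the capture files can be counted after the fact. The sketch below assumes the parent writes to `foo` and each tracked child writes to `foo.<pid>`; that naming pattern is an assumption based on the PID-suffixed files mentioned in this thread, and `list_capture_files` is a hypothetical helper, not part of Memray.

```python
from pathlib import Path


def list_capture_files(directory=".", base_name="foo"):
    """Return ([parent file if present], [per-PID child files])."""
    root = Path(directory)
    parent = root / base_name
    # Assumed naming: children are written as "<base_name>.<pid>".
    children = sorted(
        p for p in root.glob(f"{base_name}.*") if p.suffix[1:].isdigit()
    )
    return ([parent] if parent.exists() else []), children


if __name__ == "__main__":
    parent, children = list_capture_files()
    print(f"{len(parent)} parent file(s), {len(children)} child file(s)")
```

The child count should match the number of worker processes that were started, which can be lower than the number of submitted tasks.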
Steps To Reproduce
Memray Version
1.3.0
Python Version
3.8
Operating System
macOS
Anything else?
No response