I'm currently rerunning with 'del inputs' after I call the function to rule out GC missing it (or some other reference being retained by me). Will report back with results.
No such luck, still leaking :(
Will try manually closing after calling function.
Interesting! I've never seen this before... I did not realize you could hit a limit on this.
Shared memory is created through memmap, which links to a filename. To parse the filenames listed there: "synk", followed by the process ID (91173), followed by the usage within synk (data), followed by a unique data object ID number, followed by another tag which is unique to that data object.
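For example, a minimal sketch of picking one of those names apart (the underscore separators and exact field order here are my assumptions based on that description, not taken from the library):

name = "synk_91173_data_3_0"  # hypothetical example filename
prefix, pid, usage, data_id, tag = name.split("_")
print(pid, usage, data_id, tag)  # -> 91173 data 3 0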
Rather than calling build_inputs every time in the loop, build the inputs once before the loop. Then inside the loop, reuse the same synk data objects but set their values to the newly desired ones:
x = synk.data(var=x_var)
while True:
    new_dat = get_new_data()
    x.set_value(new_dat)
Alternatively, if you'd like to write only certain entries and the array won't change shape, you can write to it like a numpy array. This can save you a memory copy.
x = synk.data(value=first_data_array)
while True:
    x[:] = get_new_data()
If the new array will not be the same shape as the old one, use the set_value method, and it will take care of it. (If it needs to allocate a bigger array, it will discard the old memmap allocation and make a new one with the same name but with that final tag incremented.)
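As a sketch of that behavior, assuming the same synk.data / set_value API as in the snippets above (the shapes, dtype, and import alias are illustrative, and synk is assumed to be already initialized):

import numpy as np
import synkhronos as synk  # assumed import alias

x = synk.data(value=np.zeros((100, 10), dtype='float32'))
x.set_value(np.ones((100, 10), dtype='float32'))  # same shape: written in place
x.set_value(np.ones((500, 10), dtype='float32'))  # bigger array: old memmap dropped,
                                                  # new one made with the final name
                                                  # tag incremented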
Ah, I like your way better!
Managed to fix it by calling free_memory() after calling the function:
inputs = train_func.build_inputs(*inputs)
train_func(*inputs)
for inp in inputs:
    inp.free_memory()
I fixed a few typos in data_module.py to get free_memory() to work - I'll submit a pull request, assuming those are correct.
Were you keeping references around to all the separate synk data objects?
I may have left a leak open where the master process dereferences a synk data object but the workers are still holding onto it... hmm.
edit: or even in the master, it may be held onto even after you drop it... hmm.
I was not leaving references around that I could find - I was also calling del inputs immediately after the function was called. So I think you may be right.
After running for a while, there are a ton of files in /dev/shm/ - are we leaking file handles? Is some reference being retained?
There are approximately twice as many entries as iterations of the function (I've called build_inputs() about 400 times, each with 2 arguments, and now there are a little more than 800 files in there).
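For reference, a quick way to count them (a diagnostic sketch; the 'synk*' pattern just follows the naming scheme described above):

import glob

# Count the synk shared-memory files currently sitting in /dev/shm.
print(len(glob.glob('/dev/shm/synk*')), 'synk files in /dev/shm')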
It could be that my code is retaining references that need to be killed, not sure.
Thanks!