Capture memory usage - Githubissues

jkyang92 commented 2 years ago

capture seems to have a slow memory leak. Running the following loop on my system increases M2's memory usage by ~0.5MB/s seemingly indefinitely. (I ran it till M2 was using 300MB).

while true do (scan(apply(100,i-> "x = 1;y = 2;"),capture);collectGarbage())

The following loop seems to leak much faster (1GB within 15min), but it's harder to figure out what's leaking:

needsPackage "SpecialFanoFourfolds"
while true do (scan(values tests SpecialFanoFourfolds,capture @@ code);collectGarbage())

In contrast the following loop does not cause an increase in memory usage:

while true do (scan(apply(100,i-> "x = 1;y = 2;"),v -> (x = 1;y = 2;));collectGarbage())

Valgrind does not seem to detect a leak.

mahrud commented 2 years ago

Not sure if you saw #1689, but I think this is a great feature of capture:

use capture for memory leak detection and memory benchmarking!

In other words, I think the first example you have is evidence that there's a leak, but in M2's interpreter.

Also see https://github.com/Macaulay2/M2/issues/1728#issuecomment-750425861, where this was first noticed.

jkyang92 commented 2 years ago

Okay, after messing the code, it seems we are leaking DictionaryList objects (and consequently dictionaries and all the data they contain). It seems that we keep a list of all dictionaries in expr.d? (allDictionaries variable and the record function) Can a dictionary ever get collected?

DanGrayson commented 2 years ago

Yes, dictionaries don't go away -- that's so the dictionary can be recovered from the frame, which contains an integer called the frame ID. An alternative (and perhaps better) way to do it would be for each frame to contain a pointer to its dictionary.

mahrud commented 2 years ago

Oh, I had assumed that when we call collegeGarbage in capture, discarded dictionaries get collected! But in reality nearly nothing that is assigned to a symbol is ever collected?! That seems like an oversight that should be fixed.

It would be great to also document how frames work along the way.

DanGrayson commented 2 years ago

The dictionary caching has nothing to do with being assigned to a symbol -- they are cached so the dictionary can be recovered from the frame id, and that could be done away with as I sketched above. (The dictionaries cached contain just symbols, not "symbol closures". It's the symbol closures that point to frames containing symbol values.)

The number of dictionaries is limited -- once you have loaded all the code, no more dictionaries are made. Thus the non-collection of dictionaries is not a memory leak problem.

mahrud commented 2 years ago

The dictionary caching has nothing to do with being assigned to a symbol -- they are cached so the dictionary can be recovered from the frame id, and that could be done away with as I sketched above. (The dictionaries cached contain just symbols, not "symbol closures". It's the symbol closures that point to frames containing symbol values.)

Could you please explain or better yet the document this system somewhere on the wiki? Even starting with some common terminology would be good.

The number of dictionaries is limited -- once you have loaded all the code, no more dictionaries are made. Thus the non-collection of dictionaries is not a memory leak problem.

capture makes new dictionaries every time. If you run hundreds of tests, that's the equivalent of hundreds of new dictionaries.

jkyang92 commented 2 years ago

Incidentally value(String) also creates new dictionaries.

DanGrayson commented 2 years ago

Could you please explain or better yet the document this system somewhere on the wiki? Even starting with some common terminology would be good.

Recall from LISP the notion of function closure, or simply closure. When one returns a function as a value, that function closure remembers the values of the variables at the time it was created. Here is an example in M2:


i1 : f = x -> () -> x

o1 = f

o1 : FunctionClosure

i2 : g = f 33

o2 = g

o2 : FunctionClosure

i3 : g()

o3 = 33

i4 : h = f 44

o4 = h

o4 : FunctionClosure

i5 : h()

o5 = 44

The code () ->x is referred to as a function body. The lexical scope of the function body has associated with it a dictionary, whose single symbol is x, but in general it will contain all the new variables appearing in the body of the function. When the function closures g and h are created by calling f, the function body is paired with a "frame", which is an array of expressions providing the values of the variables in the lexical scope of the body of the function. So g has a frame containing 33 and h has a frame containing 44. The symbol x contains information that specifies where in the frame its value is stored, and when g and h are executed, they look into the frame at that spot to retrieve the value.

Similarly, M2 has "symbol closures". Here is an example:


i1 : f = x -> () -> symbol x

o1 = f

o1 : FunctionClosure

i2 : g = f 33

o2 = g

o2 : FunctionClosure

i3 : g()

o3 = x

o3 : Symbol

i4 : value oo

o4 = 33

i5 : h = f 44

o5 = h

o5 : FunctionClosure

i6 : h()

o6 = x

o6 : Symbol

i7 : value oo

o7 = 44

All the symbols appearing at top level are actually symbol closures in this sense. They contain a symbol and a pointer to the appropriate frame where the value is stored.

The number of dictionaries is limited -- once you have loaded all the code, no more dictionaries are made. Thus the non-collection of dictionaries is not a memory leak problem.

capture makes new dictionaries every time. If you run hundreds of tests, that's the equivalent of hundreds of new dictionaries.

Ah, thanks for the reminder. So maybe it is time to implement the fix I proposed above. (On the other hand, dictionaries aren't very large, so it may not be urgent.)

jkyang92 commented 2 years ago

Not related to the Dictionary issue, but the following loop also leaks, independent of the previous issue. (I made record in expr.d a no-op, most code doesn't need it to actually work).

testCode = ///
    R = QQ[x,y,z];
    x+1;
    assert ( 1 == 1 )
///
scan(1000, i-> (capture(testCode,UserMode=>false);collectGarbage()));

I can test it later but I'm fairly certain it's leaking the polynomial ring on every cycle, because if I replace the first line of the test code with nearly anything else, we don't leak anymore.

I think this causes most tests to leak, since most tests create polynomial rings.

mahrud commented 2 years ago

Just to make sure, you're disabling Usermode, right?

jkyang92 commented 2 years ago

Yep, it's right there in the UserMode=>false. Also, it actually doesn't seem to leak with UserMode=>true.

jkyang92 commented 2 years ago

I think I figured this out. The problem is how globalFrame in expr.d interacts with calls to new Dictionary. When you call new Dictionary in M2 you are actually creating a new DictionaryClosure in the D code. This closure uses globalFrame, and all values get stored there. The problem is after the dictionary for them goes out of scope, these values can no longer be removed and get collected. And of course the globalFrame itself will never be collected.

mahrud commented 1 year ago

Referencing a comment here:

I've ran some checks, including https://github.com/Macaulay2/M2/blob/048c777dc703365e44c3865b2c1faed698b82b3a/bugs/anton/MEMORY-LEAKS/NAG-leaks.m2 Everything seems to be fine.

BTW, if anyone has ideas on how to improve the memory leak testing (from the frontend) above, please let me know.

_Originally posted by @antonleykin in https://github.com/Macaulay2/M2/pull/2770#discussion_r1133789097_

I wonder if after fixing the problem above, we could use capture to detect memory leaks from the frontend.

Macaulay2 / M2

Capture memory usage #2330