SheffieldML / GPy

Gaussian processes framework in python
BSD 3-Clause "New" or "Revised" License

Memory usage increasing when creating models inside a loop #304

Closed · beckdaniel closed this issue 6 years ago

beckdaniel commented 8 years ago

Greetings,

I've been running a few experiments with GPy involving cross-validation. This means I have an outer loop that, at each iteration, creates a new train/test split and builds/optimizes a GPy model. Running my experiments, I noticed that memory usage was increasing over time, which is something I was not expecting. I don't know the details of Python's garbage collector, but it seems to me that model instances from previous iterations are not being properly "deleted".

I've put up a gist that replicates this behaviour using the memory_profiler package. It has a function that creates a model from the same data a number of times inside a loop. The code calls the "memory_usage" function, which returns an array of memory usage samples over time:

https://gist.github.com/beckdaniel/693b851fdb48d3c28911
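Roughly, the test looks like this (a simplified sketch, not the gist's exact code; data shapes, iteration counts, and the function name are illustrative):

```python
import numpy as np
import GPy
from memory_profiler import memory_usage


def build_gpy_models(n_iters=30):
    # Same data on every iteration, so memory usage should stay flat.
    X = np.random.rand(100, 1)
    Y = np.sin(X) + np.random.randn(100, 1) * 0.05
    for _ in range(n_iters):
        m = GPy.models.GPRegression(X, Y)
        m.optimize()


if __name__ == "__main__":
    # memory_usage samples the process memory (in MiB) while the function runs.
    print(memory_usage((build_gpy_models, (), {})))
```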

If I run it for scikit-learn's SVR (using "python mem_test.py svm"), I get the following printed result:

[86.9921875, 86.9921875, 86.9921875, 86.9921875, 86.9921875, 86.9921875, 86.9921875, 86.9921875, 86.9921875, 86.9921875, 86.9921875]

This is what I would expect: constant memory usage over time since I'm just recreating the same models using the same data. Compare this with GPy (using "python mem_test.py gpy"):

[87.4921875, 89.30859375, 97.55078125, 100.28125, 101.28125, 101.28125, 98.67578125, 101.1796875, 101.1796875, 102.19140625, 102.8828125, 102.8828125, 103.9140625, 104.67578125, 104.67578125, 105.6640625, 106.43359375, 106.43359375, 107.44921875, 108.22265625, 108.48046875, 109.25390625, 110.265625, 110.265625, 111.0390625, 111.91796875, 111.91796875, 112.94921875, 113.72265625, 113.72265625]

As you can see, memory usage increases over time. I'm not sure whether this is expected behaviour; is it?

Dapid commented 8 years ago

The garbage collector should definitely be removing them. There are a couple of things to try:

* Run GC explicitly after each iteration: `gc.collect(2)`
* Make sure we don't have cycles involving `__del__`: `print gc.garbage` should show nothing (a sketch of both checks is at the end of this comment)

If this doesn't yield results, could you profile the memory usage? For example, with meliae and RunSnakeRun:

http://www.vrplumber.com/programming/runsnakerun/

And let's see if that shows where the problem lies.
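Something like this inside the loop (the model-building code is just illustrative):

```python
import gc

import numpy as np
import GPy

X = np.random.rand(100, 1)
Y = np.sin(X) + np.random.randn(100, 1) * 0.05

for _ in range(10):
    m = GPy.models.GPRegression(X, Y)
    m.optimize()
    del m
    # Force a full collection of every generation after each iteration.
    gc.collect(2)
    # gc.garbage holds objects the collector could not free
    # (e.g. cycles involving __del__ on Python 2); it should stay empty.
    print(gc.garbage)
```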

beckdaniel commented 8 years ago

Printing "gc.garbage" does not output anything (which I suppose it is expected) but calling the collector explicitly gave me interesting results. The memory usage looks like this now:

[87.53515625, 89.4140625, 89.875, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375, 89.93359375]

Even if this solves the problem, we still shouldn't need to call the collector explicitly, right? Or am I missing something?

Dapid commented 8 years ago

I can confirm it. `gc.garbage` is clean, but forcing collection seems to fix the problem. There are 191 objects living in the second generation of the collector, trapped in some sort of cycle.

The GC runs at non-deterministic times, so we can't trust it to sweep the floor on time. No one should need to run the GC explicitly, or at least that is what the devs say.

The solution is to find the cycle and either break it, replace some of the links with weakrefs, or force a collection in the `__del__` method of GPRegression.
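As a generic illustration of the weakref option (Parent/Child here are placeholders, not GPy classes): if a child keeps only a weak reference back to its parent, the pair no longer forms a reference cycle, and the parent is freed by plain reference counting.

```python
import weakref


class Child(object):
    def __init__(self, parent):
        # Keep a weak reference instead of a strong back-reference.
        self._parent = weakref.ref(parent)

    @property
    def parent(self):
        # Dereference on access; returns None if the parent has been freed.
        return self._parent()


class Parent(object):
    def __init__(self):
        self.child = Child(self)


p = Parent()
c = p.child
del p
# With only a weak back-reference, the parent is freed immediately by
# reference counting, without waiting for the cyclic collector.
print(c.parent)  # None
```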

alansaul commented 8 years ago

Dapid, how did you confirm the 191 objects? Is there any way of telling what instances they are or where they are created? That might provide some clues.

alansaul commented 8 years ago

Ah, `gc.collect(2)`, whoops!

Dapid commented 8 years ago

I found a way to figure out which objects are being collected. With `gc.set_debug(gc.DEBUG_SAVEALL)`, the GC sends them to `gc.garbage` instead of freeing them.

I have also modified the input data to make it more recognisable: https://gist.github.com/Dapid/1628aaa0a4abf7c001fb

So the memory increase is probably due to copies of the input data being trapped in some cycle.
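For reference, the debugging pattern looks roughly like this (the model-building code is illustrative):

```python
import gc
from collections import Counter

import numpy as np
import GPy

# With DEBUG_SAVEALL, every object the collector would have freed is
# appended to gc.garbage instead, so it can be inspected afterwards.
gc.set_debug(gc.DEBUG_SAVEALL)

X = np.random.rand(100, 1)
Y = np.sin(X) + np.random.randn(100, 1) * 0.05
m = GPy.models.GPRegression(X, Y)
del m
gc.collect(2)

# Count the collected objects by type to see what is stuck in cycles.
print(Counter(type(obj).__name__ for obj in gc.garbage).most_common(10))
```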