SheffieldML / GPy

Gaussian processes framework in python
BSD 3-Clause "New" or "Revised" License

Hierarchical kernel memory leak #341

Closed ptonner closed 6 years ago

ptonner commented 8 years ago

Related to issue #304: creating multiple GPs using the hierarchical kernel leads to a memory leak. Using a modified version of the script from that issue, I tested garbage collection for RBF kernels and hierarchical kernels. Garbage collection works as expected for RBF, but memory usage for the hierarchical kernel increases after each iteration. Any suggestions on how to fix this?

output:

rbf: [75.85546875, 75.8671875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875, 76.48046875]

hierarchical: [75.875, 75.88671875, 77.0703125, 77.96484375, 78.67578125, 79.34375, 80.28125, 80.9921875, 81.67578125, 82.3828125, 83.0703125, 83.78125, 84.2421875, 85.01171875, 85.625, 86.140625, 86.65625, 87.21875, 87.73046875, 88.39453125, 89.0625, 89.51953125, 90.03125, 90.546875, 91.0625, 91.8359375, 92.08984375, 92.86328125, 93.12109375, 93.63671875, 94.13671875, 94.6484375, 95.3046875, 95.5625, 96.078125, 96.59375, 97.109375, 97.6171875, 97.82421875, 98.3359375, 98.8515625, 99.3671875, 99.66796875, 100.12109375, 100.6328125, 101.03515625, 101.5078125, 101.98046875, 102.234375, 102.75, 103.265625, 103.5234375, 104.07421875, 104.33203125, 104.734375, 105.1953125, 105.44921875, 105.96484375, 106.22265625, 106.73828125, 107.25390625, 107.5, 107.96484375, 108.21875, 108.88671875, 109.140625, 109.3984375, 109.65625, 110.171875, 110.6875, 110.98046875, 111.1875, 111.69921875, 111.95703125, 112.47265625, 112.73046875, 113.24609375, 113.484375, 113.73828125, 114.25390625, 114.51171875, 114.76953125, 115.28515625, 115.54296875, 115.8046875, 116.26171875, 116.515625, 117.03125, 117.19140625, 117.63671875, 117.8984375, 118.15625, 118.4140625, 118.9296875, 119.1875, 119.48828125, 119.7421875, 120.0, 120.2578125, 120.515625, 120.8671875, 121.12109375, 121.63671875, 121.89453125, 122.15234375, 122.50390625, 122.7109375, 122.96484375, 123.42578125, 123.64453125, 123.8984375, 124.3515625, 124.609375, 124.8671875, 125.125, 125.3828125, 125.640625, 125.8984375, 126.15625, 126.40234375, 126.9140625, 127.171875, 127.32421875, 127.578125, 128.03125, 128.2890625, 128.546875, 128.8046875, 129.171875, 129.42578125, 129.62890625, 129.8828125, 130.08984375, 130.546875, 130.80078125, 131.05859375, 131.31640625, 131.57421875, 131.83203125, 132.08984375, 132.34765625, 132.60546875, 132.859375, 133.11328125, 133.37109375, 133.62890625, 133.62890625]
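The test script itself is not shown in the thread, so the following is only a minimal sketch of the kind of loop described above: build an object, drop it, run the garbage collector, and record memory after each iteration. It swaps in the standard-library `tracemalloc` module for whatever measurement the original script used, and `tracemalloc` only sees memory allocated through Python's allocators, so a leak inside native code could go unnoticed. The names `track_memory`, `clean_factory`, and `leaky_factory` are hypothetical stand-ins; in practice the factory would be a callable that builds the GP model.

```python
import gc
import tracemalloc


def track_memory(factory, iterations=20, collect=True):
    """Call `factory` repeatedly and record traced memory (MiB) after each call."""
    readings = []
    tracemalloc.start()
    try:
        for _ in range(iterations):
            obj = factory()   # e.g. a callable that builds a model
            del obj           # drop the only reference held by this loop
            if collect:
                gc.collect()  # full collection, including reference cycles
            current, _peak = tracemalloc.get_traced_memory()
            readings.append(current / 2**20)
    finally:
        tracemalloc.stop()
    return readings


# Hypothetical stand-ins for the two model constructors (not GPy code).
_registry = []


def clean_factory():
    return bytearray(100_000)


def leaky_factory():
    obj = bytearray(100_000)
    _registry.append(obj)  # a lingering reference keeps every object alive
    return obj


if __name__ == "__main__":
    print("clean:", track_memory(clean_factory, iterations=5))
    print("leaky:", track_memory(leaky_factory, iterations=5))
```

A flat series of readings suggests the objects are being released; a steadily rising series, as in the hierarchical output above, suggests something still holds on to them.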

Dapid commented 8 years ago

Can you profile the memory usage? For example: https://pypi.python.org/pypi/memory_profiler

Python should not have memory leaks, as long as you run the garbage collector, but some of the Cython code may have left some allocation dangling.
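One way to tell these two cases apart, sketched here with the standard library only, is to count live Python objects by type before and after repeating the suspect operation. If some type keeps growing even after `gc.collect()`, something at the Python level still holds references; if the counts stay flat while process memory rises, native allocations become the more likely suspect. The helper names `live_type_counts` and `growth_by_type`, and the `Node` demo class, are hypothetical and not part of GPy.

```python
import gc
from collections import Counter


def live_type_counts():
    """Count gc-tracked objects by type name after a full collection."""
    gc.collect()
    return Counter(type(o).__name__ for o in gc.get_objects())


def growth_by_type(action, repeats=10):
    """Run `action` repeatedly and report which types gained live instances."""
    before = live_type_counts()
    for _ in range(repeats):
        action()
    after = live_type_counts()
    return {
        name: count - before[name]
        for name, count in after.items()
        if count > before[name]
    }


# Demo: a hypothetical class whose instances are retained by a global list.
class Node:
    def __init__(self):
        self.children = []


_retained = []


def leaky_action():
    _retained.append(Node())  # the reference survives the call


def clean_action():
    Node()  # nothing keeps the instance alive


if __name__ == "__main__":
    print("leaky:", growth_by_type(leaky_action).get("Node", 0))
    print("clean:", growth_by_type(clean_action).get("Node", 0))
```

Note that `gc.get_objects()` only lists objects tracked by the collector (containers and most class instances), so this is a rough indicator rather than a full census.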

ptonner commented 8 years ago

The overall memory footprint for the hierarchical kernel is much larger than RBF (23.3 MiB vs 0.6 MiB) according to memory_profiler. Also, the total memory used for the hierarchical kernel appears to scale with the number of iterations in the loop, but for the RBF it is fixed at 0.6 MiB.

Here's the output for the hierarchical kernel:

Line #    Mem usage    Increment   Line Contents

    24     76.5 MiB      0.0 MiB   @profile
    25                             def f():
    26     99.8 MiB     23.3 MiB       for i in xrange(its):
    27
    28     99.7 MiB     -0.1 MiB           if hier:
    29     99.8 MiB      0.1 MiB               m = GPy.models.GPRegression(feats, labels,GPy.kern.Hierarchical(kernels=[GPy.kern.RBF(2), GPy.kern.RBF(2)]))
    30                                     else:
    31                                         m = GPy.models.GPRegression(feats, labels)
    32
    33     99.8 MiB      0.0 MiB           if collect:
    34     99.8 MiB      0.0 MiB               gc.collect(2)

And for the RBF kernel:

Line #    Mem usage    Increment   Line Contents

    24     76.5 MiB      0.0 MiB   @profile
    25                             def f():
    26     77.1 MiB      0.6 MiB       for i in xrange(its):
    27
    28     77.1 MiB      0.0 MiB           if hier:
    29                                         m = GPy.models.GPRegression(feats, labels,GPy.kern.Hierarchical(kernels=[GPy.kern.RBF(2), GPy.kern.RBF(2)]))
    30                                     else:
    31     77.1 MiB      0.0 MiB               m = GPy.models.GPRegression(feats, labels)
    32
    33     77.1 MiB      0.0 MiB           if collect:
    34     77.1 MiB      0.0 MiB               gc.collect(2)
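The claim that memory scales with the number of iterations can be put into a single number by fitting a straight line to the per-iteration readings and looking at its slope. The sketch below does an ordinary least-squares fit against the iteration index; `growth_per_iteration` is a hypothetical helper name, and the sample lists are just the first five readings from each series posted at the top of this thread.

```python
def growth_per_iteration(readings):
    """Least-squares slope of the readings against their iteration index."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings to estimate a slope")
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    # Covariance of (index, reading) divided by variance of the index.
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(readings))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


if __name__ == "__main__":
    # First five readings of each series from the original post.
    rbf = [75.85546875, 75.8671875, 76.48046875, 76.48046875, 76.48046875]
    hierarchical = [75.875, 75.88671875, 77.0703125, 77.96484375, 78.67578125]
    print("rbf slope:", growth_per_iteration(rbf))
    print("hierarchical slope:", growth_per_iteration(hierarchical))
```

A slope close to zero over the full series is what a non-leaking loop should produce once start-up allocations have settled; a clearly positive slope gives a rough per-iteration leak size.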


lawrennd commented 8 years ago

@mzwiessele did you get a chance to look at this one? (Sorry, I know you've been busy with lots of them!)


mzwiessele commented 8 years ago

We did have a look, but we have not found a solution yet. It is being looked at, though. It will probably take a while to eradicate, but we will get there : )