bgrimstad / splinter

Library for multivariate function approximation with splines (B-spline, P-spline, and more) with interfaces to C++, C, Python and MATLAB
Mozilla Public License 2.0

Potential memory leak #126

Open atsiflis opened 3 years ago

atsiflis commented 3 years ago

Hi,

Thank you for making this very useful library available.

I am using it in a Python script that builds a spline once and then evaluates it multiple times. I have noticed that the memory allocated to the Python process increases by a few MB after each evaluation, and I am wondering whether this is due to a memory leak in splinter. To investigate, I ran the following simplified example (adapted from the basic usage example):

import numpy as np
from guppy import hpy
import psutil, os
import splinter

def f1(x):
    return -1. + 2*x + 0.1*(x**2) + 10*np.random.rand(1)[0]

# Generate the sample data
x = np.arange(0, 11, 1)
y = np.zeros((len(x),))
for i in range(len(x)):
    y[i] = f1(x[i])

# Build cubic B-spline
bspline = splinter.BSplineBuilder(x, y, degree=3).build()

# Generate evaluation points
xd = np.arange(0, 10, .01)

# Evaluate B-spline, calculating memory allocation
hp = hpy()
process = psutil.Process()
hp_before = hp.heap()
psutil_before = process.memory_info().rss
bspline.eval(xd)
print(process.memory_info().rss - psutil_before)
print(hp.heap() - hp_before)

from which I get

73728
Partition of a set of 2 objects. Total size = 444 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1  50      416  94       416  94 types.FrameType
     1      1  50       28   6       444 100 int

Given that the call to bspline.eval() increased the memory allocated to the process by 73728 bytes, while the size of the reachable objects increased by only 444 bytes, is it right to conclude that there is a memory leak in splinter?

gablank commented 3 years ago

Hello @atsiflis!

We're happy the library has been useful to you :-)

I agree that it may seem like a memory leak. As far as I know (and remember), the evaluation of a spline should not result in any allocation of memory (that is not also freed before returning), except for the data structure to hold the result. However, the Python interface makes reasoning about this a bit harder, as it uses a garbage collector which may run at (seemingly) random times.

Ideally, the memory usage of this should converge to some constant:

while True:
    y = bspline.eval(x)

I would have to take a more thorough look to determine whether this is actually a memory leak, or whether this is just Python playing a trick on us.
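
As a rough check (a sketch only, reusing the setup from the example above and psutil to sample the resident set size; not a definitive leak test), one could evaluate the spline many times and watch whether RSS levels off:

import numpy as np
import psutil
import splinter

# Same setup as in the example above
x = np.arange(0, 11, 1)
y = -1. + 2*x + 0.1*(x**2) + 10*np.random.rand(len(x))
bspline = splinter.BSplineBuilder(x, y, degree=3).build()
xd = np.arange(0, 10, .01)

process = psutil.Process()

# Evaluate repeatedly and sample RSS; if there is no leak, the printed
# values should level off after the first few iterations.
for i in range(10000):
    bspline.eval(xd)
    if i % 1000 == 0:
        print(i, process.memory_info().rss)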

Thank you for the report and the example, it will be very useful for debugging purposes!

yj-Roy commented 3 years ago

I have run into the same problem. My dataset is large, so it eventually occupies all the memory on my computer and the program exits with an error. I also have a related question that I do not know how to solve: I am using Python, and when I combine a for loop with splinter, most of the time is spent in the large number of loop iterations, so it takes more time than scipy (because scipy uses matrix processing). How can this be solved? Thanks.
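
For reference, the two evaluation patterns being compared look roughly like this (a sketch reusing the bspline and xd from the example at the top of this issue; the relative speed is not measured here):

import numpy as np

xd = np.arange(0, 10, .01)

# Pattern 1: one eval call per point from a Python loop
# (Python-level overhead is paid for every point)
ys_loop = [bspline.eval([float(xi)]) for xi in xd]

# Pattern 2: all evaluation points passed in a single call,
# as in the example at the top of this issue
ys_batch = bspline.eval(xd)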

yj-Roy commented 3 years ago

I can't solve this problem, but I think I can alleviate it. The cinterface.h file declares the function SPLINTER_API void splinter_datatable_delete(splinter_obj_ptr datatable_ptr);, but this function is not used anywhere in the code. So in datatable.py I added the following method:

def __del__(self):
    # Free the underlying C DataTable object when the Python wrapper is garbage collected
    if self.__handle is not None:
        splinter._call(splinter._get_handle().splinter_datatable_delete, self.__handle)
    self.__handle = None

After that there is still a memory leak, but it grows much more slowly. Thanks

bgrimstad commented 3 years ago

Hi, @yj-Roy. Good catch! I have not investigated this problem, but it seems reasonable to implement the __del__ method for all the Python classes (not only DataTable) to prevent memory leaks. It would be great if you could prepare and submit a pull request so that we can fix the problem for all users.
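
For reference, a minimal sketch of what this could look like for the BSpline class, assuming it stores its handle in self.__handle in the same way as DataTable and that cinterface.h exposes a corresponding splinter_bspline_delete function (both assumptions would need to be checked against the actual sources):

# Sketch only: a __del__ method inside the BSpline class in bspline.py
def __del__(self):
    # splinter_bspline_delete is assumed to exist in cinterface.h,
    # analogous to splinter_datatable_delete used above
    if self.__handle is not None:
        splinter._call(splinter._get_handle().splinter_bspline_delete, self.__handle)
    self.__handle = None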

Bjarne