bmcage / odes

The ODES scikit for ordinary differential and algebraic equations, an extension to scipy
https://scikits-odes.readthedocs.io/

Memory leak with either user_data and/or Cython extensions #149

Closed Eheran1 closed 1 year ago

Eheran1 commented 1 year ago

I am using the IDA solver like this: solver = dae('ida', residuals_func, **extra_options, user_data = kwargs)

For faster computation, I moved a bunch of functions from residuals_func into Cython extensions, and at the same time I used user_data = kwargs to move away from global variables. Ever since these two changes, there is a massive memory leak (~40 MB/s, at ~20'000 function calls per second, which is how often the solver calls the residual function). But the thing is: if I just run the extensions in a loop, they don't leak memory at all. The whole residual function does not leak any memory when tested in a loop either. It only happens when I use the solver. Using tracemalloc to try and locate the issue shows nothing beyond a few dozen MB (static), while the Windows 10 task manager reports several GB.

Could the use of user_data be the cause of the issue? That would roughly match what I am seeing. It's a dict with ~60 entries; sys.getsizeof() returns 2200 (bytes?), which would match the ~40 MB/s at ~20'000 function calls per second.
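As a side note on that size estimate, sys.getsizeof() does return bytes, but only the shallow size of the dict object itself, not the memory held by its keys and values (a quick illustration with a made-up dict of roughly the size described above):

```python
import sys

# Illustrative dict with ~60 entries, similar in shape to the user_data
# described above (names are made up, not from the actual program).
d = {f"key{i}": float(i) for i in range(60)}

# Shallow size in bytes: the dict's own hash table only, NOT the
# memory occupied by the keys and values it references.
print(sys.getsizeof(d))
```

So even if a copy were made per call, the true per-copy cost could differ from the getsizeof() number.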

bmcage commented 1 year ago

Never used user_data myself. user_data is just passed along in the code, and arguments are normally passed by reference, so it should not be copied. For example:

object userdata = None) except? -1:
if self.with_userdata == 1:
  user_flag = self._jac_times_vecfn(rr, yy, yp, rr, v, Jv, cj, userdata)

From https://stackoverflow.com/questions/21720982/cython-c-passing-by-reference I see that in Cython you need the & operator to pass by reference, which is not done here. So the code might need to be

object &userdata = None) 

though I am not sure this works with the None default. To try in ida.pyx.

If you don't want global variables, what I do is make residuals_func a member of an object; then instead of accessing user_data, you can access self.user_data as you want. I know that does not leak. It is another way classes can simplify life; see the following, where a global vector is moved to a fixed class vector: https://stackoverflow.com/questions/29356403/cython-pass-by-reference

Eheran1 commented 1 year ago

Userdata is always just passed in the code, and arguments are by reference normally

I don't understand this. Passed in code, or by reference, how? What if these values change during the integration?

object &userdata = None)

You are saying I should try to compile ida.pyx with that &? Do I need any special flags or anything else to compile it, or is the usual setup.py enough? I will try it then.

what I do is make residuals_func a member of an object; then instead of accessing user_data, you can access self.user_data as you want. I know that does not leak.

Could you give me a simple example of how that works? I need to be able to manipulate some parameters that change during the integration with IDA. Currently I do that by solving with the option 'one_step_compute': True, i.e. one step at a time, then computing new values for the parameters, which then get passed into the next step with user_data.

bmcage commented 1 year ago

Yes, I meant to compile it with &, as your problem seems to indicate that a copy is made somewhere.

About the example of the rhs as a class, see the example https://github.com/bmcage/odes/blob/master/ipython_examples/Double%20Pendulum%20as%20DAE%20with%20roots.ipynb

There the rhs is a class itself, with an evaluate method. The user data can be stored in the class, and be changed in other parts of your program.

Alternatively (though I have no published example of this), you could have a member function in the class Doublependulum and pass it as the rhs function. That function would also have access to the class data, which can be changed elsewhere in your program. Best to go with the example above, however, as you can adapt its code.
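A minimal sketch of that alternative: passing a bound method as the residual function, so it can read mutable state stored on the instance (all names here are illustrative, not from the odes codebase):

```python
class Pendulum:
    def __init__(self):
        # parameter that other parts of the program may change later
        self.g = 9.81

    def res(self, t, y, ydot, result):
        # simple one-unknown DAE residual: ydot + g*y = 0
        result[0] = ydot[0] + self.g * y[0]
        return 0

p = Pendulum()
# p.res could then be handed to dae('ida', p.res, ...) as the residual
# function; a change to p.g elsewhere is immediately visible inside res.
```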

Eheran1 commented 1 year ago

I only added the & on lines 720 and 740, which are in cdef class IDA_ErrHandler: and cdef class IDA_WrapErrHandler(IDA_ErrHandler):. I am unable to compile it.

In the example you linked, all of the constants are really constant; nothing changes other than time, y and ydot, all supplied by IDA. I need those constants to change depending on where the integration is, as well as on other factors. Other than that, it would just look like this?

class my_problem():
    # Get my parameters in here somehow
    # ...parameters...
    # the parameters need to change over time, supplied from an outside function

    # figure out how to do this __init__
    def __init__(self, data=None):
        pass

    def set_res(self, resfunction):
        self.res = my_current_resfunction_here  # not sure how my parameters get there?

bmcage commented 1 year ago

The my_problem class needs to inherit from class ida.IDA_RhsFunction

class my_problem(ida.IDA_RhsFunction):

So, no __init__, or if you define one, you need to call the parent __init__ too. You should call set_res yourself, with your current resfunction as the parameter.
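A skeleton of that pattern (with a stand-in base class so the snippet runs standalone; with scikits.odes installed you would subclass ida.IDA_RhsFunction instead, and the evaluate signature matches the one used in the example further down):

```python
class IDA_RhsFunction:            # stand-in for ida.IDA_RhsFunction
    pass

class MyProblem(IDA_RhsFunction):
    def __init__(self, params):
        super().__init__()        # if you define __init__, call the parent's too
        self.params = params      # mutable user data lives on the instance

    def evaluate(self, tres, yy, yp, result, userdata):
        # residual of yp + k*y = 0, with k read fresh on every call,
        # so changing self.params elsewhere takes effect immediately
        result[0] = yp[0] + self.params["k"] * yy[0]
        return 0
```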

Other code can then change the parameters; the example is indeed simple and does not do that. Consider, in the example,

def evaluate(self, tres, yy, yp, result, userdata):
    m1 = self.dblpend.m1
    m2 = self.dblpend.m2
    g = self.dblpend.g

So, in your case, inside evaluate you have tres available, and you can call something like

def evaluate(self, tres, yy, yp, result, userdata):
    self.dblpend.set_parameters_at_time(tres)
    m1 = self.dblpend.m1

This set_parameters_at_time can set the correct values of m1, m2, .... HOWEVER, keep in mind that IDA can probably call evaluate at times BEFORE the previous call to it; this happens when refinements occur with changing timesteps.

Be very careful changing parameters of a problem, so that the IDA solver does NOT fail: ALL changes should happen continuously. So you normally need to smooth all discontinuities.
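One common way to smooth a discontinuity is to replace a hard switch with a smooth ramp, e.g. via tanh (a generic sketch, not from the odes codebase; the width is a tuning parameter you would pick for your problem):

```python
import math

def smooth_step(t, t0, width):
    # Smooth approximation of a unit step at t0; `width` controls how
    # sharply the transition happens (smaller = closer to a hard step).
    return 0.5 * (1.0 + math.tanh((t - t0) / width))

def smoothed_parameter(t, a, b, t0, width):
    # A parameter that would otherwise jump from a to b at t0,
    # blended smoothly so the solver sees a continuous function.
    return a + (b - a) * smooth_step(t, t0, width)
```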

Eheran1 commented 1 year ago

This means the set_parameter_func runs in every iteration? I only want it to happen at certain steps, usually after every successful solution step, to reduce the computational load, since these parameters change relatively slowly. Some parameters can also only go one way (for example a reaction), so they need to change only after a new solution is found.

And yes, the solver cannot just go forward; it also needs to go backwards from an unsuccessful solution attempt, which is still forward as seen from the previous solution.

And yes, changing parameters discontinuously breaks the solver. Luckily they can change fairly fast without breaking it otherwise; the solver just needs to take smaller steps then.

bmcage commented 1 year ago

I believe you said you use a one-step solver, so you can change the parameters in the solution loop, and nowhere else. But that carries the danger that the solver takes steps that are too long and jumps over the reaction you need, so I don't think you can avoid changing values in the rhs function itself if you want correct solutions. It also lets you smooth the changes for the solver. I have used time-based changes: the rhs changes, but you know beforehand when, so there is no need to change parameters on the fly, though the rhs does change and the solver adapts the stepsize to get over the change. Anyway, you can work with flags to determine whether set_parameter_func needs to be called (e.g., the time of the call has to be some Delta_t bigger than the last good solution, ...), and in the solution loop change parameter values as needed after a successful step. Once you use classes, many different approaches are possible; it's a matter of coding.
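That Delta_t-gated update loop could be sketched like this (no solver here; do_step stands in for solver.step, and all names are illustrative):

```python
def run(do_step, update_params, t_end, delta_t):
    # Advance step by step; only refresh the parameters when at least
    # delta_t has passed since the last update (the flag/Delta_t idea).
    t = 0.0
    last_update = 0.0
    n_updates = 0
    while t < t_end:
        t = do_step(t)                 # one successful solver step
        if t - last_update >= delta_t:
            update_params(t)           # safe: the step was already accepted
            last_update = t
            n_updates += 1
    return t, n_updates
```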

Eheran1 commented 1 year ago

After trying some things, all of which failed, I want to go back to potentially fixing the issue in ida.pyx directly. However, to compile it I need the file sundials_config.pxi, since it is included there (line 11). It seems this gets created in setup_build.py.

Note that I have not tried the class method you suggested; I just don't understand how I could do that. The double pendulum example has 3 separate classes that all somehow interact. Here is another try; maybe you can tell me how and where I fail:

# this class above the function where my IDA-solver runs
class my_problem(ida.IDA_RhsFunction):
    def set_problem(self, problem):
        self.problem = problem

    def evaluate(self, tres, yy, yp, residuals, userdata):
        # don't know what self contains
        # don't know what tres is or where it comes from
        # now change some parameters based on the current userdata
        some_parameter = some_function_elsewhere(userdata)
        # calculate residuals
        residuals[0] = some_parameter  # simple example
        return 0  # is there a reason to return 0 instead of nothing as I always do?

def solver_func():
    residuals = my_problem()
    residuals.set_problem(problem)
    solver = dae('ida', residuals, **extra_options)
    while t < t_max and solution_flag == 0:
        solution = solver.step(t)
        # now change the userdata (a dict since it's a lot of values)
        problem.userdata["temperature"] = 200

        # log the data for plotting and calculations after the integration
        parameter_log.append(problem.userdata["temperature"])
        # also log t / y / y_dot values

    return t_log, y_log, y_dot_log, parameter_log

Eheran1 commented 1 year ago

I found the issue after another round of investigation: looking through the code, running it line by line with memory reporting via memory_profiler, etc. What actually did the trick was running test sections 10 or 100 times instead of just once and looking at the memory consumption after every 10'000 iterations (= 100'000 or 1E6 passes through the test section). If the extra executions of the test section had a meaningful effect, it was visible pretty much instantly. Very unlike looking at the memory consumption line by line (or over some test section), since memory_profiler slows the program down by a factor of ~1'000x, so it would need to run for hours to show a few MB of change, instead of seconds.
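The amplification trick described above can be sketched generically like this (not the original code; tracemalloc is used here only as a convenient stdlib memory reporter, any process-level measurement works the same way):

```python
import tracemalloc

def leak_delta(section, repeats=100, iterations=100):
    # Run the suspect section `repeats` times per iteration: memory
    # growth from a real leak scales with `repeats`, while one-off
    # allocations and fixed overhead do not.
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        for _ in range(repeats):
            section()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before  # bytes still retained over the whole run

# Example: a section that keeps appending to a surviving list "leaks";
# a section that only does local work does not.
kept = []
grows = leak_delta(lambda: kept.append([0] * 10))
stays = leak_delta(lambda: sum(range(10)))
```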

It was a .tolist() call, executed once per successful iteration (so a few 1'000 times per second), that caused the massive memory consumption. It has nothing to do with IDA, user_data or Cython.

Edit: Note that due to the very slow run time when checking the memory consumption of individual sections, it did appear as if IDA was the main memory user (a factor of ~10 compared to everything else). Only after letting it run for many hours did it become clear that that number only increases a few times by a few MB initially and then drops back down later on.