Z3Prover / z3

The Z3 Theorem Prover

Memory leak in Z3 Context using .NET API #5043

Open GennadyGS opened 3 years ago

GennadyGS commented 3 years ago

When an instance of Context is created and disposed, a memory leak is observed.

Z3 Version: 4.8.10 (also reproducible with previous versions)

Steps to reproduce (C#):

// Check allocated memory before
Console.WriteLine(Process.GetCurrentProcess().WorkingSet64);

for (var i = 0; i < 100_000; i++)
{
    var context = new Context();
    context.Dispose();
}

GC.Collect(); // trigger garbage collection (just in case)

// Check allocated memory after
Console.WriteLine(Process.GetCurrentProcess().WorkingSet64);

Observed result

Memory allocated for Context instances is not released.

Console output from example above:

55795712    // ~55 MB (before)
12888211456 // ~12 GB (after)

Expected result

Allocated memory should be released.

Some background. I noticed this issue long ago, and as a workaround I reused a single instance of Context across the application. However, starting from version 4.8.10, solving some predicates started leading to non-termination. These issues are not reproducible when a fresh instance of Context is used, which means there are side effects from solving previous predicates. Trying to solve this by creating a fresh instance of Context for each query leads to the memory leak. By the way, what is the recommended way of using Context, and is it safe to reuse a Context? In other words, are the side effects observed in 4.8.10 the result of bugs rather than expected behavior? Thanks in advance.
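
For illustration, the "single shared Context" workaround described above might look roughly like the following sketch (the SharedZ3 wrapper and its locking policy are assumptions for this example, not part of the Z3 API; a single Context must not be used concurrently from multiple threads, and all expressions passed in must have been built from that same context):

using Microsoft.Z3;

// Minimal sketch of the "reuse one Context" workaround.
public static class SharedZ3
{
    private static readonly object Gate = new object();
    public static readonly Context Ctx = new Context();

    public static Status Check(params BoolExpr[] constraints)
    {
        lock (Gate)                       // serialize all access to the shared context
        {
            using (var solver = Ctx.MkSolver())
            {
                solver.Add(constraints);
                return solver.Check();    // a fresh solver keeps queries isolated
            }
        }
    }
}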

NikolajBjorner commented 3 years ago

Memory allocated in contexts is freed by the finalizer. From what I understand, only the runtime can invoke this, and it is probably invoked more deterministically when using the "using" scoped construct (sorry, I haven't tested this hypothesis, as it takes some context switch to set up such tests). If you have suggestions on what else we can do for context objects, such as telling the .NET runtime how much memory pressure a context exerts so it can schedule the GC accordingly, feel free to chime in.
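
For reference, the standard hook for this in .NET is GC.AddMemoryPressure / GC.RemoveMemoryPressure. A minimal sketch of how a context wrapper might use it (the wrapper type and the 16 MB estimate are assumptions for illustration, not anything the bindings currently do):

using System;
using Microsoft.Z3;

// Sketch only: report an estimated amount of unmanaged memory per Context so
// the GC schedules collections sooner. The 16 MB figure is an arbitrary
// placeholder, not a measured property of Z3.
public sealed class PressureTrackedContext : IDisposable
{
    private const long EstimatedNativeBytes = 16 * 1024 * 1024;

    public Context Context { get; } = new Context();

    public PressureTrackedContext()
    {
        GC.AddMemoryPressure(EstimatedNativeBytes);
    }

    public void Dispose()
    {
        Context.Dispose();
        GC.RemoveMemoryPressure(EstimatedNativeBytes);
    }
}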

GennadyGS commented 3 years ago

Hi Nikolaj, thank you for your answer.

In my opinion, it is a problem that memory is released only by the finalizer, because there is no way to call a finalizer explicitly. The "using" construct does not solve the problem either, because it is just syntactic sugar around calling the Dispose method in the finally section of a try-finally block. I do not see any workaround that avoids memory leaks in the current design (in cases where Context instances are created dynamically).
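
To illustrate the "syntactic sugar" point, the compiler lowers a using statement to roughly the following try/finally (a sketch of the standard C# lowering, nothing Z3-specific):

// using (var context = new Context()) { /* work */ }
// is compiled to approximately:
var context = new Context();
try
{
    // ... work with the context ...
}
finally
{
    if (context != null)
        context.Dispose();   // Dispose runs here, but memory held by native Z3
                             // objects is still only reclaimed once their
                             // finalizers (or explicit releases) run.
}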

The recommended pattern for .NET classes is to release all unmanaged (that is, non-.NET) resources, including memory, in the Dispose method.

From official .NET documentation:

Implementing the Dispose method is primarily for releasing unmanaged resources. ... There are additional reasons for implementing Dispose, for example, to free memory that was allocated, ...

The finalizer is useful just to make sure that all resources are released automatically even when the user did not call Dispose. Usually the finalizer and the Dispose method share their implementation, e.g. by extracting a common helper method.
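
For reference, the pattern being described is the standard .NET dispose pattern; a generic sketch (NativeResourceHolder is an illustrative type, not part of the Z3 bindings):

using System;

// Dispose and the finalizer share one cleanup routine, so unmanaged memory can
// be released eagerly by callers and still gets released eventually if they forget.
public class NativeResourceHolder : IDisposable
{
    private IntPtr _handle;          // imaginary unmanaged handle
    private bool _disposed;

    public void Dispose()
    {
        Dispose(disposing: true);
        GC.SuppressFinalize(this);   // finalizer no longer needed
    }

    ~NativeResourceHolder()
    {
        Dispose(disposing: false);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;
        // release unmanaged memory here regardless of 'disposing'
        _handle = IntPtr.Zero;
        _disposed = true;
    }
}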

GennadyGS commented 3 years ago

Hi Nikolaj, thanks for your fix.

I checked the latest nightly release. For the simple example above, the memory leak is no longer reproducible. The only thing I observed is that the example now runs much longer than before, which might be expected. However, when I moved to a real-world scenario and tried to migrate my application to creating multiple contexts, I started observing an obvious and stable memory leak (just in case, I double-checked that all created contexts were disposed). I am not sure yet how to reproduce this with a simple example; I just wanted to inform you of my observations.

wintersteiger commented 3 years ago

I haven't looked at this in far too long, so this may have changed. I think the reason for using finalizers was that they can be re-scheduled if the underlying object's refcount isn't at 0 yet, and that non-finalizer cleanup is not guaranteed to be executed when the system is busy. There's a whole bunch of infrastructure around this (dec-ref queues etc.) that would need to be reviewed and adapted if we wanted to change this behaviour. Nikolaj's fix works for small tests, but I think it will start leaking memory once there are enough objects for the GC to make heuristic decisions.

Regarding the original report: GC.Collect() does not guarantee that all unmanaged memory will be freed. Z3 objects look like tiny objects to the GC, so if there is no memory pressure it will frequently decide not to release them immediately. Also, even if it does release the memory, the system may keep the underlying memory allocated for future use, so it may still show up in the working set. You'd need to be much more aggressive to make sure that the memory is indeed deallocated when you want it to be.
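
As a concrete illustration of "much more aggressive", one common (still not guaranteed) approach is to force a full, finalizing collection before sampling, and to look at private bytes rather than the working set; a sketch:

using System;
using System.Diagnostics;

// Sketch: force a full, finalizing collection before sampling memory, and read
// private bytes instead of the working set. Even this gives no hard guarantee
// that the OS shrinks the process's memory immediately.
static class MemorySampler
{
    public static long SamplePrivateBytes()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();   // let pending Z3 finalizers release native memory
        GC.Collect();
        using (var p = Process.GetCurrentProcess())
        {
            p.Refresh();
            return p.PrivateMemorySize64;
        }
    }
}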

ptr1120 commented 3 years ago

Hello,

I am also struggling with some kind of memory leak when using the Z3 solver's .NET API. I am using Z3 in a web application scenario, where the application may run for multiple months without a restart and multiple (parallel) users use functionality powered by the Z3 solver (each user having an individual Z3 Context). In this scenario, users configure devices (basically a CSP with e.g. 200 boolean variables, 200 constraints among the device properties, and lots of solver.Check() calls during each configuration request). I think using Z3 in such a scenario is reasonable (e.g. Microsoft Dynamics 365 does something similar) and will become more and more important due to the excellent performance, interfaces, correctness, and documentation of the Z3 solver.

I noticed that without using parallel contexts the memory loss is much smaller, but I am not sure whether that is a false conclusion. I also tried different versions of Z3 (4.8.12, 4.8.10, 4.8.9) without remarkable differences. In order to reproduce and minimize the problem, I used the following code, which simulates the situation:

public class Program
 {
     public static async Task Main()
     {
         var parallelSolverCount = 20;
         var cycle = 0;
         while (true)
         {

              Parallel.For(0, parallelSolverCount, i =>
              {
                  var solver = new SimpleSolver(maxSolutionCount: 500);
                  solver.SearchAllSolutions();
                  solver.Dispose();
              });

             ForceGC();
             Console.WriteLine(
                 $"Cycle: {cycle++}; PrivateBytes: {ToMegabyteString(Process.GetCurrentProcess().PrivateMemorySize64)}; HeapSize: {ToMegabyteString(GC.GetGCMemoryInfo().HeapSizeBytes)}");
         }
     }

     private static void ForceGC()
     {
         GC.Collect();
         GC.WaitForPendingFinalizers();
         GC.Collect();
     }

     private static string ToMegabyteString(double bytes) => $"{Math.Round(bytes * Math.Pow(10, -6), 1)} mb";
 }

 public class SimpleSolver : IDisposable
 {
     private readonly int _maxSolutionCount;
     private readonly Context _context = new Context();
     private readonly Solver _solver;
     private readonly IReadOnlyCollection<BoolExpr> _constraints;
     private readonly IReadOnlyCollection<BoolExpr> _variables;

     public SimpleSolver(int maxSolutionCount)
     {
         _maxSolutionCount = maxSolutionCount;
         _solver = _context.MkSolver();
         // Create Model
         _variables = Enumerable.Range(0, 100)
             .Select(i => _context.MkBoolConst(i.ToString()))
             .ToArray();
         _constraints = new[]
         {
             _context.MkOr(_variables.Take(10).ToArray()),
             _context.MkAnd(_variables.Skip(10).Take(10).ToArray()),
             _context.MkXor(_variables.Skip(20).Take(10).ToArray()),
             _context.MkOr(_variables.Skip(30).Take(10).ToArray()),
             _context.MkAnd(_variables.Skip(40).Take(10).ToArray()),
             _context.MkXor(_variables.Skip(50).Take(10).ToArray()),
             _context.MkOr(_variables.Skip(60).Take(10).ToArray()),
             _context.MkAnd(_variables.Skip(70).Take(10).ToArray()),
             _context.MkXor(_variables.Skip(80).Take(10).ToArray()),
             _context.MkOr(_variables.Skip(90).Take(10).ToArray()),
         };
         _solver.Add(_constraints);
     }

     public void SearchAllSolutions()
     {
         var tempVariables = new List<IDisposable>();
         try
         {
             _solver.Push();
             var i = 0;

             while (i++ < _maxSolutionCount && _solver.Check() == Status.SATISFIABLE)
             {
                 var currentModel = _solver.Model;
                 var expressions = new List<BoolExpr>();
                 foreach (var variable in _variables)
                 {
                     var evaluationResult = currentModel.Eval(variable, true);
                     var equality = _context.MkEq(variable, evaluationResult);
                     expressions.Add(equality);
                     tempVariables.AddRange(new[] { evaluationResult, equality });
                 }

                 var negatedSolution = _context.MkNot(_context.MkAnd(expressions));
                 _solver.Add(negatedSolution);

                 tempVariables.Add(currentModel);
                 tempVariables.Add(negatedSolution);
             }
         }
         finally
         {
             foreach (var var in tempVariables)
             {
                 var.Dispose();
             }
             _solver.Pop();
         }
     }

     public void Dispose()
     {
         foreach (var boolExpr in _constraints.Concat(_variables))
         {
             boolExpr.Dispose();
         }
         _solver.Reset();
         _solver.Dispose();
         _context.Dispose();

         GC.SuppressFinalize(this);
     }
 }

Attaching a memory profiler shows the following picture:

[Screenshot: memory profiler with the original bindings]

This corresponds with my experience in the real scenario, where every few days the memory increases by around 1 GB (the CSPs in my scenario are bigger than in this example).

I am also not sure whether the problem is in the .NET bindings. In order to eliminate a possible problem with the current implementation, which relies on finalizers being run correctly, I implemented a custom version of the .NET bindings that ensures that when the Context is disposed, all Z3Objects are disposed as well (see also https://github.com/ptr1120/z3/commit/b81b5a1569a43e3489c707696eadfda790e62311). Basically, each Z3Object registers its Dispose() method on a CancellationToken from the Context, and the Context cancels this token in its own Dispose() run. Using this implementation of the bindings improves the situation by around 50%, but it is still far from acceptable. Picture of the memory profiler after the CancellationToken-based disposing approach:

[Screenshot: memory profiler with the CancellationToken-based bindings]
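
For readers who do not want to read the linked commit, a heavily simplified sketch of the CancellationToken-based idea described above (type names here are illustrative; the real change lives in the linked fork):

using System;
using System.Threading;

// The context owns a CancellationTokenSource; every object created from it
// registers its own cleanup on the token, and disposing the context cancels
// the token so all children are cleaned up deterministically.
public class TrackingContext : IDisposable
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();

    internal CancellationToken DisposalToken => _cts.Token;

    public void Dispose()
    {
        _cts.Cancel();    // runs every registered child cleanup
        _cts.Dispose();
        // ... then release the native context itself ...
    }
}

public class TrackedObject : IDisposable
{
    private readonly CancellationTokenRegistration _registration;

    public TrackedObject(TrackingContext owner)
    {
        _registration = owner.DisposalToken.Register(ReleaseNativeHandle);
    }

    public void Dispose()
    {
        _registration.Dispose();  // unregister first, then clean up once
        ReleaseNativeHandle();
    }

    private void ReleaseNativeHandle()
    {
        // decrement the native reference count here
    }
}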

Having eliminated that possible cause, I am not sure what to do next in order to get control over the solver's increasing memory consumption. Are there any ideas or workarounds that I can try?

Thanks in advance

NikolajBjorner commented 3 years ago

It is possible to use VS memory profiling for native and mixed mode. I have so far tried both mixed mode and a native-mode redo in C++. The mixed-mode debug loop is taxing on the profiler (issues with refreshing the result view) and the reported leak locations don't look right. The native-mode version doesn't show any leaks with my run, and the heap profile doesn't seem to grow in any noticeable way.

[Screenshot: memory profiler output]

#include "z3++.h"
#include <thread>
#include <vector>

using namespace z3;

     expr_vector mk_vars(context& ctx, unsigned start, unsigned num_vars) {
         expr_vector vars(ctx);         
         for (unsigned i = 0; i < num_vars; ++i) 
             vars.push_back(ctx.bool_const(std::to_string(i + start).c_str()));
         return vars;
     }

void worker_thread(unsigned i) {
    context ctx;
    unsigned max_solution_count = 50;
    solver solver(ctx);
    solver.add(mk_or(mk_vars(ctx, 0, 10)));
    solver.add(mk_and(mk_vars(ctx, 10, 10)));
    solver.add(mk_xor(mk_vars(ctx, 20, 10)));
    solver.add(mk_or(mk_vars(ctx, 30, 10)));
    solver.add(mk_and(mk_vars(ctx, 40, 10)));
    solver.add(mk_xor(mk_vars(ctx, 50, 10)));
    solver.add(mk_or(mk_vars(ctx, 60, 10)));
    solver.add(mk_and(mk_vars(ctx, 70, 10)));
    solver.add(mk_xor(mk_vars(ctx, 80, 10)));
    solver.add(mk_or(mk_vars(ctx, 90, 10)));
    expr_vector vars = mk_vars(ctx, 0, 100);
    solver.push();
    for (unsigned i = 0; i < max_solution_count && check_result::sat == solver.check(); ++i) {
        auto model = solver.get_model();
        expr_vector eqs(ctx);
        for (auto v : vars) {
            auto r = model.eval(v, true);
            eqs.push_back(r == v);
        }
        solver.add(!mk_and(eqs));
    }
    solver.pop();
}

int main() {
    unsigned num_threads = 4;
    auto cycle = 0;
    while (true) {
        std::vector<std::thread> threads(num_threads);
        for (unsigned i = 0; i < num_threads; ++i) {
            threads[i] = std::thread([&, i]() { worker_thread(i); });
        }
        for (auto & th : threads) {
            th.join();
        }       
        std::cout << ++cycle << "\n"; 
    }
    return 0;
}

Now, it is true that the C# version shows an erratic growth pattern even after fixing a couple of leaked references in your code:


var ands = _context.MkAnd(expressions);
var negatedSolution = _context.MkNot(ands);
_solver.Add(negatedSolution);

tempVariables.Add(currentModel);
tempVariables.Add(negatedSolution);
tempVariables.Add(ands);

The information I have so far points to one of the following:

NikolajBjorner commented 3 years ago

Finally got release native mode working. It shows leaks where the debug build does not. It also agrees with mixed mode on the location.

[Screenshot: memory profiler output]

The location suggests there is something fishy with std::unordered_map in the VS2017 runtime library. Chances are this is a known bug, known to someone.

ptr1120 commented 3 years ago

Thank you very much for your analysis, the repro in C++, and the interesting information.

Please correct me if I understood anything wrong, but as far as I understood, the problem is

Following these observations would limit the scope of the search for the source of the memory leak to the .NET bindings. Since I have already eliminated the problem of the non-guaranteed finalizer in .NET, it would be interesting to hear suggestions for other possible sources of the problem in the .NET bindings.

Do the .NET bindings also use the C++ interface you used in your test, or do they use a separate API?

I am not sure whether the VS2017 runtime library can be related to the problem, because I also ran my tests on Linux with the same results.

Just for information/clarification concerning the leaks you addressed in the C# code: the temporary variables will be garbage collected in C# after leaving the scope of the SearchAllSolutions method.

NikolajBjorner commented 3 years ago

It is reproducible in C++ using release builds with VS2017. It is not reproducible with a C++ debug build so far.

Using the C++ bindings for root-causing is much simpler than the .NET bindings: fewer dependencies, fewer places to look, simpler profiling.

It is very useful to know that it is reproducible on Linux. I had been leaning in the direction of this being a compiler bug/feature/move-semantics issue.

The .NET bindings use the same basic C API as the C++ bindings; they all call into the same code. As the profiler will tell you, the memory increase is in the native code, not at the .NET level. The exact location, within the std::unordered_map constructor, is very weird. I thought it had been a decade or more since the C++ STL had mis-designed libraries (strings and xtree used to have static fields that, when the appropriate ifdefs were absent, were not thread-safe and exhibited other unsafe behavior).

NikolajBjorner commented 3 years ago

Another question to consider is whether the unmanaged heap suffers from internal fragmentation. Some of the profiling, while inconclusive at the moment, suggests the memory overhead comes as a side effect of "realloc". I don't have the skill set or tool knowledge to draw any conclusions right now, and at some point I will likely be unable to look further at this, so you can consider this one possible option.

ptr1120 commented 3 years ago

Thank you, Nikolaj. What seems strange to me is the amount of memory lost when comparing the C++ and the C# APIs. I ran your C++ code in Visual Studio with Z3 4.8.12, using 20 solvers in parallel finding 500 solutions over 100 boolean variables, and if I can trust/compare the Visual Studio memory profiler diagram, at most 100 MB are left over after running 30 cycles. If I run the C# program with the same parameters, 900 MB are left over. Also, the memory loss in the C++ program does not seem to increase over the cycles as significantly as with the C# API. Do you agree with this understanding?