SAP / node-rfc

Asynchronous, non-blocking SAP NW RFC SDK bindings for Node.js
Apache License 2.0

node-rfc memory leak? #41

Closed. doronoded closed this issue 6 years ago

doronoded commented 6 years ago

We have encountered very high memory consumption in our node process after node-rfc has been running for some time. We performed a small test (see below) to isolate node-rfc's memory consumption change over time. The test invokes the BAPI_PO_GETITEMSREL function module 500 times sequentially, and after each call we write the node process memory usage to a text file. The results show a constant increase in the node process RSS memory, starting from ~33MB and ending at ~100MB.

var rfc = require ("node-rfc");

var abapSystem = {
    "user": "fill in yours",
    "passwd": "hidden",
    "ashost": "hidden",
    "sysnr": "hidden",
    "client": "hidden"
}

var fs = require ("fs");
var fileName = "withClose.txt";
fs.writeFile(fileName, `iteration\t\trss\t\theapTotal\t\theapUsed\t\texternal\n`)
function run (i) {

    console.log(i);
    if (i == 500) return;

    var client = new rfc.Client(abapSystem);

    client.connect(function (err) {
        if (err) { // check for login/connection errors
            return console.error('could not connect to server', err);
        }

        client.invoke("BAPI_PO_GETITEMSREL", {"REL_GROUP":"02","REL_CODE":"A1"}, function (err, res) {
            var memUsage = process.memoryUsage();
            var line = `${i}\t\t${(memUsage.rss / 1024 / 1024).toFixed(3)}MB\t\t${(memUsage.heapTotal / 1024 / 1024).toFixed(3)}MB\t\t${(memUsage.heapUsed / 1024 / 1024).toFixed(3)}MB\t\t${(memUsage.external / 1024 / 1024).toFixed(3)}MB\t\t${err ? err : ""}\n`;
            fs.appendFileSync(fileName, line);
            client.close();
            setTimeout(() => run(++i), 0);
        });
    });
}

run(1);
bsrdjan commented 6 years ago

I could not reproduce the 33 to 100 MB increase, using the latest node-rfc, nodejs 6.12.0, on Windows 7 Ultimate 64 bit: withClose.txt

Are you using the latest node-rfc? Which nodejs release, on which platform?

doronoded commented 6 years ago

Wow, how weird... I just did it again to make sure, and it reached 126MB. I am using node-rfc 0.1.13, in an Ubuntu subsystem (for Windows). Node: v8.9.4, Windows 10 Home 64-bit. withClose.txt

bsrdjan commented 6 years ago

Thanks. To be on the safe side regarding the environment, could you please also post the NW RFC lib version (rfc.getVersion() output)?

doronoded commented 6 years ago

rfc version output: 7210,0,42

bsrdjan commented 6 years ago

I tested on Ubuntu Linux 64 bit, with NWRFC lib [ 7470, 0, 0 ], nodejs 6 and 8, and could not reproduce the issue. Traces attached:

withClose6.txt

withClose8.txt

If a memory leak is suspected in the SAP NW RFC library 7.20, please create a customer message in the respective SAP component.

livnoni commented 6 years ago

I'm getting the same result as doronoded, I also have a memory leak. I tested it on Ubuntu Linux 64 bit, with the latest NWRFC lib [ 7500, 0, 0 ], nodejs version 8.9.1, and the memory reached 132.918MB. Here you can see the result: withClose.txt In addition, I am getting an error ('error 13') when calling client.close(); this is what I get: Error closing connection: 13 Maybe this is the reason for the memory leak? Maybe the client doesn't get deleted?

bsrdjan commented 6 years ago

@livnoni, is your Ubuntu Linux real Linux, or a subsystem on Windows (or Docker, a VM image, ...)?

I could not reproduce the leak or the connection close error. Here is my test configuration:

  1. VMWare Ubuntu 17.10 64 bit image
  2. Fresh cloned and compiled node-rfc
  3. nodejs versions 6.12.0 and 8.9.1
  4. NW RFC SDK versions 7.40 and 7.50

and respective traces:

withClose-node6.12-7470.txt withClose-node-6.12-7500.txt withClose-node8.9.1-7500.txt

@doronoded, I never tested on the Ubuntu subsystem for Windows; could it be that the issue occurs on that platform only?

Did you build node-rfc from source on the test system?

doronoded commented 6 years ago

Yes, I have built it from scratch on a Linux subsystem on Windows, but @livnoni did the test on a Linux system (not a subsystem).

Is there another way to check why we get an error when trying to close the connection?


bsrdjan commented 6 years ago

Error codes are enumerated in sapnwrfc.h, in the include subdirectory of the SAP NW RFC SDK library. The code 13 stands for:

RFC_INVALID_HANDLE,         ///< An invalid handle was passed to an API call

A possible cause could be calling the invoke method on a connection which has been closed in the meantime. What are you trying to achieve, what task should this example do? Perhaps it can be restructured a bit (and check error codes in any case)?

livnoni commented 6 years ago

@bsrdjan, maybe it happens because of our specific SAP system? If so, we would like to give you our credentials so that you can test it on your PC...

bsrdjan commented 6 years ago

I suppose the purpose of the test is to check whether sequential execution of node/RFC calls eventually causes a memory leak? Is that correct?

The 'invalid handle' error 13 happens when an SAP NW RFC SDK internal method is called which is not allowed/expected in the RFC connection's current internal state. This may happen when a multi-threaded application is not properly handling parallel RFC connection instances, but it can also be reproduced fairly simply, by calling methods on a connection that was never opened:

rfc = require('../build/rfc/rfc.node');
client = new rfc.Client({ user: 'xxx' /* ... */ });
client.close(); // or client.invoke(...)
// { Error: An invalid handle was passed to the API call ...

I suppose exactly this happens with your test script, on certain platforms, when setTimeout and garbage collection in recursion do not behave as expected.

You could try using a non-zero timeout, eventually use events or r3connect, or simply keep the client references separated by "brute force", something like this:

var fs = require("fs");
var fileName = `withClose-${process.version}.txt`;
fs.writeFile(fileName, `iteration\t\trss\t\theapTotal\t\theapUsed\t\texternal\n`)

var connections = [];

function run (i) {

    console.log(i);

    if (i == 500) {
        console.log(connections.length, connections.every(v => v.isAlive() == false));
        return;
    }

    var client = new rfc.Client(connParams);
    connections[i] = client;

    client.connect(function (err) {
        if (err) { // check for login/connection errors
            return console.error('could not connect to server', err);
        }

        client.invoke("BAPI_PO_GETITEMSREL", {"REL_GROUP":"02","REL_CODE":"A1"}, function (err, res) {
            if (err) {
                console.log(err)
                fs.appendFileSync(fileName, err);
            } else {
                var memUsage = process.memoryUsage();
                var line = `${i}\t\t${(memUsage.rss / 1024 / 1024).toFixed(3)}MB\t\t${(memUsage.heapTotal / 1024 / 1024).toFixed(3)}MB\t\t${(memUsage.heapUsed / 1024 / 1024).toFixed(3)}MB\t\t${(memUsage.external / 1024 / 1024).toFixed(3)}MB\t\t${err ? err : ""}\n`;
                connections[i].close();
                fs.appendFile(fileName, line);
                run(++i);
            }
        });
    });
}

run(1);

Each new client instance is stored in the connections array before the connection is opened, and exactly that instance is used for closing the connection: connections[i].close(). The synchronous file write should not be required either.

Could you please check whether this script fixes the "error 13"? Once this error is fixed, the initially experienced memory leak should not happen either.

livnoni commented 6 years ago

I executed your code... and here is the result: withClose-v8.9.1.txt As you can see, we still have a memory leak, but the client.close() error is fixed. After making a few checks on our side, we discovered that the memory leak reproduces only when we execute "heavy" BAPI queries. For example, when we use client.invoke("STFC_CONNECTION", {REQUTEXT: 'hello SAP world!'}) there is no memory leak. It seems that in our case, calling BAPI_PO_GETITEMSREL for a specific manager ({"REL_GROUP":"02","REL_CODE":"A1"}) returns a big object value.

We think that the memory leak happens when we use a lot of "heavy" queries like BAPI_PO_GETITEMSREL. You can't reproduce it because when you run it on your system, the queries don't return "big" data. Let me know if you want to run our test with our credentials, to view the memory leak.

Thanks

bsrdjan commented 6 years ago

But what is the purpose of this test script? To retrieve more backend data sequentially, or to test whether a memory leak can happen in the SAP NW RFC lib or node-rfc?

In either case, 500 loops of recursion, each time retrieving more and more data, are expected to steadily increase memory consumption, at least until garbage collection starts.

Instead of recursion, why not try using events, for example something like:

const events = require('events');
const eventEmitter = new events.EventEmitter();

var i = 1;

function run() {
    if (i++ == 500) return; // exit ...

    // assuming an already open client connection
    client.invoke("BAPI_PO_GETITEMSREL", {"REL_GROUP": "02", "REL_CODE": "A1"}, function (err, res) {
        if (err) {
            // ...
        } else {
            eventEmitter.emit('next');
        }
    });
}

eventEmitter.on('next', run);
eventEmitter.emit('next');

Could you please try this or a similar non-recursive approach, and if memory growth is still observed, please post the test script, result trace, and platform details (real/virtual OS, OS release, SAP NW RFC version, nodejs version).

fujifish commented 6 years ago

We found the memory leak in the file rfcio.cc. Essentially, every string created by the wrapString function never gets released. The fix is to free the allocated string. We've tested the fix in our environment and I can confirm it works as expected.

Take a look at https://github.com/capriza/node-rfc/commit/fbc5f7b6f6bdf086b493c8960530aad20a4c9bfb
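
In sketch form, the pattern looks roughly like this (illustrative names only, not the exact rfcio.cc source; the linked commit is the actual change, this is just to make the ownership argument easier to follow):

#include <nan.h>
#include <cstdlib>
#include <cstring>

// Illustrative stand-in for the string conversion done in rfcio.cc: it
// returns a heap-allocated UTF-8 copy that the caller owns and must free.
static char* convertToUtf8(const char* source) {
    return source ? strdup(source) : NULL;
}

// Nan::New copies the bytes into a V8-owned string, so V8 never takes
// ownership of the malloc'ed buffer. Without the free() below, every
// wrapped string leaks its UTF-8 copy, which is the behaviour reported here.
static v8::Local<v8::Value> wrapStringSketch(const char* source) {
    Nan::EscapableHandleScope scope;

    char* utf8 = convertToUtf8(source);
    if (utf8 == NULL) {
        return scope.Escape(Nan::Null());
    }

    v8::Local<v8::Value> resultValue = Nan::New(utf8).ToLocalChecked();
    free(utf8);  // the fix: release the buffer once V8 holds its own copy

    return scope.Escape(resultValue);
}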

spencer-uk commented 6 years ago

Hi Nadav,

AFAIK, this looks like a very nice catch :) Maybe it's worth extending the fix to line 265 (https://github.com/capriza/node-rfc/blob/fbc5f7b6f6bdf086b493c8960530aad20a4c9bfb/src/rfcio.cc#L265): before returning the error, the memory needs to be released as well? Cheers


bsrdjan commented 6 years ago

@fujifish, looking into the proposed change, I don't expect any difference in memory consumption. The memory of one pointer is freed (utf8) and then immediately allocated for another one (resultValue). The garbage collector takes care of both anyway, and I don't see the improvement. Please see What REALLY happens when you don't free after malloc?.

The immediate free here would of course be beneficial, but it is tricky to achieve without memory duplication. It must also work fast, because fill/wrapString are very critical for the overall node-rfc performance. If you find a solution I would be very glad to test it.

Could you please share which problem exactly the proposed change solved in your environment, and the test script to reproduce it?

fujifish commented 6 years ago

@bsrdjan, as I mentioned, we validated that this fix indeed resolves the memory leak we have been seeing. The Stack Overflow discussion you mentioned mainly refers to freeing memory right before a program exits. That is not the case for a long-running process such as ours, where we make many calls over a long period of time without exiting.

Allocating memory requires freeing it, and for the most part it is up to the allocator to also free the allocated memory. When that is not the case, the documentation usually says so (see the example at https://github.com/nodejs/nan/blob/master/doc/buffers.md#nannewbuffer).

In the case of Nan::New of a string (https://github.com/nodejs/nan/blob/master/doc/new.md#nannew), the documentation does not mention taking ownership of the memory, so it remains up to us to free it.

Furthermore, the V8 documentation for NewFromUtf8, used to allocate the string (https://v8.paulfryzel.com/docs/master/classv8_1_1_string.html#a745e88987f6d7e01cb82e12ab9fc8703), indicates that it allocates new memory for the string: this means that the original char* used as the source can, and should, be freed.

bsrdjan commented 6 years ago

In a long-running process, the garbage collector recycles the memory as well. That was the reason for not explicitly freeing the memory, as suggested by the proposed change (this has been discussed before).

How long does the process run, and could you please share the test script showing the difference?

fujifish commented 6 years ago

The garbage collector will never free the allocated memory of that string: V8 is not the owner of that memory, hence we need to free it ourselves.

The process may potentially live for days or weeks (it's a server). The script in hand is the same one that @doronoded and @livnoni mentioned previously, and which you yourself have tested.

Perhaps you can expose wrapString directly to JavaScript and create a very simple script to reproduce the memory leak, something like:

for (let i = 0; i < 500; i++) {
  client.wrapString(`this is string ${i}`);
  let memUsage = process.memoryUsage();
  let line = `${i}\t\t${(memUsage.rss / 1024 / 1024).toFixed(3)}MB\t\t${(memUsage.heapTotal / 1024 / 1024).toFixed(3)}MB\t\t${(memUsage.heapUsed / 1024 / 1024).toFixed(3)}MB\t\t${(memUsage.external / 1024 / 1024).toFixed(3)}MB\n`;
  console.log(line);
}
bsrdjan commented 6 years ago

I finally reproduced it, using the original test script and the STFC_PERFORMANCE RFC:

let fs = require ("fs");
let fileName = `withClose-${process.version}.txt`;
fs.writeFile(fileName, `iteration\t\trss\t\theapTotal\t\theapUsed\t\texternal\n`)

let connections = [];

function run (i) {

    console.log(i);

    if (i == 500) {
      console.log(connections.length, connections.every(v => v.isAlive() == false));
      return;
    }

    let client = new rfc.Client(connParams);
    connections[i] = client;

    client.connect(function (err) {
        if (err) { // check for login/connection errors
            return console.error('could not connect to server', err);
        }

        client.invoke("STFC_PERFORMANCE", {"CHECKTAB":"X","LGET0332":"999", "LGET1000": "999"}, function (err, res) {
            if (err) {
                console.log(err)
                fs.appendFileSync(fileName, err);
            } else {
                let memUsage = process.memoryUsage();
                let line = `${i}\t\t${(memUsage.rss / 1024 / 1024).toFixed(3)}MB\t\t${(memUsage.heapTotal / 1024 / 1024).toFixed(3)}MB\t\t${(memUsage.heapUsed / 1024 / 1024).toFixed(3)}MB\t\t${(memUsage.external / 1024 / 1024).toFixed(3)}MB\t\t${err ? err : ""}\n`;
                connections[i].close();
                fs.appendFile(fileName, line);
                run(++i);
                delete res;
            }
        });
    });
}

run(1);

The traces before (withClose-v8.10.0-leak.txt) and after the proposed fix (withClose-v8.10.0-fix.txt) clearly show the leak, and that my assumption was wrong.

Thank you very much for this good catch, in one of the most critical functions! Do you want to create the PR? The missing free at line 265 is welcome as well, although it is less harmful because that error should never happen in node-rfc.

As a side note, in the Python RFC connector the memory is released in a try/finally section, avoiding temporary duplication of potentially large strings. Even if it would work in node-rfc, try/catch can be expensive, because (at least until version 6) sections containing try/catch were excluded from compiler optimisation.
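
In C++, a similar always-executed cleanup could in principle be achieved without try/catch, for example with an RAII guard. The snippet below is only a sketch of that idea, not node-rfc code:

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <memory>

// Sketch only: an RAII guard that releases a malloc'ed buffer on every
// return path, the C++ counterpart of a try/finally cleanup.
int main() {
    // strdup stands in for the heap-allocated UTF-8 buffer produced during
    // the RFC string conversion.
    std::unique_ptr<char, decltype(&std::free)> utf8(strdup("converted RFC string"),
                                                     &std::free);
    if (!utf8) {
        return 1;  // allocation failed, nothing to release
    }

    std::printf("%s\n", utf8.get());
    return 0;  // the guard frees the buffer here, without any try/catch
}

Whether such a guard would be fast enough for fill/wrapString would of course still need to be measured.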

bsrdjan commented 6 years ago

Fix released in 0.1.14

Berdmanfolk commented 6 years ago

Hi @bsrdjan! I use another function module, and for this one the memory leak is not fixed. Please try the test script with these parameters:


let fs = require ("fs");
let fileName = `withClose-${process.version}.txt`;
fs.writeFile(fileName, `iteration\t\trss\t\theapTotal\t\theapUsed\t\texternal\n`)

let connections = [];

function run (i) {

    console.log(i);

    if (i == 500) {
      console.log(connections.length, connections.every(v => v.isAlive() == false));
      return;
    }

    let client = new rfc.Client(connParams);
    connections[i] = client;

    client.connect(function (err) {
        if (err) { // check for login/connection errors
            return console.error('could not connect to server', err);
        }

        client.invoke('SWNC_READ_SNAPSHOT', {READ_TIMEZONE: 'RUS03', READ_START_DATE: '20180418', READ_START_TIME: '080000', READ_END_DATE: '20180418', READ_END_TIME: '120000', TIME_RESOLUTION: 15}, function (err, res) {
        //client.invoke("STFC_PERFORMANCE", {"CHECKTAB":"X","LGET0332":"999", "LGET1000": "999"}, function (err, res) {
            if (err) {
                console.log(err)
                fs.appendFileSync(fileName, err);
            } else {
                let memUsage = process.memoryUsage();
                let line = `${i}\t\t${(memUsage.rss / 1024 / 1024).toFixed(3)}MB\t\t${(memUsage.heapTotal / 1024 / 1024).toFixed(3)}MB\t\t${(memUsage.heapUsed / 1024 / 1024).toFixed(3)}MB\t\t${(memUsage.external / 1024 / 1024).toFixed(3)}MB\t\t${err ? err : ""}\n`;
                connections[i].close();
                fs.appendFile(fileName, line);
                run(++i);
                delete res;
            }
        });
    });
}

run(1);

Trace attached: withClose-v6.11.3 heavy FM.txt

bsrdjan commented 6 years ago

Hi @Berdmanfolk, could you please post the trace as well?

bsrdjan commented 6 years ago

Thank you very much for this RFC, it is excellent for performance and memory leak testing!

I changed the time zone and start/end date, retrieving ca. 10k records in each call, and could not reproduce the issue on Linux. I tried with node 6 and 8 and, although I did not measure it exactly, node 8 looks noticeably faster to me.

Please find traces attached. xxx-withClose-v6.13.1.txt xxx-withClose-v8.11.1.txt

Are you sure you are using the latest node-rfc version? I created a test folder, npm installed node-rfc, and ran the script.

Berdmanfolk commented 6 years ago

Hi @bsrdjan, the problem is resolved. The reason was that I was using the old SAP NW RFC library, version 720. After upgrading to version 750, the memory leak stopped.

Thanks so much.

bsrdjan commented 6 years ago

Glad it works now.