Closed: clausreinke closed this issue 4 years ago
Yes, that is exactly the problem found with the current implementation of the pool. The threads spawned by express can get the same handle X from the pool. The pool must be protected against that, either by moving the pool functionality to C and synchronising with the existing mutex, or by using a new pool mutex; the existing one should be enough, I think. For the time being, better not to use the pool with express or similar multi-threaded frameworks.
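For illustration only, the serialization a fixed pool needs can be modeled in plain JavaScript. The names below are invented, not the node-rfc API, and JavaScript itself is single-threaded, so this only models the idea; in node-rfc the race is between native worker threads and has to be fixed with a C-level mutex as described above:

```javascript
// Sketch (hypothetical names, not the node-rfc API): a pool whose acquire()
// is serialized through a promise chain, so no two concurrent acquirers can
// ever be handed the same connection handle.
class SafePool {
  constructor(openConnection) {
    this.openConnection = openConnection; // async factory for new handles
    this.idle = [];                       // handles not currently in use
    this.lock = Promise.resolve();        // promise-chain "mutex"
  }

  acquire() {
    // Each acquire runs only after the previous one has finished picking a
    // handle, so the idle list is never inspected by two callers at once.
    const result = this.lock.then(async () => {
      if (this.idle.length > 0) return this.idle.pop();
      return this.openConnection();
    });
    this.lock = result.then(() => undefined, () => undefined);
    return result;
  }

  release(handle) {
    this.idle.push(handle);
  }
}
```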
I believe I am experiencing the exact same issue with my node-red-contrib-saprfc node. I reported this before, but closed it because I thought I had solved it by using the connection pool. It seems I was wrong.
The way my code works, I set up a connection pool and create an async task queue. Then I add "tasks" to this queue, where each task is a call to `node-rfc`.
The queue has a processor that will execute each item in the queue. The number of parallel tasks that can be processed is configurable.
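A queue like the one described can be sketched in plain Node without libraries; `runTask` here is a placeholder for the real work (e.g. one node-rfc call), not anything from the actual code:

```javascript
// Sketch of a task queue with a configurable parallel-processing limit.
// runTask is a placeholder for the real per-task work.
function createQueue(runTask, concurrency) {
  const pending = [];
  let active = 0;

  function next() {
    // Start queued tasks until the concurrency limit is reached.
    while (active < concurrency && pending.length > 0) {
      const { task, resolve, reject } = pending.shift();
      active += 1;
      runTask(task)
        .then(resolve, reject)
        .finally(() => { active -= 1; next(); });
    }
  }

  return {
    // push() returns a promise that settles when the task has run.
    push(task) {
      return new Promise((resolve, reject) => {
        pending.push({ task, resolve, reject });
        next();
      });
    },
  };
}
```

With `concurrency` set to 4 this matches the failing configuration described below; with 1, the calls are fully serialized.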
When pushing 1000 `BAPI_MATERIAL_SAVEDATA` tasks to the queue, with a parallel processing limit of 4, a seemingly random number of tasks will process successfully before I get a segmentation fault.
When I reduce the parallel processing limit to 1, it seems that all tasks complete successfully (albeit extremely slowly). I say "seems" because I have also had `RFC_INVALID_HANDLE` errors pop up this way too, and I have to do further testing to try and figure out what's going on.
I presently cannot think of any way to change the behavior of my program to prevent the segmentation faults.
Incidentally, I was trying to set up node's diagnostic report to try and get more details on the seg fault, but for some reason no report is generated. I'm starting node with `--report-uncaught-exception --report-on-fatalerror --report-filename=./node-report.json --report-directory=/data`.
My theory is that the 1:1 relation between `Client`s and `connectionHandle`s gets messed up (by an explicit or implicit close), at which point the `Client` mutexes no longer protect against multiple threads using the same `connectionHandle`.
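To make the theory concrete, here is a toy model (not node-rfc code; all names are invented): each client serializes its own calls with a per-instance lock, but once two clients end up holding the same handle, those locks do nothing for each other:

```javascript
// Toy model of the theory: per-Client locks protect each Client's own calls,
// but not a connectionHandle that two Clients accidentally share.
class ModelClient {
  constructor(handle) {
    this.connectionHandle = handle; // the shared state the locks should guard
    this.lock = Promise.resolve();  // per-Client "mutex"
  }
  invoke(fn) {
    // Serializes calls on THIS client only.
    const r = this.lock.then(() => fn(this.connectionHandle));
    this.lock = r.then(() => undefined, () => undefined);
    return r;
  }
}
```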
To test this theory, we have a patched version of `node-rfc` where we

- avoid closing the handle in the `Client` destructor if the `Client` may have lost exclusivity for that handle
- use `Client->alive` to more accurately track the handle status (reducing the time windows where multiple threads might see incorrect values)

Good news first: in the two days we've tested that patched version, we've seen no more invalid handles #128 or communication failures #147, so there is something to the theory.
More details:
We've only tested for two days so far, and while the patch may be overly paranoid in some places, it leaves gaps in other places. So I have not yet made a PR out of it, but if you want to test it early yourself, it is https://github.com/tangro/node-rfc/tree/trackHandleStatus . You'll have to make sure to bypass `prebuild-install`, or you'll not be running the patched version; ok, that alone is perhaps enough reason to make a PR (even if the patch is incomplete, it does seem to be an improvement).
Surprising news: in these two days, we've also not seen any more silent crashes, though I do not yet have a theory for why that might be.
On our Windows servers, they did not present as segmentation faults, but as fast exits with code `C000 0409`. According to https://devblogs.microsoft.com/oldnewthing/20190108-00/?p=100655 , it is no longer certain that this means a `STATUS_STACK_BUFFER_OVERRUN`. However, we've also been unable to catch or predict them, and there has not been sufficient info, let alone reproducibility, to open a separate issue, even though these crashes have been annoying.
What little I can say from the RFC traces (in the couple of cases where the client or server trace was on at the time and I've been able to locate the incident time) is that these crashes happen during `client.invoke`: the client successfully connects, sends parameters, and even receives the first block of result data, then it disappears without further trace.
@clausreinke I got your PR to compile and tried testing it today; however, as soon as my program calls `pool.acquire`, it crashes.
EDIT: Ignore this. There was a bug in my own code causing the problem. I found & corrected it, running tests now...
EDIT 2: Ignore the above; still having the same problem.
```
[sapRFC:call] {
  alive: true,
  name: 'RfcLibError',
  code: 13,
  codeString: 'RFC_INVALID_HANDLE',
  key: 'RFC_INVALID_HANDLE',
  message: "An invalid handle 'RFC_CONNECTION_HANDLE' was passed to the API call"
}
double free or corruption (out)
Aborted
npm ERR! code ELIFECYCLE
```
I have to retract my previous post.
I installed the PR version of node-rfc globally and then installed my app. For some reason, npm ignored the global version of node-rfc and instead installed a local version of 1.2.0 from the registry.
After manually deleting the local version, my app ran with the PR149 version and my program completed without errors.
Nice work @clausreinke.
@bsrdjan I would love to see this PR merged.
EDIT: For what it's worth, I called `BAPI_MATERIAL_SAVEDATA` 2000 times using an async queue with 4 parallel tasks running at a time.
Merged, thanks a lot @clausreinke for this contribution.
@bsrdjan When should we expect a new release with this patch included?
Just published, feel free to test.
Just checking the status here. Any updates in the meantime, any issues left ?
Let us close and reopen if needed.
We keep seeing a noticeable number of `RFC_COMMUNICATION_FAILURE`s. I mentioned this previously in #128, but I want to explore the possibility that it is a separate issue, occurring independently of the invalid handles. There is a fairly stable pattern that looks somewhat like this:
In explanation, this looks as if
It took me a while to notice that this is just what we see from within the JavaScript callbacks, which get called from within the Napi callbacks, so there are gaps in this picture.
Could you please check whether the following scenario makes sense? `RfcCloseConnection` is called on handle X again, because `RfcIsConnectionHandleValid` returns that handle X is valid (again, because of 4). In other words, two clients are working on the same handle, in spite of the semaphores.
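As a toy model of that scenario (invented names, not the SAP NW RFC SDK): if handle numbers are reused after a close, a validity check cannot distinguish a stale handle from a freshly reopened one, so a second close destroys the new owner's connection:

```javascript
// Toy model of the suspected race: connection handles are small integers that
// get reused, so "is this handle valid?" cannot tell a stale handle from a
// freshly reopened one. Names are invented for illustration, not SDK code.
const table = new Map(); // handle -> current owner
let nextHandle = 1;
const freeHandles = [];

function openConnection(owner) {
  const h = freeHandles.length > 0 ? freeHandles.pop() : nextHandle++;
  table.set(h, owner);
  return h;
}

function closeConnection(h) {
  table.delete(h);
  freeHandles.push(h); // the handle number becomes reusable
}

function isHandleValid(h) {
  return table.has(h); // cannot detect reuse by a different owner
}
```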