Node process exits with code 137

malay-medianv commented 1 year ago

Hello. First of all, thanks for this library. I invoke it from my node backend to convert incoming HL7 txt files to JS objects and vice-versa. I do it exactly as shown in the example.

However, in my case there are some times 2-3 thousand ADT or SIU messages arriving in a span of seconds. At this point, the process crashes. Could you please share some insight about how to deal with this?

MatthewVita commented 1 year ago

(see last comment - all of my suggestions live there)

MatthewVita commented 1 year ago

(see last comment - all of my suggestions live there)

malay-medianv commented 1 year ago

Thank you for your direction. I'll look into each point and share the findings here.

In node, we do PM2 clustered deployment. So there is 1 master process and then 8 child processes based on how many CPU cores are there.

In your suggestion, if I am running a file watcher Chokidar, should I only run it for the

master process only?
child processes only?
both master and child processes?

MatthewVita commented 1 year ago

(see last comment - all of my suggestions live there)

malay-medianv commented 1 year ago

For the sake of simplicity - just think that I am running 1 parent and multiple child processes (# of CPU cores) for my node backend which is having API end-points and some utilities like file watcher and cron jobs. The master process picks one of the child processes in round-robin manner to serve the next API request thus balancing load.

As of now, I initiate the watcher only when I detect it is a parent/master process. Thus doing the watch trigger (not related to API work) in the parent process. The child processes do not run the watcher. They only do the API related work - serving API requests.

Now, as I was seeing the file watcher hitting its limits, I thought what if I could spawn the watcher for each child process as well. This means the watcher would fight for events on the same folder but they would be in separate processes not knowing about each other's existence. As they're watching the same folder, I will need to create some kind of logic so that only one and exactly one process's watcher acts on the actual file. E.g. file name ends with .txt1 will be acted on by child process 1, .txt2 by child process 2, and so on. You know some kind of partitioning logic that will prevent a file being acted on my more than one watcher process.

Do you think that increases my overall watching and processing capacity X fold? E.g. for 4 CPU cores, I have 4 watchers now in 4 separate child processes.

MatthewVita commented 1 year ago

(see last comment - all of my suggestions live there)

MatthewVita commented 1 year ago

Issue Closing Notes

I noticed that some of my points weren't exactly helpful/didn't match the issue's context. These notes assume that the package is in place in a large healthcare setting that processes thousands of HL7 messages each day.

Understand that you'll need to create your own fork to test out and implement these ideas. This will likely involve updating the package's constructor to take in an object containing concurrent processing configuration. Skip this logic if no such object is supplied. Please contribute back any changes others can use in a high-volume setting such as this.

Generic Solution Ideas

Use Promise Pool (https://github.com/supercharge/promise-pool) to limit how many messages can be ran at once.
Up the JVM's memory size

Ideas Pertaining to this Issue's Use-case

If Promise Pool doesn't meet performance requirements, implement a queue (Redis backed?) to handle memory elsewhere. This cannot be included in the codebase because adding Redis as a requirement is not generic enough/overkill for most users. It can be documented, however.
If using Chokidar and HL7 messages land in specific folders, "tell" Chokidar to only point at those dirs and avoid making multiple processes with PM2 as Chokidar will run into races and unexpected states. Chokidar should only ever have 1 instance.

MatthewVita / node-hl7-complete

Node process exits with code 137 #27