TomFrost / Bristol

Insanely configurable logging for Node.js
MIT License

A question about Bristol and cluster #45

Open jnmichaud opened 6 years ago

jnmichaud commented 6 years ago

I'm asking a question because I can't find any resources that speak directly to the question. I apologize for opening an issue when I don't have an actual problem. (Feel free to close this issue whenever.)

I'm using Bristol in a node app where I'm also using cluster. Because of the nature of cluster, logging has potential issues associated with it. Specifically, when a child process logs an error or event there is no way to tell what state the child process is in compared to the master process, and logs become confusing when related lines from one process come out interspersed with lines from other processes.

That leaves two options: log the event/error directly from the child process or send a message to the master then log from there.

It seems logical to me to log errors that would result in a process termination directly from the child, while sending any other important notifications to the master to log (along with a processID for the child). This has some obvious performance penalties associated with it, so I'd assume any debug logs would have to be logged directly from the child processes also.

My questions:

  1. Is there a set of best practices associated with this issue that you know of?
  2. Does Bristol do anything to change the way the stdout and stderr from cluster are piped? (I know of at least one other logging library that does intentionally change this behavior.)

Thanks!

Jared

PS: Bristol is awesome and incredibly helpful. Keep up the good work. By contrast, Winston and Bunyan are both swimming in un-handled issues and seem over-complicated and poorly maintained. I really appreciate that you've put in the time and effort to create this library.

TomFrost commented 6 years ago

Hey Jared, thanks for the kind words!

Logging with Cluster was brought up in Bristol's infancy, and handling it with an official feature was a consideration for a while. But I've recently closed that ticket with this explanation:

I'm going to close this for now. While there's a good solution to this, it doesn't appear to be worth the effort at this point. There's a strong consensus that for most use cases, if you're running Node on a machine with multiple CPUs, you should either:

  • (if you're on a cloud provider) spend the same amount of money to turn that machine into multiple machines with single CPUs and run multiple instances of your app, or:
  • (if you're on discrete hardware) use a hypervisor or container scheduler to run multiple instances of your app on that machine as though they are individualized machines.

With the above, you multiply your durability for literally free, and gain the advantage of a less complex codebase as a result.

With that said, in the recent past I created a node-based ETL solution that forked off a child process for each data pipeline it ran. My logging solution was the IPC channel option you mentioned: I loaded Bristol and overrode log.log with a function that passed the data through the IPC channel, and the master would receive it and pass the arguments to an actual instance's log.log.
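The IPC-forwarding idea above can be sketched in a few lines. This is a minimal, hypothetical sketch, not Bristol's API: `makeIpcLogger` and `handleWorkerMessage` are names invented here, and in a real app the master-side handler would dispatch to an actual Bristol instance's `log.log` rather than a stand-in.

```javascript
// Worker side: a stand-in for the overridden log.log. Instead of
// writing locally, it ships the call over the IPC channel as a
// plain message object tagged with the worker's pid.
function makeIpcLogger(send) {
  return {
    log(severity, ...args) {
      send({ type: 'log', pid: process.pid, severity, args });
    },
  };
}

// Master side: re-dispatch a forwarded call to a real logger,
// appending the worker's pid so interleaved lines can be told apart.
function handleWorkerMessage(msg, realLog) {
  if (msg && msg.type === 'log') {
    realLog.log(msg.severity, ...msg.args, { workerPid: msg.pid });
  }
}
```

In the worker you would wire this up with `makeIpcLogger(process.send.bind(process))`, and in the master with `worker.on('message', msg => handleWorkerMessage(msg, bristol))`.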

Apart from that, though, the problem with logging fatal errors in node is that stdout is the only logging solution that will block the failing process from dying until the messages are written. So if you're concerned about process termination, I'd recommend starting child processes with stdio: 'inherit' (or use the silent flag for the cluster module) so any stdout goes to the parent, then just fire up Bristol with the console target. Of course, this is only convenient if the parent also sends its logs to the console.

So, to answer your questions!

  1. I'm afraid this concern hasn't been shared enough for best practices to win out over the other options, but I hope I've been able to offer some ideas here. Definitely, definitely consider eschewing the cluster module for actual distribution of your app, though. Whether it's multiple VMs, multiple docker containers, multiple compute resources on a cloud platform, whatever, it solves your logging problem (as long as you have some distributed logging collation solution in place... ELK, Splunk, Loggly, whatever), and offers a huge benefit to your app in the form of durability. Of course, this assumes your use case allows for it.
  2. Bristol absolutely does not and will never (famous last words) manipulate the default operation of the standard streams.

Let me know if this is (un)helpful, or if you have any other feedback on your use case! I'm happy to bat more ideas around :)

jnmichaud commented 6 years ago

Tom,

I really appreciate the friendly response. I feel like I'm intruding whenever I ask questions because I'm still so new to this and prone to ask stupid questions. Along those lines, I'm afraid I only half understand some of what you said. I have quite a bit of experience actually making node.js "do work" at this point, but very little with terminology, best practices, standards, and other peoples' modules. (A couple years ago I knew nothing at all about coding, so my understanding of context is limited.) That said, I can bring this discussion to a focus pretty quickly.

Currently I AM on discrete hardware, but I could easily change that without too much hassle.

The key point is this: I'm writing a web game that REQUIRES a master process to make certain key decisions in real time (to referee between workers), which makes my use case for cluster about perfect--most of the processing and request handling is done in the worker processes, and the master handles only the specific decisions that could cause conflicts between game clients.

For that reason, I already have a messaging system between the worker and master processes developed and working happily away. (Note that I've been using console.log and relaying messages to the master up until now, except for debugging console.logs, which I just put in and took out as I went.)

I'm finally refactoring my server for multiple reasons. Among those is to use 'real' logging.

So far, I've basically just created a separate instance of Bristol for each worker process, all of which write to stdout, handled in the standard way. I'm willing to put up with the interleaved logs and just insert a process ID in case I have to backtrack--it hasn't caused me any problems so far--but if you think there might be a better way to handle it, I'm glad to play with it.
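The process-ID tagging described above can be reduced to a small wrapper. This is a hypothetical sketch, not Bristol's API: `withPid` is a name invented here, it assumes a `log(severity, message, meta)` call shape, and the capturing fake below stands in for a real per-worker Bristol instance writing to stdout.

```javascript
// Hypothetical wrapper: stamps every entry with the worker's pid so
// interleaved stdout lines can be attributed to a process afterwards.
function withPid(log) {
  return {
    log: (severity, message, meta = {}) =>
      log.log(severity, message, { pid: process.pid, ...meta }),
  };
}

// Stand-in for a per-worker logger instance writing to stdout.
const fakeLogger = {
  log: (severity, message, meta) =>
    console.log(severity, message, JSON.stringify(meta)),
};

const log = withPid(fakeLogger);
log.log('info', 'worker ready');
```

The upside of this approach over IPC forwarding is that each worker keeps writing directly to the shared stdout, so there is no per-message round trip to the master.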

The more I think about it, the more I dislike passing messages to the master unless I have no choice--if I end up scaling this up, the performance hit is going to hurt me. Everything is sending logs to the same place (and will probably continue to do so--before my scaling got so big I'd need to change that, I'd end up spinning off a new game server).

So with all that in mind, is there anything I'm missing, or should I just continue to put up with the interleaved logs?