axiomhq / axiom-js

Official language bindings and library extensions for Axiom
https://axiom.co
MIT License
87 stars 14 forks source link

Internal errors in @axiomhq/pino cause server to crash #140

Open charles-rc opened 6 months ago

charles-rc commented 6 months ago

I think this is a similar issue to #131.

We currently use @axiomhq/pino in our Koa Node servers. Our logging (and our servers!) went down for an hour. We have Sentry errors during this time for errors internal to Axiom - Error FetchClient.<anonymous>(@axiomhq.js.dist.cjs:fetchClient) Service Unavailable.

It seems that either Axiom ingestion (via the api.axiom.co endpoint) went down for an hour or potentially our network connectivity in AWS ECS dropped.

It appears that thrown (network) errors during axiom.ingest aren't caught properly. You can reproduce this by logging with an instance of @axiomhq/pino with your internet connection disabled. The error is thrown at the root of the program so it halts execution.

I tried using the hooks functionality of pino to put a wrapper around the logs but it appears I cannot await ingest() as its promise is not returned by the log function.

I would love your advice on how to resolve this. I think either better error handling inside @axiomhq/pino or some callback we can subscribe to would be best.

I guess I could also try wrapping the instantiation of the logger in a function to attempt to catch errors there rather than at the script level?

This is obviously a bit concerning that a network error can bring down our server. Other libraries like ioredis have an onError callback that will be invoked if network connectivity drops, but it doesn't crash the server.

Thanks in advance for all your help!

AkMo3 commented 6 months ago

It would be appreciable to have a rejection or onError handler. Otherwise, it is a critical problem which restricts anyone from using this in production, which can bring down production and there is no way you can handle it. Hope this gets implemented ASAP.

Iznogohul commented 6 months ago

@charles-rc I had a very similar problem-incident while using @axiomhq/js and also our node servers went down. It would be nice to have the reason behind a failed ingest exposed as if you have internet but defined an api token that has expired or is invalid it does the same thing because you are unauthorized. The only way I found that it can be solved for @axiomhq/js is to try catch the ingest and await the flush for each of them.

try{
  // Data ingestion as mentioned in their @axiomhq/js readme
  axiom.ingest(dataset, [{ foo: 'bar' }]); 
  await axiom.flush()
}catch(err){
  console.log(err)
}
schehata commented 6 months ago

I am working on handling those exception now, finalizing the PR.

schehata commented 6 months ago

we have a released a new version of axiom-js and axiom-pino that would solve the exceptions handling in axiom-js. on Monday we will continue investigating Pino package and add some tests to ensure that doesn't happen again. let us know if this works our for you, and if there is any thing we can improve on the new onError() method.

charles-rc commented 6 months ago

@schehata thanks for doing that!

Unfortunately, I am getting this error when attempting to pass in a custom onError callback to @axiomhq/pino:

node:internal/worker:236
    this[kPort].postMessage({
                ^
DOMException [DataCloneError]: error => {
      if ((0, _utils.shouldLogToSentry)()) {
        Sentry.captureException(error);
      } else {
...<omitted>... } could not be cloned.
    at new DOMException (node:internal/per_context/domexception:53:5)
    at new Worker (node:internal/worker:236:17)
    at createWorker (/app/node_modules/thread-stream/index.js:52:18)
    at new ThreadStream (/app/node_modules/thread-stream/index.js:224:19)
    at buildStream (/app/node_modules/pino/lib/transport.js:21:18)
    at transport (/app/node_modules/pino/lib/transport.js:114:10)
    at normalizeArgs (/app/node_modules/pino/lib/tools.js:316:16)
    at pino (/app/node_modules/pino/pino.js:90:28)
    at wrapChild (/app/node_modules/pino-http/logger.js:226:14)
    at pinoLogger (/app/node_modules/pino-http/logger.js:81:18)

I think it's to do with how pino runs inside workers. I think the current bump should fix the problem with our servers crashing as I believe you now default to catching and then console.erroring the caught exception?

However, it doesn't seem like I can pass in a custom handler to pass the error onto Sentry or the like.

Am I maybe doing something wrong?

schehata commented 5 months ago

the onError handler is for js SDK not pino. I have to check pino and find a way to pass an onError handler to the logger. will get back to you on this.

schehata commented 5 months ago

looks like, doing this won't be straight forward. I think the problem is that the worker can't pass a function in postMessage and that's why passing onError to options is failing.

There is another way but that will need some work, I am wondering if its worth it? or the current console.log would be enough?