elastic / apm-agent-nodejs

https://www.elastic.co/guide/en/apm/agent/nodejs/current/index.html
BSD 2-Clause "Simplified" License
584 stars 225 forks source link

is apm.currentTransaction set inside a hapi `server.events.on('response'` event? #2353

Open trentm opened 3 years ago

trentm commented 3 years ago

In discussion with @mshustov working on https://github.com/elastic/kibana/issues/102699 the apm.currentTraceIds inside a hapi server 'response' event are empty. Has the transaction already ended? Or is the run context just not bound to the 'response' server event. From a glance at the hapi instrumentation, I'm thinking the latter, but I need to play a bit to confirm.

If that is the case can we and do we want the transaction run context to be bound to the server 'response' event? How about to other core server events?

trentm commented 3 years ago

From playing a little bit, the HTTP response has ended (and the APM instrumentation of core http has ended the transaction) before hapi sends the 'response' event. I think it would incorrect to bind "response" event handlers to the transaction: while it might help give the user access to apm.currentTransaction, that transaction has ended so will already have been encoded and on its way to APM server so much of the transaction API will be unusable. However, I am not totally opposed to the idea of binding the "response" event handlers to the run context.

How about using the 'onPreResponse' hapi request lifecycle extension point, e.g. https://gist.github.com/trentm/924b0ec426b6c81a3e76fab04f846655#file-trace-hapi-js-L30-L38 The HTTP response has not ended, the transaction context is current so apm.currentTraceIds works.

  // ...

  // onPreResponse runs in the context of the APM transaction, so
  // `apm.currentTraceIds` et al are available.
  // For kibana, might need to make sure this runs after the
  // `adoptToHapiOnPreResponseFormat` handler also added to 'onPreResponse'.
  server.ext('onPreResponse', (request, h) => {
    console.log('onPreResponse: traceIds=%j requestId=%s active=%s',
      apm.currentTraceIds, request.info.id, request.active())
    return h.continue
  })
trentm commented 3 years ago

To answer the question posed by the issue title: No, the APM transaction for the request/response being reported in the hapi 'response' event is not the "apm.currentTransaction" in the response event handler. This means that one cannot get trace.id-based log correlation from logging in the 'response' event handler.

We could consider binding any 'response' event handler to this transaction run context. However there would be some caveats. By the time the 'response' hapi event is emitted, the HTTP response has ended and the APM transaction has ended. This means that only the readable parts of the Transaction API off of apm.currentTransaction.* could be used.

An answer to the above, as mentioned, is for the user to instead use server.ext('onPreResponse', ... to be able to access the currentTransaction while it is still active. However, that is possibly limiting because the user code then may need to watch for the order of 'onPreResponse' handlers if their application has multiple, because the response can be mutated.

Unless there is a specific user case made for needing the hapi server 'response' event handler bound to the transaction run context (i.e. using 'onPreResponse' instead is not feasible), I'm inclined to not add this functionality. I'm happy to reconsider (with a new issue or re-opening this one). @mshustov Please let me know if you disagree.

mshustov commented 3 years ago

I finally got to the https://github.com/elastic/kibana/issues/114476 task.

An answer to the above, as mentioned, is for the user to instead use server.ext('onPreResponse', ... to be able to access the currentTransaction while it is still active. However, that is possibly limiting because the user code then may need to watch for the order of 'onPreResponse' handlers if their application has multiple, because the response can be mutated.

Kibana Core controls access to Hapi AP, therefore it can enforce a custom onPreResponse hook to be executed after all the response mutations. However, using the onPreResponse lifecycle for response logging can lead to hard-to-debug problems. Let's look at the Hapi code https://github.com/hapijs/hapi/blob/e756038971f406aa748d950144444416d9854dd9/lib/request.js#L449-L459 onPreResponse lifecycle is invoked during _postCycle. Then the request is transmitted and the response event is emitted during _finalize. If an exception is raised during Transmit.send, the response object will be changed https://github.com/hapijs/hapi/blob/e756038971f406aa748d950144444416d9854dd9/lib/transmit.js#L17-L35 Thus, a response object in the logs emitted during onPreResponse lifecycle is different from the error object sent in the response. Also, the response time logged during the onPreResponse lifecycle doesn't include time spent in Transmit.send.

the APM transaction for the request/response being reported in the hapi 'response' event is not the "apm.currentTransaction" in the response event handler.

I don't say 'response' event handler must be included in the current transaction but from the application point of view, an incoming request unit of work spans the request lifecycles https://hapi.dev/api?v=20.2.0#request-lifecycle including onPostResponse executed after 'response' event.

Btw open-telemetry does instrument lifecycle extension points: This package provides automatic tracing for hapi server routes and request lifecycle extensions defined either directly or via a Hapi plugin. https://github.com/open-telemetry/opentelemetry-js-contrib/tree/main/plugins/node/opentelemetry-instrumentation-hapi#hapi-instrumentation-support https://github.com/open-telemetry/opentelemetry-js-contrib/blob/main/plugins/node/opentelemetry-instrumentation-hapi/src/instrumentation.ts#L113 Unfortunately, I didn't manage to test it locally due to a problem with my local environment.

We can work the problem around on the Kibana level. What if I store trace.id during onPreResponse and use it later?

server!.ext('onPreResponse', (request, responseToolkit) => {
  request.app.traceId = apmAgent.currentTraceIds['trace.id'],
});
server.events.on('response, (request) => {
  console.log('traceId', request.app.traceId);
});

I wanted to test it with your snippet https://gist.github.com/trentm/924b0ec426b6c81a3e76fab04f846655 but when I run it with "elastic-apm-node": "git://github.com/elastic/apm-agent-nodejs.git#5b833e69233b171317728bab1dcaf33e9be81e0d" I have

Server running on http://localhost:3000
log event: traceIds={"trace.id":"a411ffd4e0c5fb9d36a5a3d1f1844e82","transaction.id":"f6a76aa67f99dff6"} tags={"info":true} "hi to server.log from route /"
request event: traceIds={"trace.id":"a411ffd4e0c5fb9d36a5a3d1f1844e82","transaction.id":"f6a76aa67f99dff6"} tags={"info":true} "handling route /"
onPreResponse: traceIds={"trace.id":"a411ffd4e0c5fb9d36a5a3d1f1844e82","transaction.id":"f6a76aa67f99dff6"} requestId=1636468065742:Mikhails-MBP:59099:kvs6tyx0:10000 active=true
response event: traceIds={"trace.id":"a411ffd4e0c5fb9d36a5a3d1f1844e82","transaction.id":"f6a76aa67f99dff6"} requestId=1636468065742:Mikhails-MBP:59099:kvs6tyx0:10000 active=false
trentm commented 3 years ago

I'm re-opening for discussion on improving our Hapi instrumentation and to get on planning radar.

trentm commented 3 years ago

We can work the problem around on the Kibana level. What if I store trace.id during onPreResponse and use it later?

Yes, that sounds reasonable. It is nice that Hapi provides the explicit request.app for storing state like this!

trentm commented 3 years ago

However, using the onPreResponse lifecycle for response logging can lead to hard-to-debug problems. ...

That block is a very good argument for binding the APM transaction to the 'response' event handler.