elastic / apm-agent-nodejs

https://www.elastic.co/guide/en/apm/agent/nodejs/current/index.html
BSD 2-Clause "Simplified" License
585 stars 225 forks source link

Transactions and Spans are getting attached to wrong parent transaction #1917

Open sunishdahiya opened 3 years ago

sunishdahiya commented 3 years ago

Describe the bug We have integrated the APM in our microservice architecture making use of distributed tracing feature. But we are running into a issue that often child transactions and spans end up on the wrong parent transaction. This happens mostly when requests come simultaneously(concurrently).

Example trace having the above mentioned issue:

export (1)

In the above trace the /getIntersectingHubs API call shouldn't be there as in api/alerting/getAlerts we never call this API.

Expected behavior There shouldn't be any extra API or database call in the distributed trace.

Given below is the correct distributed trace for api/alerting/getAlerts API.

export

Environment (please complete the following information)

How are you starting the agent? (please tick one of the boxes)

Additional context

ELASTIC_APM_ENABLED=true ELASTIC_APM_SERVER_URL="http://:8200" ELASTIC_APM_SERVICE_NAME="$SERVICE_ID-$ENV" ELASTIC_APM_ASYNC_HOOKS=false ELASTIC_APM_CAPTURE_BODY="off" ELASTIC_APM_TRANSACTION_SAMPLE_RATE=0.1 ELASTIC_APM_CAPTURE_EXCEPTIONS=true ELASTIC_APM_CAPTURE_ERROR_LOG_STACK_TRACES="messages" ELASTIC_APM_CAPTURE_SPAN_STACK_TRACES=false ELASTIC_APM_SOURCE_LINES_SPAN_APP_FRAMES=0 ELASTIC_APM_SOURCE_LINES_SPAN_LIBRARY_FRAMES=0 ELASTIC_APM_SOURCE_LINES_ERROR_APP_FRAMES=0 ELASTIC_APM_SOURCE_LINES_ERROR_LIBRARY_FRAMES=0 ELASTIC_APM_ERROR_MESSAGE_MAX_LENGTH="1kb" ELASTIC_APM_STACK_TRACE_LIMIT=0 ELASTIC_APM_TRANSACTION_MAX_SPANS=100 ELASTIC_APM_API_REQUEST_TIME="10s" ELASTIC_APM_API_REQUEST_SIZE="768kb" ELASTIC_APM_METRICS_INTERVAL="30s"

astorm commented 3 years ago

@sunishdahiya Nice to meet you, and thank you for reporting this bug.

Based on what I see so far and your description, this looks like a classic example of asynchronous transaction state conflation. Because of Node's asynchronous execution model the Agent can sometimes loose track of what the current transaction is. This can be especially true if you have

ELASTIC_APM_ASYNC_HOOKS=false

set to false and you're using a DB package or HTTP fetching package that we don't explicitly instrument.

We have a few questions that will better help us understand this.

  1. Do you know why you're using ELASTIC_APM_ASYNC_HOOKS=false? If you set this to true does the problem remain?
  2. What DB library (or libraries) are you using to make your DB calls?
  3. What HTTP library (if any) are you using to make your service requests?
  4. Are you doing any custom instrumentation with the Agent API?

With that information in hand we should be able to better diagnose this issue and advise.

sunishdahiya commented 3 years ago

Hi @astorm , Thanks for the quick response

We have set ELASTIC_APM_ASYNC_HOOKS to false because in the documentation it is mentioned that ASYNC_HOOKS will have no effect if the node version is older than 8.2.0 and all our services use node version 8.1.0.

  1. We are using"sequelize": "^3.19.3" and "mongoose": "~4.5.9".
  2. "request-promise": "^4.2.4"
  3. We are using startTransaction API only to capture kafka-events. Code for that is given below
    
    const recordTransaction = async (transactionType, transactionName, transactionHandler, ...transactionHandlerArgs) => {
    const transaction = apm.startTransaction(transactionName, transactionType)
    try{
    await transactionHandler(...transactionHandlerArgs);
    transaction.end()
    } catch(error) {
    await apm.captureError(error)
    transaction.end()
    throw error
    }
    }
sunishdahiya commented 3 years ago

@astorm Any update on this?

williamoliveira commented 3 years ago

I am using default value for asyncHooks, which documentations says its true and I am seeing wrong transaction traces

trentm commented 3 years ago

With version 3.24.0 of the agent there was a significant re-work of the async run context handling that is largely responsible for these kinds of issues.

However, often it really comes down to specifics. @williamoliveira I would encourage you to open a separate issue with as much detail as you can provide. The ideal is a repro case showing the issue.

@sunishdahiya Sorry for the delay. With the above information I'm not able to confirm whether the changes in v3.24.0 of the agent fix the issue for you. Are you able to retry with the latest agent version? Then we could go from there?

paulish commented 8 months ago

await Promise.resolve(); helps to workaround this issue. Just place it before the manual transaction/span creation.