avioconsulting / mule-opentelemetry-module

Mule Extension to generate OpenTelemetry traces and metrics
https://avioconsulting.github.io/mule-opentelemetry-module/
BSD 2-Clause "Simplified" License

Propagation error #198

Open KonnoroYatsu opened 3 weeks ago

KonnoroYatsu commented 3 weeks ago

Hello, in my project I have an initial Kafka Listener configured in a flow called flow-A. Flow-A has a flow-ref to another flow called flow-B, which makes several calls to other sub-flows.

In a few executions, at the end of processing, I receive the following error:

com.avioconsulting.mule.opentelemetry.internal.processor.MuleNotificationProcessor: Error in handling flow-B flow end event

java.lang.NullPointerException: Transaction for 184ec970-5f00-11ef-8b40-e454e805b830_2045283329_490120004/flow-B cannot be null

I noticed that in the "Transaction" value, the last two values separated by "_" are references to each flow: "2045283329" for flow-A and "490120004" for flow-B. When flow-A calls flow-B, these values are concatenated in the variable "_OTEL_FLOW_CONTEXT_ID", but this does not always happen.
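
To make the structure of that value easier to see, here is a minimal Java sketch that splits the key from the error message into its parts. The class and method names (`TransactionKeyParts`, `contextSegments`, `flowName`) are hypothetical and are not part of the module; the layout `<event id>_<flow-A ref>_<flow-B ref>/<flow name>` is only inferred from the log line above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of mule-opentelemetry-module.
// Assumes the key layout "<eventId>_<flowRef>_<flowRef>/<flowName>" seen in the log.
class TransactionKeyParts {

    // Returns the part after the last '/', i.e. the flow name.
    static String flowName(String key) {
        int slash = key.lastIndexOf('/');
        return slash < 0 ? "" : key.substring(slash + 1);
    }

    // Returns the "_"-separated segments of the context id (the part before the last '/').
    static List<String> contextSegments(String key) {
        int slash = key.lastIndexOf('/');
        String contextId = slash < 0 ? key : key.substring(0, slash);
        return Arrays.asList(contextId.split("_"));
    }

    public static void main(String[] args) {
        String key = "184ec970-5f00-11ef-8b40-e454e805b830_2045283329_490120004/flow-B";
        // prints [184ec970-5f00-11ef-8b40-e454e805b830, 2045283329, 490120004]
        System.out.println(contextSegments(key));
        // prints flow-B
        System.out.println(flowName(key));
    }
}
```

With the key from the error, the first segment is the event id and the two trailing segments are the values I associate with flow-A and flow-B.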

All the errors occurred only when the concatenation was performed, but most of the time, even when the concatenation occurred, the error was not displayed.

In this same project, I tested converting flow-B into a sub-flow and noticed that the concatenation of values for the variable "_OTEL_FLOW_CONTEXT_ID" stopped occurring, and consequently no more errors occurred.

I would like to know whether, when using the module, we should avoid having many flows call other flows to prevent this problem. I would also like to better understand the reason for this error.

manikmagar commented 3 weeks ago

Hi @KonnoroYatsu. Context propagation and lifecycle differ between flows and sub-flows. At runtime, sub-flows are simply merged into the calling flow, whereas flows have a separate lifecycle and sub-context. To track the context, the flow contexts are concatenated. This error usually indicates that the concatenated unique context id had a conflict, and the flow at the same context path was probably ended by another invocation. Do you have multiple flow-refs from flow-A to flow-B? I am just thinking through the possibilities and some rare conditions.

You do not need to avoid using flows when using this module.
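
As a rough illustration of that hypothesis only, the sketch below shows how ending the same context key twice would produce a null lookup and an error message of the shape reported. Everything in it is an assumption made for illustration: the class `ContextKeyCollisionSketch`, the map-based store, and the shortened key are hypothetical and do not reflect the module's actual code.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: a hypothetical store of open transactions keyed by context path.
// This is NOT the module's implementation.
class ContextKeyCollisionSketch {
    private final Map<String, String> openTransactions = new ConcurrentHashMap<>();

    // Registers a transaction; a second start with the same key overwrites the first.
    void startFlow(String key) {
        openTransactions.put(key, "span-for-" + key);
    }

    // Removes and returns the transaction; fails if it was already ended elsewhere.
    String endFlow(String key) {
        String transaction = openTransactions.remove(key);
        return Objects.requireNonNull(transaction, "Transaction for " + key + " cannot be null");
    }

    public static void main(String[] args) {
        ContextKeyCollisionSketch store = new ContextKeyCollisionSketch();
        String key = "event-1_2045283329_490120004/flow-B"; // made-up key for the example

        store.startFlow(key); // first invocation
        store.startFlow(key); // second invocation ends up with the same key (the conflict)
        store.endFlow(key);   // one invocation ends the flow
        try {
            store.endFlow(key); // the other finds nothing left to end
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If two invocations ever shared one key, the second end event would fail in this way, which is the kind of rare condition I have in mind.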

KonnoroYatsu commented 3 weeks ago

I have only one flow-ref from flow-A to flow-B, but the process starts with a Kafka Listener that receives many messages, approximately 5 thousand per hour. This Kafka Listener has no limit on asynchronous processing, so the processes run asynchronously.

I also did some tests using module version 2.1.0 and the error occurred as well, but using version 2.0.0 the error did not occur.