elastic / apm-server

https://www.elastic.co/guide/en/apm/guide/current/index.html

Duplicate events in logs-apm.error datastream when log sending is enabled on agents #13743

Open lahsivjar opened 1 month ago

lahsivjar commented 1 month ago

APM Server version (apm-server version): All versions supporting log ingestion

Description of the problem including expected versus actual behavior: The current datastream routing code uses event.Type(), which infers the event type using the following logic:

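```go
// Type infers the event type from which fields are populated. A tagless
// switch evaluates its cases top to bottom and returns on the first match,
// so a non-nil Error takes precedence over a non-nil Log.
func (a *APMEvent) Type() APMEventType {
    switch {
    case a.Metricset != nil:
        return MetricEventType
    case a.Error != nil:
        // Any event with error fields is typed as an error, including a log.
        return ErrorEventType
    case a.Log != nil || a.Event.GetKind() == "event":
        return LogEventType
    case a.Span.GetType() != "":
        return SpanEventType
    case a.Transaction.GetType() != "":
        return TransactionEventType
    }
    return UndefinedEventType
}
```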

(ref) A log event that carries a stack trace, error type, or error message is parsed into APMEvent#Error, which routes it to the error datastream (logs-apm.error.*). On the other hand, an error event captured by the agent, with its error type, error message, and stack trace (as Exception#Stacktrace), is also parsed into APMEvent#Error.

As a result, both of the above events are typed as errors rather than logs. This becomes a problem when an agent is configured to capture logs: both the error event and the corresponding log are sent to APM Server and end up in the same datastream, producing duplicate error events. In addition, the two documents get different grouping keys, because the grouping key for an error event uses the parsed stack trace (ref).
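To illustrate the precedence problem, here is a minimal, self-contained Go sketch. The types are hypothetical simplifications of the apm-server model (only nil vs. non-nil matters here), and Type() keeps only the error and log cases from the switch quoted above, in the same order. Both the agent's error event and the log event for the same failure come out typed as errors.

```go
package main

import "fmt"

// APMEventType is a reduced version of the event types returned by Type().
type APMEventType int

const (
	UndefinedEventType APMEventType = iota
	ErrorEventType
	LogEventType
)

func (t APMEventType) String() string {
	switch t {
	case ErrorEventType:
		return "error"
	case LogEventType:
		return "log"
	}
	return "undefined"
}

// Hypothetical stand-ins for the real model types.
type Error struct {
	Type, Message, Stacktrace string
}

type Log struct{ Level string }

type APMEvent struct {
	Error *Error
	Log   *Log
}

// Type uses the same precedence as the quoted switch: Error is checked before Log.
func (a *APMEvent) Type() APMEventType {
	switch {
	case a.Error != nil:
		return ErrorEventType
	case a.Log != nil:
		return LogEventType
	}
	return UndefinedEventType
}

func main() {
	exc := &Error{Type: "IllegalStateException", Message: "boom", Stacktrace: "at Foo.bar(Foo.java:42)"}

	// Error event captured by the agent.
	errorEvent := &APMEvent{Error: exc}
	// Log event for the same failure; its error fields are parsed into Error too.
	logEvent := &APMEvent{Log: &Log{Level: "error"}, Error: exc}

	fmt.Println(errorEvent.Type(), logEvent.Type())
}
```

Running it prints "error error": both events would be routed to the error datastream.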

Steps to reproduce:

Send an error event and a log event for the same error to APM Server, and observe that both are indexed in the same logs-apm.error* datastream.


Provide logs (if relevant): N/A

lahsivjar commented 1 month ago

One solution could be to ALWAYS route explicit log events (those sent with log as the event root) to the log datastream, regardless of whether error fields were parsed into APMEvent#Error.
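A rough sketch of the proposed rule, assuming the server can tell which intake root an event arrived under. The IntakeRoot field, the routeDatastream function, and the "logs-apm.app" log datastream name are all hypothetical placeholders for illustration; only logs-apm.error comes from this issue. The explicit log check runs before any inspection of Error, so it cannot be shadowed the way the Type() switch is.

```go
package main

import "fmt"

type Error struct{ Message string }
type Log struct{ Message string }

// APMEvent is a hypothetical, simplified event. IntakeRoot records the root
// key the agent used when sending the event (e.g. "error" or "log"); the real
// apm-server model may carry this information differently, or not at all.
type APMEvent struct {
	IntakeRoot string
	Error      *Error
	Log        *Log
}

const (
	errorDatastream = "logs-apm.error" // from the issue
	logDatastream   = "logs-apm.app"   // assumption: placeholder log datastream name
)

// routeDatastream sketches the proposal: events explicitly sent as logs always
// go to the log datastream, even if error fields were parsed into Error.
func routeDatastream(e *APMEvent) string {
	if e.IntakeRoot == "log" {
		return logDatastream
	}
	switch {
	case e.Error != nil:
		return errorDatastream
	case e.Log != nil:
		return logDatastream
	}
	return "" // other event types are out of scope for this sketch
}

func main() {
	errEvent := &APMEvent{IntakeRoot: "error", Error: &Error{Message: "boom"}}
	logEvent := &APMEvent{IntakeRoot: "log", Log: &Log{Message: "boom"}, Error: &Error{Message: "boom"}}

	fmt.Println(routeDatastream(errEvent)) // logs-apm.error
	fmt.Println(routeDatastream(logEvent)) // logs-apm.app
}
```

With this ordering the agent's error event still lands in the error datastream, while the log for the same failure is kept out of it, so the error is no longer counted twice.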