Closed bvader closed 1 month ago
Running the Service Map query in Dev Tools returns this error:
{
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"""if (parent['span.destination.service.resource'] != null
&& !parent['span.destination.service.resource'].equals("")
&& (!parent['service.name'].equals(event['service.name'])
|| !parent['service.environment'].equals(event['service.environment'])
)
) {
def """,
" ^---- HERE"
],
"script": " ...",
"lang": "painless",
"position": {
"offset": 2936,
"start": 2655,
"end": 3029
},
"caused_by": {
"type": "null_pointer_exception",
"reason": "cannot access method/field [equals] from a null def reference"
}
}
The problem happens because service.environment is null, which breaks the scripted_metric aggregation.
Confirmed by checking how many unique service.environment values exist for the traces used in the query:
{
"took": 20,
"timed_out": false,
"_shards": {
"total": 10,
"successful": 10,
"skipped": 9,
"failed": 0
},
"hits": {
"total": {
"value": 545,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"spanDestinationServiceResources": {
"value": 16
},
"serviceEnvironments": {
"value": 0
},
"serviceNames": {
"value": 16
}
}
}
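For anyone who wants to repeat this check, here is a rough sketch of a request body that would produce a response of this shape. The aggregation names come from the response above. Everything else is an assumption: the original request was not posted, so the `size: 0` setting, the field behind each aggregation, and the absence of a trace filter are guesses.

```typescript
// Assumed request body for a _search call against the traces data stream.
// Only the aggregation names are taken from the response in this thread.
const cardinalityCheck = {
  size: 0, // assumption: the response shows an empty hits array
  aggs: {
    spanDestinationServiceResources: {
      cardinality: { field: "span.destination.service.resource" },
    },
    serviceEnvironments: {
      cardinality: { field: "service.environment" },
    },
    serviceNames: {
      cardinality: { field: "service.name" },
    },
  },
};
```

A `serviceEnvironments` value of 0 alongside non-zero counts for the other two fields means none of the matched documents carry service.environment.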
What we need to do is check that parent["service.environment"] != null before calling equals on it.
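To illustrate the guard, here is a minimal TypeScript model of the condition from the script_stack. It is not the Painless script itself and not necessarily the fix that was merged. The `TraceEvent` type and the `sameValue` and `isExternalConnection` names are hypothetical, and treating two missing values as equal is an assumption about the desired behavior.

```typescript
// Hypothetical shape: only the fields referenced by the failing script.
type TraceEvent = {
  "service.name"?: string | null;
  "service.environment"?: string | null;
  "span.destination.service.resource"?: string | null;
};

// Null-safe equality. No method is ever called on a missing value,
// and two missing values compare as equal (assumption).
function sameValue(a?: string | null, b?: string | null): boolean {
  return (a ?? null) === (b ?? null);
}

// Mirrors the condition in the script_stack, with the null guard applied.
function isExternalConnection(parent: TraceEvent, event: TraceEvent): boolean {
  const resource = parent["span.destination.service.resource"];
  return (
    resource != null &&
    resource !== "" &&
    (!sameValue(parent["service.name"], event["service.name"]) ||
      !sameValue(parent["service.environment"], event["service.environment"]))
  );
}
```

With this shape, a parent and event that both lack service.environment are compared on service.name alone instead of raising a null pointer error.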
[!NOTE] We must replicate the change in the serverless scripted metrics allow list config
Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)
cc @roshan-elastic @smith
Workaround:
You can add these 2 pipelines, which will set service.environment: unknown on all documents that are missing the field. You can of course use any value you prefer. This should fix the service map issue until a fix is released.
PUT _ingest/pipeline/traces-apm@custom
{
"processors": [
{
"set": {
"if": "ctx.service?.environment == null",
"field": "service.environment",
"value": "unknown"
}
}
]
}
PUT _ingest/pipeline/logs-apm.app@custom
{
"processors": [
{
"set": {
"if": "ctx.service?.environment == null",
"field": "service.environment",
"value": "unknown"
}
}
]
}
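As a rough model of what the `set` processor with that `if` condition does to each incoming document, consider the TypeScript sketch below. The `Doc` type and the `applyEnvironmentDefault` name are hypothetical and exist only for illustration. The real work is done by the ingest pipeline, not by client code.

```typescript
// Hypothetical document shape, reduced to the fields the pipeline touches.
type Doc = { service?: { name?: string; environment?: string | null } };

// Fills in service.environment only when it is missing or null,
// matching the condition ctx.service?.environment == null.
function applyEnvironmentDefault(doc: Doc, fallback = "unknown"): Doc {
  if (doc.service?.environment != null) {
    return doc; // an existing value is left untouched
  }
  return { ...doc, service: { ...doc.service, environment: fallback } };
}
```

Since ingest pipelines run at index time, this only affects documents indexed after the pipelines are created. Traces already stored without the field are unchanged.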
@crespocarlos is this a mapping issue? I have never seen this break this way, and we do not require service.environment to be defined in the document (in fact this is why we have the ENVIRONMENT_NOT_DEFINED constant). Is service.environment missing from the mappings? If so this is likely a bug in APM Server.
From what I could see, it was present in the mapping:
{
".ds-metrics-apm.app.adservice-default-2024.05.29-000030": {
"mappings": {
"service.environment": {
"full_name": "service.environment",
"mapping": {
"environment": {
"type": "keyword",
"ignore_above": 1024
}
}
}
}
},
"partial-.ds-traces-apm-default-2024.08.08-000161": {
"mappings": {
"service.environment": {
"full_name": "service.environment",
"mapping": {
"environment": {
"type": "keyword",
"ignore_above": 1024
}
}
}
}
}
}
APM in general works fine with an empty service.environment, but the service map query doesn't have any safeguards against an empty service.environment.
BTW, the error most likely started to happen after the upgrade because the comparison now uses the equals method. We could revisit that change as part of this fix.
@crespocarlos ahh I missed the fact that we changed the scripted metric agg like that. the mappings you list are from a metrics index and a frozen traces index however.
it's also present in the index that actually has data for this query so you are right:
{
".ds-traces-apm-default-2024.09.08-000202": {
"mappings": {
"service.environment": {
"full_name": "service.environment",
"mapping": {
"environment": {
"type": "keyword",
"ignore_above": 1024
}
}
}
}
}
}
Kibana version: Upgrade from 8.14.3 to 8.15.1
Elasticsearch version: Upgrade from 8.14.3 to 8.15.1
Server OS version: Elastic Cloud GCP
Browser version: Latest Chrome
Browser OS version: Latest Mac
Original install method (e.g. download page, yum, from source, etc.): Elastic Cloud
Temp Workaround: Here
Describe the bug: The cluster receives telemetry from the OTEL Boutique Demo. All APM features were working fine in 8.14.3. After upgrading via the Cloud Console to 8.15.1, the Service Map fails.
Other APM features such as Services, Traces, Transactions, Correlated Logs, and Dependencies seem to be working. A cursory look at the Elasticsearch and Kibana logs shows no errors.
A couple of the individual focused service maps work.
Errors in browser console (if relevant):
Provide logs and/or server output (if relevant):
Any additional context: