ansible-collections / community.general

Ansible Community General Collection
https://galaxy.ansible.com/ui/repo/published/community/general/
GNU General Public License v3.0
822 stars 1.52k forks source link

add more information to opentelemetry logs #6526

Closed cstegm closed 1 year ago

cstegm commented 1 year ago

Summary

Logs of the opentelemetry plugin should contain more information. At the moment it is not possible to assign an output to a task so the logs are meaningless - in a playbook with 50 tasks it is really hard to figure out which task a output belongs to. To fix this, each log object should contain the playbook name, host and the task name. In addition, the span id should be added so that the details of the log can be associated with the span object.

Issue Type

Feature Idea

Component Name

plugins/callback/opentelemetry.py

Additional Information

Code of Conduct

ansibullbot commented 1 year ago

Files identified in the description:

If these files are incorrect, please update the component name section of the description or use the !component bot command.

click here for bot help

ansibullbot commented 1 year ago

cc @v1v click here for bot help

gone-for-coding commented 1 year ago

I'm also highly interested in this one!

v1v commented 1 year ago

Can you please elaborate this problem a bit more?

I don't know what vendor you use but do you see that behaviour with Jaeger?

A few answers to your proposal:

the span id should be added so that the details of the log can be associated with the span object.

As far as I remember https://github.com/ansible-collections/community.general/pull/4175 enabled this feature, so every span is linked to the logs using the add_event function provided by the span.

To fix this, each log object should contain the playbook name, host and the task name.

Given that the logs are attached to a span, and the span is attached to a transaction (playbook). Those attributes are accessible since the span contains those details ( Here and Here )

Is your issue related to how the vendor you use presents the traces?

gone-for-coding commented 1 year ago

For me it looks like this: APM->Services->Playbook->Tasks -> But I cannot get the task output here. Discover->I have separate lines for the output (task name etc. not filled) and the rest of the information (then no task output).

See the JSON files attached:

{
  "_index": ".ds-traces-apm-default-2023.05.10-000001",
  "_id": "e2v9I4gBPFz2rJw-lrZ1",
  "_version": 1,
  "_score": 0,
  "_source": {
    "parent": {
      "id": "ab978cda4c509f5f"
    },
    "agent": {
      "name": "opentelemetry/python",
      "version": "1.17.0"
    },
    "data_stream.namespace": "default",
    "destination": {
      "address": "api.chucknorris.io",
      "port": 443
    },
    "processor": {
      "name": "transaction",
      "event": "span"
    },
    "data_stream.type": "traces",
    "url": {
      "original": "https://api.chucknorris.io/jokes/random"
    },
    "labels": {
      "ansible_task_name": "[localhost] Elastictest: URI Task",
      "ansible_task_host_name": "localhost",
      "ansible_task_host_status": "ok",
      "ansible_task_module": "ansible.builtin.uri",
      "ansible_task_args_name": [
        "url"
      ],
      "ansible_task_message": "success",
      "ansible_task_args_value": [
        "https://api.chucknorris.io/jokes/random"
      ]
    },
    "observer": {
      "hostname": "93fd34e82f76",
      "type": "apm-server",
      "version": "8.7.1"
    },
    "trace": {
      "id": "d7e7be16d85162d114e1ad12e712cf78"
    },
    "@timestamp": "2023-05-16T09:56:39.800Z",
    "numeric_labels": {
      "ansible_task_result": 0
    },
    "service": {
      "name": "ansible_callback_otel",
      "target": {
        "name": "api.chucknorris.io:443",
        "type": "http"
      }
    },
    "data_stream.dataset": "apm",
    "event": {
      "agent_id_status": "missing",
      "ingested": "2023-05-16T09:56:41Z",
      "success_count": 1,
      "outcome": "success"
    },
    "timestamp": {
      "us": 1684230999800299
    },
    "span": {
      "duration": {
        "us": 687122
      },
      "representative_count": 1,
      "subtype": "http",
      "name": "URI Task",
      "destination": {
        "service": {
          "resource": "api.chucknorris.io:443",
          "name": "https://api.chucknorris.io",
          "type": "external"
        }
      },
      "id": "5130f08e099c9807",
      "type": "external"
    }
  },
  "fields": {
    "labels.ansible_task_host_status": [
      "ok"
    ],
    "span.name": [
      "URI Task"
    ],
    "span.destination.service.type": [
      "external"
    ],
    "destination.port": [
      443
    ],
    "labels.ansible_task_args_value": [
      "https://api.chucknorris.io/jokes/random"
    ],
    "labels.ansible_task_args_name": [
      "url"
    ],
    "labels.ansible_task_module": [
      "ansible.builtin.uri"
    ],
    "url.original.text": [
      "https://api.chucknorris.io/jokes/random"
    ],
    "trace.id": [
      "d7e7be16d85162d114e1ad12e712cf78"
    ],
    "span.duration.us": [
      687122
    ],
    "event.success_count": [
      1
    ],
    "processor.event": [
      "span"
    ],
    "labels.ansible_task_host_name": [
      "localhost"
    ],
    "service.target.type": [
      "http"
    ],
    "agent.name": [
      "opentelemetry/python"
    ],
    "destination.address": [
      "api.chucknorris.io"
    ],
    "event.agent_id_status": [
      "missing"
    ],
    "event.outcome": [
      "success"
    ],
    "labels.ansible_task_message": [
      "success"
    ],
    "service.name": [
      "ansible_callback_otel"
    ],
    "data_stream.namespace": [
      "default"
    ],
    "processor.name": [
      "transaction"
    ],
    "span.id": [
      "5130f08e099c9807"
    ],
    "span.subtype": [
      "http"
    ],
    "service.target.name": [
      "api.chucknorris.io:443"
    ],
    "data_stream.type": [
      "traces"
    ],
    "observer.hostname": [
      "93fd34e82f76"
    ],
    "span.type": [
      "external"
    ],
    "url.original": [
      "https://api.chucknorris.io/jokes/random"
    ],
    "event.ingested": [
      "2023-05-16T09:56:41.000Z"
    ],
    "timestamp.us": [
      1684230999800299
    ],
    "@timestamp": [
      "2023-05-16T09:56:39.800Z"
    ],
    "observer.version": [
      "8.7.1"
    ],
    "observer.type": [
      "apm-server"
    ],
    "ecs.version": [
      "8.6.0-dev"
    ],
    "data_stream.dataset": [
      "apm"
    ],
    "parent.id": [
      "ab978cda4c509f5f"
    ],
    "agent.version": [
      "1.17.0"
    ],
    "span.destination.service.resource": [
      "api.chucknorris.io:443"
    ],
    "span.destination.service.name": [
      "https://api.chucknorris.io"
    ],
    "numeric_labels.ansible_task_result": [
      0
    ],
    "labels.ansible_task_name": [
      "[localhost] Elastictest: URI Task"
    ],
    "span.representative_count": [
      1
    ]
  }
}

and here is the output in the message field but not parsed and the rest of the information like task name and host is missing.

{
  "_index": ".ds-logs-apm.app-default-2023.05.10-000001",
  "_id": "fGv9I4gBPFz2rJw-lrZ1",
  "_version": 1,
  "_score": 0,
  "_source": {
    "parent": {
      "id": "ab978cda4c509f5f"
    },
    "agent": {
      "name": "opentelemetry/python",
      "version": "1.17.0"
    },
    "data_stream.namespace": "default",
    "message": "{\n    \"alt_svc\": \"h3=\\\":443\\\"; ma=86400, h3-29=\\\":443\\\"; ma=86400\",\n    \"ansible_facts\": {\n        \"discovered_interpreter_python\": \"/usr/local/bin/python3.11\"\n    },\n    \"cf_cache_status\": \"DYNAMIC\",\n    \"cf_ray\": \"7c82c0881b45d516-CDG\",\n    \"changed\": false,\n    \"connection\": \"close\",\n    \"content_type\": \"application/json;charset=UTF-8\",\n    \"cookies\": {},\n    \"cookies_string\": \"\",\n    \"date\": \"Tue, 16 May 2023 09:56:40 GMT\",\n    \"elapsed\": 0,\n    \"invocation\": {\n        \"module_args\": {\n            \"attributes\": null,\n            \"body\": null,\n            \"body_format\": \"raw\",\n            \"ca_path\": null,\n            \"ciphers\": null,\n            \"client_cert\": null,\n            \"client_key\": null,\n            \"creates\": null,\n            \"decompress\": true,\n            \"dest\": null,\n            \"follow_redirects\": \"safe\",\n            \"force\": false,\n            \"force_basic_auth\": false,\n            \"group\": null,\n            \"headers\": {},\n            \"http_agent\": \"ansible-httpget\",\n            \"method\": \"GET\",\n            \"mode\": null,\n            \"owner\": null,\n            \"remote_src\": false,\n            \"removes\": null,\n            \"return_content\": false,\n            \"selevel\": null,\n            \"serole\": null,\n            \"setype\": null,\n            \"seuser\": null,\n            \"src\": null,\n            \"status_code\": [\n                200\n            ],\n            \"timeout\": 30,\n            \"unix_socket\": null,\n            \"unredirected_headers\": [],\n            \"unsafe_writes\": false,\n            \"url\": \"https://api.chucknorris.io/jokes/random\",\n            \"url_password\": null,\n            \"url_username\": null,\n            \"use_gssapi\": false,\n            \"use_netrc\": true,\n            \"use_proxy\": true,\n            \"validate_certs\": true\n        }\n    },\n    \"json\": {\n        \"categories\": [],\n        \"created_at\": \"2020-01-05 13:42:26.991637\",\n        \"icon_url\": \"https://assets.chucknorris.host/img/avatar/chuck-norris.png\",\n        \"id\": \"NmnOmWG_SX69rff8tZ0-SQ\",\n        \"updated_at\": \"2020-01-05 13:42:26.991637\",\n        \"url\": \"https://api.chucknorris.io/jokes/NmnOmWG_SX69rff8tZ0-SQ\",\n        \"value\": \"When you get a thumbs up from Chuck Norris, you always win.\"\n    },\n    \"msg\": \"OK (unknown bytes)\",\n    \"nel\": \"{\\\"success_fraction\\\":0,\\\"report_to\\\":\\\"cf-nel\\\",\\\"max_age\\\":604800}\",\n    \"redirected\": false,\n    \"report_to\": \"{\\\"endpoints\\\":[{\\\"url\\\":\\\"https:\\\\/\\\\/a.nel.cloudflare.com\\\\/report\\\\/v3?s=RUR%2B%2FXmJucv8CwBvpTeDbMzhNW2kjoSST5wxjz6DQohEWKBD8wsw8Gsvh4fQ2TNjcw1ouHv8W43L4K1h2bmKQYsLEmc06HWngWGqrPWZ45gvNHodoFcWU2dcFpIwOL1OY7XdHkQ%3D\\\"}],\\\"group\\\":\\\"cf-nel\\\",\\\"max_age\\\":604800}\",\n    \"server\": \"cloudflare\",\n    \"status\": 200,\n    \"transfer_encoding\": \"chunked\",\n    \"url\": \"https://api.chucknorris.io/jokes/random\",\n    \"via\": \"1.1 vegur\"\n}",
    "processor": {
      "name": "log",
      "event": "log"
    },
    "data_stream.type": "logs",
    "url": {
      "original": "https://api.chucknorris.io/jokes/random"
    },
    "observer": {
      "hostname": "93fd34e82f76",
      "type": "apm-server",
      "version": "8.7.1"
    },
    "trace": {
      "id": "d7e7be16d85162d114e1ad12e712cf78"
    },
    "@timestamp": "2023-05-16T09:56:40.517Z",
    "service": {
      "name": "ansible_callback_otel",
      "language": {
        "name": "python"
      },
      "target": {
        "name": "api.chucknorris.io:443",
        "type": "http"
      }
    },
    "data_stream.dataset": "apm.app",
    "event": {
      "agent_id_status": "missing",
      "ingested": "2023-05-16T09:56:41Z"
    }
  },
  "fields": {
    "service.name": [
      "ansible_callback_otel"
    ],
    "data_stream.namespace": [
      "default"
    ],
    "processor.name": [
      "log"
    ],
    "url.original.text": [
      "https://api.chucknorris.io/jokes/random"
    ],
    "service.target.name": [
      "api.chucknorris.io:443"
    ],
    "message": [
      "{\n    \"alt_svc\": \"h3=\\\":443\\\"; ma=86400, h3-29=\\\":443\\\"; ma=86400\",\n    \"ansible_facts\": {\n        \"discovered_interpreter_python\": \"/usr/local/bin/python3.11\"\n    },\n    \"cf_cache_status\": \"DYNAMIC\",\n    \"cf_ray\": \"7c82c0881b45d516-CDG\",\n    \"changed\": false,\n    \"connection\": \"close\",\n    \"content_type\": \"application/json;charset=UTF-8\",\n    \"cookies\": {},\n    \"cookies_string\": \"\",\n    \"date\": \"Tue, 16 May 2023 09:56:40 GMT\",\n    \"elapsed\": 0,\n    \"invocation\": {\n        \"module_args\": {\n            \"attributes\": null,\n            \"body\": null,\n            \"body_format\": \"raw\",\n            \"ca_path\": null,\n            \"ciphers\": null,\n            \"client_cert\": null,\n            \"client_key\": null,\n            \"creates\": null,\n            \"decompress\": true,\n            \"dest\": null,\n            \"follow_redirects\": \"safe\",\n            \"force\": false,\n            \"force_basic_auth\": false,\n            \"group\": null,\n            \"headers\": {},\n            \"http_agent\": \"ansible-httpget\",\n            \"method\": \"GET\",\n            \"mode\": null,\n            \"owner\": null,\n            \"remote_src\": false,\n            \"removes\": null,\n            \"return_content\": false,\n            \"selevel\": null,\n            \"serole\": null,\n            \"setype\": null,\n            \"seuser\": null,\n            \"src\": null,\n            \"status_code\": [\n                200\n            ],\n            \"timeout\": 30,\n            \"unix_socket\": null,\n            \"unredirected_headers\": [],\n            \"unsafe_writes\": false,\n            \"url\": \"https://api.chucknorris.io/jokes/random\",\n            \"url_password\": null,\n            \"url_username\": null,\n            \"use_gssapi\": false,\n            \"use_netrc\": true,\n            \"use_proxy\": true,\n            \"validate_certs\": true\n        }\n    },\n    \"json\": {\n        \"categories\": [],\n        \"created_at\": \"2020-01-05 13:42:26.991637\",\n        \"icon_url\": \"https://assets.chucknorris.host/img/avatar/chuck-norris.png\",\n        \"id\": \"NmnOmWG_SX69rff8tZ0-SQ\",\n        \"updated_at\": \"2020-01-05 13:42:26.991637\",\n        \"url\": \"https://api.chucknorris.io/jokes/NmnOmWG_SX69rff8tZ0-SQ\",\n        \"value\": \"When you get a thumbs up from Chuck Norris, you always win.\"\n    },\n    \"msg\": \"OK (unknown bytes)\",\n    \"nel\": \"{\\\"success_fraction\\\":0,\\\"report_to\\\":\\\"cf-nel\\\",\\\"max_age\\\":604800}\",\n    \"redirected\": false,\n    \"report_to\": \"{\\\"endpoints\\\":[{\\\"url\\\":\\\"https:\\\\/\\\\/a.nel.cloudflare.com\\\\/report\\\\/v3?s=RUR%2B%2FXmJucv8CwBvpTeDbMzhNW2kjoSST5wxjz6DQohEWKBD8wsw8Gsvh4fQ2TNjcw1ouHv8W43L4K1h2bmKQYsLEmc06HWngWGqrPWZ45gvNHodoFcWU2dcFpIwOL1OY7XdHkQ%3D\\\"}],\\\"group\\\":\\\"cf-nel\\\",\\\"max_age\\\":604800}\",\n    \"server\": \"cloudflare\",\n    \"status\": 200,\n    \"transfer_encoding\": \"chunked\",\n    \"url\": \"https://api.chucknorris.io/jokes/random\",\n    \"via\": \"1.1 vegur\"\n}"
    ],
    "service.language.name": [
      "python"
    ],
    "observer.hostname": [
      "93fd34e82f76"
    ],
    "data_stream.type": [
      "logs"
    ],
    "event.ingested": [
      "2023-05-16T09:56:41.000Z"
    ],
    "trace.id": [
      "d7e7be16d85162d114e1ad12e712cf78"
    ],
    "url.original": [
      "https://api.chucknorris.io/jokes/random"
    ],
    "@timestamp": [
      "2023-05-16T09:56:40.517Z"
    ],
    "ecs.version": [
      "8.6.0-dev"
    ],
    "observer.type": [
      "apm-server"
    ],
    "observer.version": [
      "8.7.1"
    ],
    "data_stream.dataset": [
      "apm.app"
    ],
    "processor.event": [
      "log"
    ],
    "service.target.type": [
      "http"
    ],
    "agent.version": [
      "1.17.0"
    ],
    "agent.name": [
      "opentelemetry/python"
    ],
    "parent.id": [
      "ab978cda4c509f5f"
    ],
    "event.agent_id_status": [
      "missing"
    ]
  }
}
v1v commented 1 year ago

As far as I see in Jaeger the logs are linked to the span which contains all the details for the span, task, host and playbook

image

I've been told that the log correlation exists in APM:

span events are always associated with spans, so what we do is pull the span's trace ID and span ID and add those to the log document then in the UI we can fetch all logs with the matching trace ID

IIUC, in your case the span and trace id for the log document are:

    "trace.id": [
      "d7e7be16d85162d114e1ad12e712cf78"
    ],
    ...
    "parent.id": [
      "ab978cda4c509f5f"
    ],

Hence there should be some log correlation. Can you double check in the APM UI whether you can find all the logs for a given trace-id? This should help with knowing what's the ansible playbook.

Regarding knowing the task within the logs, I guess adding some attributes could help for this particular case, though it will increase the disk space for those documents and some other vendors provide that visualisation in a different manner.

v1v commented 1 year ago

I just made a quick thing to support the same attributes in the spans but also in the span events:

It contains a demo I recorded to explain how those changes can be seen in Jaeger and Elastic.

let me know what you think.

gone-for-coding commented 1 year ago

Hi @v1v Thanks for the fast feedback. That is actually what I was thinking of. I guess it would not hurt to just include all the attributes from above to have the full set available as I find the args, message, rc and so on quite useful.

v1v commented 1 year ago

include all the attributes

Included, though it's now based on a feature flag, so people can opt out if their OTEl backend already provides those details.

gone-for-coding commented 1 year ago

@v1v That sounds great. Do you have a quick link to the updated version for me so I can test it?

v1v commented 1 year ago

Same PR -> #6531