cockroachdb / cockroach

CockroachDB - the open source, cloud-native distributed SQL database.
https://www.cockroachlabs.com

Acceptance Testing: Datadog HTTP Log Export API is missing fields/tags #106405

Open kevin-v-ngo opened 12 months ago

kevin-v-ngo commented 12 months ago

The logs collected in Datadog using the new Datadog HTTP Log Export API for Self-Hosted are missing critical fields and tags that users need to identify and filter logs and to build and correlate dashboards.

This issue tracks:

  1. Providing default/static fields and tags for critical fields.
  2. Allowing generic/custom tagging during CRDB deployment in the YAML file, with example tags and values:
    tags:
    env: <prod, or production, or pmt-prod>
    service: <pmt-database, or cockroach, or app-name>
    team: <payments>
    chargeback: <ops-estore>
    apps: <store>

Custom tagging will override any default fields (SOURCE and HOST).
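
The override rule can be illustrated with a minimal Go sketch (Go because CockroachDB itself is written in Go). It assumes tags are plain string key/value pairs; `mergeTags` and all example keys and values are hypothetical and do not come from CockroachDB's configuration or code.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeTags is a hypothetical helper: it returns a new tag set containing the
// defaults overlaid with the custom tags. When a key appears in both, the
// custom value wins. Neither input map is modified.
func mergeTags(defaults, custom map[string]string) map[string]string {
	merged := make(map[string]string, len(defaults)+len(custom))
	for k, v := range defaults {
		merged[k] = v
	}
	for k, v := range custom {
		merged[k] = v // custom tags override defaults
	}
	return merged
}

func main() {
	// Hypothetical defaults a node might fill in automatically.
	defaults := map[string]string{"source": "cockroachdb", "host": "node-1"}
	// Hypothetical operator-supplied tags from the deployment YAML.
	custom := map[string]string{"env": "prod", "team": "payments", "source": "crdb-payments"}

	merged := mergeTags(defaults, custom)

	// Map iteration order is random in Go, so sort keys for stable output.
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s:%s\n", k, merged[k])
	}
}
```

Copying into a fresh map keeps the node-wide defaults reusable across sinks, since one sink's custom tags cannot leak into another's.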

[Image: Agent log (with expected tags/fields)]

[Image: HTTP Export API log]

Related issue:

Jira issue: CRDB-29555

Epic CRDB-32142

levihernandez commented 11 months ago

Tag structure can work in the following form:

kevin-v-ngo commented 11 months ago

@levihernandez, we already have a few fields in the log 😀. Going to update the issue.

"attributes": {
  "tenant_id": 1,
  "severity": "INFO",
  "line": 32,
  "channel": "TELEMETRY",
  "severity_numeric": 1,
  "version": "v23.1.5",
  "channel_numeric": 12,
  "tags": { "hostnossl": "", "client": "10.142.1.78:48852", "user": "root", "n": "3" },
  "cluster_id": "11b46f5e-0379-49f8-ae47-1aace8e13563",
  "instance_id": 3,
  "file": "util/log/event_log.go",
  "entry_counter": 1752190,

dhartunian commented 10 months ago

@kevin-v-ngo this feature could be implemented as a static set of tags that are appended to logs on arbitrary sinks. I assume all of the required tags are static per node. I do worry a bit about how far we're going to implement features that are typically the responsibility of external observability agents. Do we know where we're going to draw the line? What if a customer wants to have different tags on different logging channels?
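
The idea of a static set of tags appended to logs on arbitrary sinks can be pictured as a wrapper placed around any sink. The following is a minimal Go sketch under that assumption, not CockroachDB's actual logging code: `Entry`, `Sink`, `withStaticTags`, and the `static_tags` key are all hypothetical names chosen for illustration.

```go
package main

import "fmt"

// Entry is a simplified, hypothetical log entry: field name to value.
type Entry map[string]any

// Sink is a hypothetical stand-in for any log sink (file, HTTP, Datadog, ...).
type Sink func(Entry) error

// withStaticTags wraps any sink so that every entry it receives carries the
// same static, per-node tag set. The tags are fixed at startup, so the
// wrapper does no per-entry lookup. The input entry is copied, not mutated.
func withStaticTags(next Sink, tags map[string]string) Sink {
	return func(e Entry) error {
		out := make(Entry, len(e)+1)
		for k, v := range e {
			out[k] = v
		}
		// Use a separate key so the existing per-entry "tags" object
		// (e.g. client, user, n) is left untouched.
		out["static_tags"] = tags
		return next(out)
	}
}

func main() {
	// A toy sink that just prints; a real sink would write to a file or HTTP endpoint.
	stdout := func(e Entry) error {
		fmt.Println(e)
		return nil
	}
	sink := withStaticTags(stdout, map[string]string{"env": "prod", "team": "payments"})
	if err := sink(Entry{"severity": "INFO", "channel": "TELEMETRY"}); err != nil {
		panic(err)
	}
}
```

Because the wrapper knows nothing about the destination, the same static tags could be attached to any sink type. Per-channel tags, as asked about above, would need one wrapper per channel instead.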

Also, I assume in your example you would want the following instead of a separate object per tag:

tags:
  env: <prod, or production, or pmt-prod>
  service: <pmt-database, or cockroach, or app-name>
  team: <payments>
  chargeback: <ops-estore>
  apps: <store>

kevin-v-ngo commented 10 months ago

Hi @dhartunian, this primarily allows users to filter and categorize logs in Datadog. These tags should be static per node, i.e., the same across the cluster (if that's what you meant). We can guide users and scope this improvement to 'cluster-level tagging'.

There is a use case where users want to know which node emitted the log, or the gateway node (and its region), but that information is already available in the log contents. Observability details specific to the distributed cluster should be part of the payload/log itself. This issue is focused on filtering and categorizing the logs themselves (e.g., to enable fleet-wide monitoring and log segmentation).

> I do worry a bit about how far we're going to implement features that are typically the responsibility of external observability agents. Do we know where we're going to draw the line?

Agreed. Given our prioritization with Datadog so far, at a minimum we should follow their guidance on the following fields: service, ddsource, and hostname. These three fields provide the out-of-the-box experience (image below) that lets users quickly filter and categorize logs. One of our customers did not have the proper tags to filter on when their metrics and log counts were not lining up. I anticipate more customers using this integration (with metrics), so it would be good to ensure we follow DD's guidance as their support team engages with our joint customers.

[Image]

> What if a customer wants to have different tags on different logging channels?

At this point, we should focus only on custom tags (including the 3 required fields) for the Datadog HTTP Log Export (DD sink), given the agentless approach and the known gaps with our customer.

kevin-v-ngo commented 10 months ago

Separate note:

We could allow tags for HTTP sinks in general (http-servers). @levihernandez was interested in testing this integration with other third-party tools that have their own HTTP log export API (requiring just the API key), such as Dynatrace or Honeycomb, to see whether it works out of the box.

I wasn't sure whether these integrations would work out of the box (whether we check for DD specifically, whether other HTTP APIs have the same fields as DD, or whether 'tags' are consumed in those systems the way DD expects them).