cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

sql,server: create an export/ingest API pair for SQL execution stats #104436

Open knz opened 1 year ago

knz commented 1 year ago

We would like to disentangle the production of SQL o11y data (when/where SQL queries are executed) from its storage/consumption (where SQL o11y data is queried).

Expected deliverables in v23.2:

Acceptance criteria:

Research directions:

Epic: CRDB-28526

Jira issue: CRDB-28528

abarganier commented 1 year ago

I think this is the right issue to choose as a starting point for event export. SQL execution stats are our primary use case, and I think we can use existing work done in this area to export events. I'll jot down some thoughts/pointers for getting started here, but this work may very well involve some tweaking of the format of our exported events to fit the latest and greatest design. That's okay! Let's make this iterative. The export interface can evolve alongside the service that ingests those same events.

Infra to Export Events

Work was already done in the previous cycle to build infrastructure to export events from CRDB in OTLP format as structured logs. This is done via the EventsExporterInterface, which already has an implementation using gRPC. As much as possible, we should use this gRPC implementation during the early stages of development so we can favor velocity in ironing out the event formats and interfaces.

A rough example implementation exists for the CRDB event log, which can be found here. Use this for guidance, but don't let it box us in with respect to our implementation. We should feel empowered to make changes wherever necessary to fit the latest design.

Exported Events

Exported event data should be in protobuf format. Its on-the-wire representation is more attractive to us than JSON, as not having to deserialize JSON within the o11y service will save a massive amount of CPU. The o11y service is expected to have access to the relevant protobuf files to parse events on the ingest side.

Some guiding principles should be considered when building out the SQL execution events.

  1. We want granular events to be exported by CRDB, not aggregated results. The goal is for aggregation to be moved outside of CRDB entirely. This would mean one event per SQL query executed.
  2. Throughput and scale should not be a concern this early on. We are setting a loose goal to have the ability to export events for nodes handling 100 QPS initially. Increasing throughput is left as an optimization exercise for later.
  3. Events themselves should be versioned. This should be independent of the CRDB version! This will allow downstream systems to act accordingly.
  4. Information in events should not require further communication with CRDB to make sense, once exported. For example, imagine exporting data containing a foreign key, where the data that the FK references is an important piece of information. Will the downstream system know what that foreign key relates to? Probably not. Instead, consider joining that data beforehand into the event, prior to exporting. We want the information within events to be self-contained.
  5. Exported events should have their own protobuf messages. We will likely need to provide a client library with all the exported protobuf messages. Let's make sure the exported event protos live somewhere convenient to make our lives easier when that day comes.

Questions / Unsure?

Contact someone on the o11y Infrastructure team! This is an iterative process. There will likely be some brainstorming involved, and that doesn't need to be done alone 🤝