dbt-labs / dbt-rpc

A server that can compile and run queries in the context of a dbt project. Additionally, it provides methods that can be used to list and terminate running processes.
https://docs.getdbt.com/reference/commands/rpc
Apache License 2.0

[CT-554] [Spike] Support new logging #82

Closed · jtcohen6 closed this issue 2 years ago

jtcohen6 commented 2 years ago

dbt-rpc hooks into a lot of logbook magic today: logger.py

I think there are two areas in particular that need spiking / investigation:

The desirable outcome may look like:

Feature parity

Once dbt-rpc supports new logging, we'll need to take stock of content/feature parity. I know a lot changed between legacy and new logging. There's one piece in particular that comes to mind:

This is important for supporting the dbt Cloud IDE's ability to show logs per node run (e.g. this screenshot), which expects to filter on extra.unique_id today.
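For illustration, the consumer-side filtering might look roughly like this. The record shape here (JSON lines carrying an extra.unique_id field) is an assumption based on how the IDE filters today, not a confirmed contract:

```python
# Illustrative sketch only: group structured log lines by the node that
# emitted them, the way the IDE filters on extra.unique_id today.
import json
from collections import defaultdict

def logs_by_node(log_lines):
    """Map each node's unique_id to the messages it emitted."""
    grouped = defaultdict(list)
    for raw in log_lines:
        record = json.loads(raw)
        # extra.unique_id is the field the IDE keys on (assumed shape)
        unique_id = record.get("extra", {}).get("unique_id")
        if unique_id:
            grouped[unique_id].append(record.get("message", ""))
    return grouped
```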

leahwicz commented 2 years ago

Acceptance criteria: a documented plan in our Spikes directory of what work needs to be done in the RPC server to be able to:

1) Pass a flag that enables new structured logging
2) Ensure new structured logging works as expected with the RPC server (this is the larger unknown)

iknox-fa commented 2 years ago

@leahwicz and @jtcohen6 can you further define "enabled" / "works as expected" in this context? Specifically, we currently have at least three different logging buckets (stdout, log file, spec poll response) -- which one(s) do we want to output structured logs to?

leahwicz commented 2 years ago

The main goal of this is for the IDE to consume these logs, so I think it would be 1) stdout and 2) spec poll response. @jtcohen6 @davidharting correct me if I'm wrong here.

davidharting commented 2 years ago

@leahwicz That sounds correct. The cloud IDE still has access to the log files from dbt. Ultimately, we want "the same logs we get today", which I believe is primarily stdout / stderr wrapped up into JSON and sent from the RPC server.

I'm not precisely sure what "spec poll response" is.

jtcohen6 commented 2 years ago

Quick follow-up during the meeting: I tried looking into this several months ago in the lead-up to v1.0. My big-picture understanding is that the RPC server batches the logs associated with each task / subprocess and makes those logs available when the task is polled. (I believe that's how the IDE serves logs up to end users in the UI, but that is worth confirming.)
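As a rough sketch of that flow: the poll method and its request_token / logs params follow the dbt RPC docs, while the endpoint, port, and exact response shape are assumptions to verify:

```python
# Sketch of fetching a task's batched logs via the JSON-RPC poll method.
# localhost:8580 is the documented default address; the payload handling
# is an assumption about the response shape, not a confirmed contract.
import requests

RPC_URL = "http://localhost:8580/jsonrpc"

def poll_once(request_token: str) -> dict:
    """Issue one poll call and return the result, including its logs."""
    response = requests.post(RPC_URL, json={
        "jsonrpc": "2.0",
        "method": "poll",
        "id": "poll-1",
        "params": {"request_token": request_token, "logs": True},
    })
    result = response.json()["result"]
    for line in result.get("logs", []):
        print(line.get("levelname"), line.get("message"))
    return result
```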

dbt-rpc does this by hooking into some logbook-specific functionality today. We may be able to accomplish it using our new logger's in-memory EVENT_HISTORY instead? I gave this a quick try back in Oct/Nov, but I wasn't able to get it working (obviously).
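The shape of that idea might be something like the following. Every name here is illustrative (a stand-in, not dbt-core's actual object), since the real buffer semantics are exactly what this spike needs to verify:

```python
# Rough sketch of the EVENT_HISTORY idea: filter an in-memory buffer of
# structured events down to the batch belonging to one task.
from collections import deque

# Stand-in for the new logger's in-memory buffer (illustrative only).
EVENT_HISTORY = deque(maxlen=100_000)

def logs_for_task(invocation_id: str) -> list:
    """Collect the events emitted by a single task / subprocess."""
    return [
        event for event in list(EVENT_HISTORY)
        if getattr(event, "invocation_id", None) == invocation_id
    ]
```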

I can try swinging back here with more details later.

iknox-fa commented 2 years ago

@davidharting Can you confirm that the rpc poll response is the desired location for the logs when you get a chance?

davidharting commented 2 years ago

@iknox-fa that is correct. We poll, and grab the logs from the poll response to render logs to the user. Once we get a "success" response from polling, we know we have the complete set of logs, and we continue to display that set of logs to the user any time they refer back to that historical invocation.

davidharting commented 2 years ago

@iknox-fa Looks like I misunderstood how the logs work today in RPC. It appears each polling response may only provide new logs, rather than all logs for that invocation, and then we aggregate them in the frontend.

The overall answer remains the same. The way the IDE expects to receive logs today is inside the polling response, in the JSON key result.logs. I am just still investigating how precisely it works.
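If that reading is right, the client-side aggregation would look roughly like this (logs_start is the documented offset parameter on poll; the rest is an assumption to confirm):

```python
# Sketch of the incremental behavior described above: each poll passes
# logs_start so the server returns only unseen lines, and the client
# accumulates them until the task reaches a terminal state.
import time
import requests

RPC_URL = "http://localhost:8580/jsonrpc"

def collect_all_logs(request_token: str) -> list:
    """Poll until the task finishes, aggregating log deltas client-side."""
    all_logs = []
    while True:
        result = requests.post(RPC_URL, json={
            "jsonrpc": "2.0",
            "method": "poll",
            "id": "poll-logs",
            "params": {
                "request_token": request_token,
                "logs": True,
                "logs_start": len(all_logs),  # only request unseen lines
            },
        }).json()["result"]
        all_logs.extend(result.get("logs", []))
        if result.get("state") in ("success", "error"):
            return all_logs
        time.sleep(1)
```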

davidharting commented 2 years ago

I am also investigating what result.results is. Maybe it structures some of the output we show to users, and I am conflating that with logs.

iknox-fa commented 2 years ago

Findings in brief: https://www.notion.so/dbtlabs/Structured-logging-with-dbt-rpc-79a87bef88394e39a97df5b0abe98af7