Acceptance criteria: a documented plan in our Spikes directory of what work needs to be done in the RPC server to be able to: 1) pass a flag that enables new structured logging, and 2) ensure new structured logging works as expected with the RPC server (this is the larger unknown).
@leahwicz and @jtcohen6 can you further define "enabled" / "works as expected" in this context? Specifically, we currently have at least three different logging buckets (stdout, log file, spec poll response) -- which one(s) do we want to output structured logs to?
The main goal of this is for the IDE to consume these logs, so I think it would be 1) stdout and 2) the spec poll response. @jtcohen6 @davidharting correct me if I'm wrong here.
@leahwicz That sounds correct. The cloud IDE still has access to the log files from dbt. Ultimately, we want "the same logs we get today" - which I believe is primarily stdout / stderr wrapped up into JSON and sent from the RPC server.
I'm not precisely sure what "spec poll response" is.
Quick follow-up during meeting: I tried looking into this several months ago in the leadup to v1.0. My big-picture understanding is that the RPC server batches logs associated with each task / subprocess, and makes those logs available when the task is polled. (I believe that's how the IDE serves logs up to end users in the UI, but that is worth confirming.)

`dbt-rpc` does this by hooking into some `logbook`-specific functionality today. We mayyybe could accomplish it using our new logger's in-memory `EVENT_HISTORY` instead...? I gave this a quick try back in Oct/Nov, but I wasn't able to get it working (obviously). I can try swinging back here with more details later.
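For orientation, here is a minimal sketch of the kind of per-task log capture being described: a `logbook` handler bound to the task's thread that keeps records in memory so they can be returned when the task is polled. The class and function names here are hypothetical, and this is not the actual `dbt-rpc` code, just an illustration of the pattern.

```python
import logbook


class TaskLogCollector(logbook.Handler):
    """Hypothetical: collect log records for a single RPC task in memory."""

    def __init__(self):
        # bubble=True lets records continue on to other handlers (stdout, log file)
        super().__init__(level=logbook.DEBUG, bubble=True)
        self.records = []

    def emit(self, record):
        # Keep a plain-dict copy so it can be serialized into a poll response later
        self.records.append({
            "timestamp": record.time.isoformat(),
            "level": record.level_name,
            "message": record.message,
        })


def run_task_with_log_capture(task_fn):
    """Run a task while capturing its logs; return (result, captured_logs)."""
    collector = TaskLogCollector()
    # threadbound() applies the handler only to the current (task) thread,
    # which is roughly how per-task batching would need to be scoped
    with collector.threadbound():
        result = task_fn()
    return result, collector.records
```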
@davidharting Can you confirm that the rpc poll response is the desired location for the logs when you get a chance?
@iknox-fa that is correct. We poll, and grab the logs from the poll response to render logs to the user. Once we get a "success" response from polling, we know we have the complete set of logs, and we continue to display that set of logs to the user any time they refer back to that historical invocation.
@iknox-fa Looks like I misunderstood how the logs work today in RPC. It looks like each polling response may only provide new logs, rather than all logs for that invocation, and then we aggregate them in the frontend.

The overall answer remains the same: the way the IDE expects to receive logs today is inside the polling response, in the JSON key `result.logs`. I am just still investigating how precisely it works.

I am also investigating what `result.results` is. Maybe that is structuring some of the output we show to users and I am conflating it with logs.
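To make that flow concrete, here is a minimal sketch of a client that polls a task and accumulates logs across responses, the way the IDE frontend does. The method and parameter names ("poll", "request_token", "state") follow the dbt-rpc interface as described above, but treat the exact payload shape as an assumption to verify.

```python
import time
import uuid

import requests  # any HTTP client works; requests is assumed here


def poll_until_done(server_url: str, request_token: str) -> list:
    """Poll a dbt-rpc task and accumulate logs across poll responses."""
    all_logs = []
    while True:
        payload = {
            "jsonrpc": "2.0",
            "method": "poll",
            "id": str(uuid.uuid4()),
            "params": {"request_token": request_token, "logs": True},
        }
        result = requests.post(server_url, json=payload).json()["result"]
        # Each poll may return only the logs emitted since the last poll,
        # so append rather than replace
        all_logs.extend(result.get("logs", []))
        if result.get("state") == "success":
            return all_logs
        time.sleep(1)
```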
`dbt-rpc` hooks into a lot of `logbook` magic today: `logger.py`

I think there are two areas in particular that need spiking / investigation:

- The request context that the server attaches to its log records (`addr`, `http_method`, `request_id`, `method`, `response_code`, `context`)
- The `logs` specific to that request, and the way the list of `logs` is included in the server response. (I think that filtering happens here)

The desirable outcome may look like:

- Using the new logger's in-memory `EVENT_HISTORY` to batch up log events by `request_id`, and passing them into the response when ready / polled (see the sketch below)
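As a rough illustration of that desirable outcome, the sketch below batches events out of an in-memory event history by request ID. It assumes the two things that are exactly the open questions: that the new logger's `EVENT_HISTORY` is importable and iterable, and that events can be tagged with (or matched to) a `request_id`, which today they are not.

```python
# Assumption: the new logger keeps an in-memory history of event objects,
# e.g. `from dbt.events.functions import EVENT_HISTORY`. Whether events carry
# any request-scoped identifier is the open question this spike needs to answer.


def logs_for_request(event_history, request_id):
    """Collect serialized log events for a single RPC request.

    `event.request_id` is hypothetical: events are not currently tagged with
    the RPC request they belong to, and adding that association is the work.
    """
    batched = []
    for event in event_history:
        if getattr(event, "request_id", None) != request_id:
            continue
        batched.append({
            "code": type(event).__name__,  # event type, e.g. SQLQuery
            "message": str(event),         # exact serialization TBD
        })
    return batched


# The server could then include logs_for_request(EVENT_HISTORY, request_id)
# in the poll response, in place of the logbook-based filtering it does today.
```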
**Feature parity**

Once `dbt-rpc` supports new logging, we'll need to take stock of content/feature parity. I know a lot changed between legacy and new logging. There's one piece in particular that comes to mind: `logbook` would enrich every row with the `unique_id` of the node it's relevant to, sticking it in `extra.unique_id` (source).

Events that include `NodeInfo` carry a much richer `node_info` dictionary, including `node_info.unique_id`, but this is certainly fewer events than the ones which previously had `extra.unique_id`. Most notably, it's missing `SQLQuery`, the actual SQL run for this model.

This is important for supporting the dbt Cloud IDE's ability to show logs per node run (e.g. this screenshot), which expects to filter on `extra.unique_id` today.
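For reference, here is a minimal sketch of what per-node filtering could look like against the new structured logs, swapping `extra.unique_id` for `node_info.unique_id`. The exact key path (top-level `node_info` vs. nested under `data`) is an assumption to verify against real log output, and the sketch also illustrates the parity gap: events without `node_info`, such as `SQLQuery` today, simply would not match.

```python
import json


def logs_for_node(json_log_lines, unique_id):
    """Filter new-style structured log lines down to a single node's logs.

    The IDE filters on extra.unique_id today; the closest new-logger equivalent
    is node_info.unique_id. The key path used here is an assumption.
    """
    matched = []
    for line in json_log_lines:
        event = json.loads(line)
        node_info = event.get("node_info") or (event.get("data") or {}).get("node_info")
        if node_info and node_info.get("unique_id") == unique_id:
            matched.append(event)
    return matched
```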