estuary / animated-carnival

Other
3 stars 0 forks source link

refactor: Add the ability to hide log output unless there was an error #69

Closed jshearer closed 2 years ago

jshearer commented 2 years ago

This is a short-term solution for #64, where the longer term solution would be emitting structured logs and filtering elsewhere


This change is Reviewable

mdibaiee commented 2 years ago

@jshearer would it be possible to set a log_level and so only error logs come out using that? I think most flow components support --log.level 🤔

jshearer commented 2 years ago

So I think this may have been a miscommunication on my part. The purpose of this change is to prevent a bunch of noise from getting logged and presented to the user if there was no error. Ideally we'd present a special UI component showing that various steps are pending/successful/failed, but prior to that, the idea here is to pluck some of the low hanging fruit in terms of log noise.

So with that being said, I'm not sure how --log.level would help? The idea is to filter these log messages (unless there's an error) in the default mode of operation of this service (i.e running these actions via the UI, where specifying things like --log.level isn't possible).

@jgraettinger WRT that link you sent, I think @psFried showed me that. My understanding is that the flow (heh) is thus:

  1. User clicks "publish" in the web UI, which... I think creates an async task record, which gets picked up by the control-plane agent
  2. Agent then shells out to Go, capturing stdout/stderr,
  3. Go does a bunch of stuff, including shelling out to Docker
  4. Agent writes stdout/stderr to supabase logs using log token etc
  5. Agent writes task status, which the UI picks up and presents accordingly

Ultimately we'd have structured logging and always log the output like we do now, just flagged as either "debug" if the command is successful, or "error" if the command failed. Then clientside (in the UI), we could just filter for logs at "info" or above, or even better we could filter for logs by specific component names, which would be used to present error logs next to the specific subtask that failed.

That being said, my intent with this PR was to change the behavior of step 4 above as a short-term mitigation to log noise. Instead of always writing Go's stdout/stderr to the task logs, the idea is to wait until the invocation is done, and if it exits successfully, then just log a generic Task 'foo' was successful. The reason I parameterized this, as opposed to always changing this logic, is that there are certain logs that we probably always want to show. Ideally we'd do this filtering in the web UI, but without structured logging, I put it here instead.

jgraettinger commented 2 years ago

Aside from npm output and docker pull, which is truly not helpful if they are successful... the rest of these logs are actually useful, informational, and/or important for debugging. We do want them logged and captured in the DB, even if only so that we can reference them for debugging.

As a practical matter (and having fixed npm and docker pull) the remaining logs aren't too verbose, though we should keep an eye on it over time.

We should seek to improve UX by hiding them in the UI by default, to be surfaced if the user chooses to drill down into them, but we shouldn't not collect them altogether.

psFried commented 2 years ago

@jgraettinger @mdibaiee I'm not sure I understand the suggestion to use --log.level or --quiet. My understanding was that we want to suppress these logs conditionally, based on whether there was an error. But those flags would suppress the logs regardless of whether an error occurred.

We should seek to improve UX by hiding them in the UI by default, to be surfaced if the user chooses to drill down into them, but we shouldn't not collect them altogether.

I couldn't agree more, though I don't think the intent was for this to be done in this PR. The intent of this PR was to do something minimal to squelch the most egregious noise in the (hopefully) common case where the operation was successful. The way I understand it, to collect the logs unconditionally and display them conditionally will require capturing structured logs and filtering them in a query. I believe the intent was to implement that later, having addressed the biggest pain point with this workaround.

jshearer commented 2 years ago

Closing per our conversation in favor of --quiet for the time being