Open johanznsdn opened 1 month ago
@johanznsdn - Thanks for raising! I'll put some thoughts here for discussion...
I think we're going to want a few different logging "levers":
I'm not set on these specifically, but it does seem like we'd want some granularity, such as could be provided by the below.
airbyte
- A top-level logger for PyAirbyte. Setting defaults here applies them also to the below.airbyte.source
- Passthrough of STDERR
log messages from the source.airbyte.read
- Messages about the status of a read
operation.airbyte.cloud
- Messages about cloud
functionality or API interop.airbyte.core
- Anything else not covered in the above.The standard logging levels are:
ERROR
WARNING
INFO
DEBUG
The two classic options are:
STDERR
writer (terminal)STDERR
)By default, I think we can turn everything on for the STDERR
handler at the WARNING
and above level. This is more-or-less how we have PyAirbyte configured today.
airbyte.source
(STDERR
passthrough)We don't capture or print the Source's log output as of now. We could add a file-based logger today without much downside. Basically, every time we invoke a source connector, we'd create a new file to store its log output. If anything fails, we can print the path to this file for the user's convenience.
It might be worthwhile to consider a global file logger - for everything except the source log passthrough described above. This could be configured at the INFO
level by default.
read()
processingEverything up until now is relatively boring/obvious in terms of providing needed visibility and basic user control for logging operations. But now, a hard question is: "What should we log during read()
operations?"
I'll start with events that are a more obvious "yes", followed by those that ware "maybes":
DEBUG
level and maybe not needed as of yet, it avoids having to calibrate noise-vs-signal ratio.)Of the above, only 3
seems controversial or difficult to implement. Unfortunately, 1.1
is not currently available, but all others in bullets 1
and 2
would be pretty easy additions.
@johanznsdn - Since you logged this issue, I would greatly appreciate your thoughts on what is most beneficial to you.
@aaronsteers Thank you so much! Bullets 1 and 2 seem great to me!
Great, thanks for confirming @johanznsdn. In that case, we'll focus on just those first two logging events, and (nicely) we don't need to worry (yet!) about 10s of thousands of log lines being generated. 😄
I can't commit to a timeline as of now, but I'm adding labels Accepting Pull Requests
and Good First Issue
in case anyone wants to pick this up. It's a frequent request so we'll try to prioritize it sooner than later...
cc @bindipankhudi for visibility
@vspanxcode - You mentioned you'd like to pick this up for Hackathon. Drop a comment here and I'll go ahead and assign it to you.
@aaronsteers please assign me this one
✅ Done - thanks!
Lmk if you run into any issues or if you have any questions.
As discussed here: https://airbytehq.slack.com/archives/C06FZ238P8W/p1715109561242409
It would be very useful to have live logs for easy debugging. Deciding to enable them or not via a parameter, for example (source.read(cache=cache, logs=True)).