amir20 / dozzle

Realtime log viewer for docker containers.
https://dozzle.dev/
MIT License
6.8k stars 327 forks source link

v4 is released with JSON support! 👨🏼‍💻 #1852

Closed amir20 closed 2 years ago

amir20 commented 2 years ago

I have officially released the next major version of Dozzle. 🎉 Dozzle v4 has full JSON support for logs. JSON keys can be hidden based on container for easier debugging.

This version had some major rewrites so I am sure there a bugs. Please use this issue to track any bugs.

flarco commented 2 years ago

@amir20 Just wanted to say thanks for this tool, was pleasantly surprised when I refreshed my docker stack and seeing JSON support.

flarco commented 2 years ago

@amir20 OK seeing elevated CPU Usage. My Chrome tab crashed twice. image

amir20 commented 2 years ago

@amir20 OK seeing elevated CPU Usage. My Chrome tab crashed twice.

image

I wasn't able to reproduce this. I am running 10 containers and my cpu was below 2%. Can you help me debug? When does the cpu spike happen? Are you running very active logs? Thanks.

flarco commented 2 years ago

Ok, I was able to reproduce a few times, simply reloading the page and letting it run in the background. Takes about 5min, and then the tab freezes and goes into high CPU load. It is a fairly voluminous log stream from my Postgres instance (~5-10 lines per second), but this didn't happen with the v3 and I didn't change anything on my side since v4. @amir20

flarco commented 2 years ago

Oh and I have about 12 containers running, but the PG one is the most active.

amir20 commented 2 years ago

I see. That's a known issue. It was actually happening on v3 too but just took a little longer. It's just exacerbated in v4 because I happen to save a little bigger objects per log.

It has been discussed a few times. I don't know what would be a good fix. The problem is that all these logs are saved in memory and eventually become too big.

I can think of a couple solutions.

What do you think? I think pausing and resuming is the easiest for the user but that mostly fits my own use case. Not sure for others. 🤔

flarco commented 2 years ago

Ah ok I see.

Stopping the log might hurt the UX a bit. For me, I like to come back to the tab, and see the latest right there. If the logs stop coming, I would end up refreshing the page to get the logs within the last 2-3 minutes. Unless, when the user comes back, the app auto fetches the gap since tab was put in background? I think that might work.

  1. Window becomes blur, app saves timestamp for container in view (selected).
  2. Window becomes focus, app uses saved timestamp and auto refreshed the view with the latest log lines.

I'm not sure how refreshing works (as in selecting a different container to view most recent log lines), would that essentially purge old logs for the selected container? Also, if the user blurred the window viewing older logs, logic should be smart enough to not clear the state, and know if a refresh is really needed (since they're viewing old logs, not needed).

flarco commented 2 years ago

Thought about it some more, and I guess long term that wouldn't work, because the logs are still held in memory. When the user comes back and the app pulls the latest, it would get back to that freezing scenario. Plus you have to deal with the cursor (where the user is looking in the log), scrolling in the past or in the future...

So the purge would be the way to go for me. Can you explain the complexity?

If I were to approach this from a clean slate, for each line that comes in the EventStream, I would save two pieces of information in a table/object (maybe with dexie?):

  1. The timestamp (nano-second level?)
  2. The container ID or name

The combination of the two could form a sort of unique primary key of the log line. Saving those somewhere would allow us to know which lines to purge (by timestamp or max number of lines per container), allowing us to target exactly those. I'm sure I'm missing the cons in this :).

Another thought would be to push this to the backend? App could pull on demand as user scrolls? I think that's how papertrail does it. Might become a little slow though...

amir20 commented 2 years ago

@flarco That's exactly right. The problem is actually more complicated than you think. Just heads up this might be a long comment. :) But sounds like you are the only person who is understanding the complexity so far.

When the user comes back and the app pulls the latest, it would get back to that freezing scenario.

Yep, this is a problem. I don't think I would resume from where it was suspended. Instead, just resume. Also, I wouldn't use window deactivate but rather visibilitychange as someone can have Dozzle in a different screen.visibiltychange only triggers if it is hidden like a behind a tab.

So the purge would be the way to go for me. Can you explain the complexity?

Let's say the browser is holding 1000 logs. Like an array [0...999] and the window size is 100. So when the browser is all the way scrolled to the bottom it is showing [900..end]. When the user scrolls to the top, they are looking at [0...100]. In your example, if I was to remove the oldest logs (because I would only want to retain 1000 items for example) and the user is scroll to the top then it would start deleting logs as they are seeing them. In fact, they could be looking at any part of the [0...999] and I have no way of knowing if they are scroll up. Unless, I do some kind of js magic.

I hope that makes sense.

I have been looking at dexie, sql.js (WASM + SQLite and a few other options. I think I just wanted an excuse to use WASM.) But after a while, I think an embedded database is an overkill. For this use case, I can just remove the 100 items using something like array.split() in js.

Another thought would be to push this to the backend? App could pull on demand as user scrolls? I think that's how papertrail does it. Might become a little slow though...

I do actually load more logs as you scroll up. This is why it makes it even more complicated. If there are already 1000 items in the buffer, then scrolling up would be even more breaking because they would be oldest.

Here is what I THINK should happen. There are essentially four states.

  1. User scrolled to bottom and has active window
  2. User scrolled somewhere above and active
  3. User is at the bottom while window is in background
  4. User is somewhere in middle while window is in background

For each of these they need to be handled differently.

  1. When user is at the bottom, then I think it's easy. Just remove the oldest logs as you suggested.
  2. When user is in the middle and active, then it is tricky. I suggest to do nothing right now. I think its an edge case. It would be confusing if something in the middle disappears and even more frustrating if the bottom is not correctly fetching more logs.
  3. When user has placed Dozzle in background and scrolled to bottom then it's just like state 1. Throw away the oldest logs.
  4. Finally, the most tricky one, if user is not scrolled to the bottom and the tab is not visible, then I think it should suspend. This is the only solution that makes sense because when the user comes back, they would expect to see the same log. Imagine you are debugging something, see an error, switch tabs and then come back to see it missing.

Again, more complicated than I had expected, but I really don't know how else to make it simple and not confusing to people using Dozzle. It should be seamless and no one should notice Dozzle is throwing logs away.

What do you think?

flarco commented 2 years ago

Nice, thanks for the context.

The states you listed reflects my understanding as well. But I think things could be improved for 2 and 4.

The issue with suspending is that when you come back and scroll down, there would be a gap. And also I don't think state2 is such an edge case, I cross it all the time. Because I could scroll backwards, even just slightly, so that the cursor isn't at the "live" edge point, as I am reading lines. Then I could switch away to another tab/window for some reason, entering it in state 4. Then come back and forth, scrolling just slightly, reading logs. So at this point it's basically many switches between 2 and 4. And thinking of it, as you said, it's quite important for debugging, so it's important that both 2 and 4 work flawlessly for good UX.

So let us consider the following state variables for a container (admittedly a bit simplified :):

Combining this with your array example [0...1000], startLineID would be 0, endLineID is 1000, cursorLineID can be anything between 0 and 1000.

So here's how this plays out in my head, as log lines are streaming in:

With this, we could say:

Like this, the memory load is managed as user scrolls up and down, and UX is kept intact. Thoughts? I haven't studied the code base, so I don't know how feasible this would be.

amir20 commented 2 years ago

Great. We are making progress. I do agree that my state 2 has a gap. But after reading yours, I think I found a few gaps too that I wonder if you thought about them. Also, the technical limitations are a little complicated.

Let me reply inline.

And thinking of it, as you said, it's quite important for debugging, so it's important that both 2 and 4 work flawlessly for good UX.

Ok I see your point.

re: cursorLineID

This is actually harder than you think. Mapping a particular pixel to an item in an array isn't that easy. I currently know if the container is scrolled all the way via clientHeight. But to map each <li> in logs, I would have to use something like IntersectionObserver to observe when each element comes in to view. Actually pretty hard to do. I currently use that method to look for start and end for loading more and auto scrolling.

But let's assume that this is possible to continue on this solution.

If cursorLineID equals endLineID, user enters state1/3, app continues to receive stream, cursor "sticks" to endLineID to auto-scrolls down, always viewing the latest line. If maxLogLines is reached, purge occurs at the startLineID side.

So I think we agree that for both of these states, we suggested the same solution. In your solution you are using some threshold, but aside from that, if user is at bottom, trim top.

If cursorLineID is equal or less than endLineID - thresholdLines, stop receiving stream.

If the user is in the middle, and Dozzle stop streaming, wouldn't it be confusing after the user scroll down. Essentially there might have been a big gap in the logs?

If cursorLineID becomes smaller than startLineID + thresholdLines, purge occurs at the endLineID side and load earlier lines at startLineID side (as is already implemented). And inversely, if cursorLineID higher than endLineID - thresholdLines, purge occurs at the startLineID side and load later lines at endLineID side. All this occurs while satisfying maxLogLines.

If the user scrolls to the top, and maxLogLines is exceeded then wouldn't it be confusing after the user scrolls to the bottom and there is some missing logs at the bottom.

I think in my solution, I would never trim the most recent. Because as the user, (and I think you said this too) I would expect it the stream to be one continuous set of logs without ever having a gap.

What do you think?

I think your solution has two differences with mine:

  1. There is a threshold (which maybe can be figured out with scrollHeight too) This might be a good solution for your use case when just go above the fold but a little but don't expect to stop streaming.
  2. I never trim the bottom. That would be confusing if say I scroll to top, load more logs then come back to bottom and the original logs are gone.
amir20 commented 2 years ago

Also, perhaps we just solve for the most common use case. That would be most people autoscroll and maybe scroll back a bit like your threshold.

In that use case, we just trim at top.

If that works, then ship it and worry about the edge cases later. 💰

flarco commented 2 years ago

I don't think we are seeing differently, there should never be a gap.

When Dozzle stops streaming (when user is in middle), and say the user scrolls down again, Dozzle should pull the subsequent lines, as in, fill the gap (via REST call), and as user keeps scrolling down, keep pulling, at a certain threshold until the user reaches the "live" point (where now the stream is turned back on). Top is trimmed as needed if maxLogLines is reached. Hope that makes sense.

I didn't know about the difficulty of mapping each <li> to as particular log line, so totally understood that's a limiting factor.

But yes, lower hanging fruit / quickest value is just to trim the top. Sounds good. 👍

amir20 commented 2 years ago

Oh I see. It would resume from where it paused. I didn't get that. Let me think about it.

amir20 commented 2 years ago

@flarco I like your idea of threshold. Initially, I will implement a very simple logic. If not scrolled to top within 3 x window_height then drop the oldest logs.

I did some profiling and looks like the sweet spot is actually around 1100 items before the memory jumps above 400MB.

So maybe I'll just cut the logs around there.

amir20 commented 2 years ago

Well that was a fail. 😥 If the user is anywhere but the bottom, it will keep trimming the top until the current position becomes the top, at that point it will just exceed the size of the buffer. I don't know why I didn't think of that.

I think at the very minimum both sides need to be implemented.

When the buffer is FULL...

  1. If the user is towards the bottom, keep trimming the top.
  2. If the user is anywhere else, stop the streaming.

The hard logic is resuming. Because I can't resume until user has gone close to the bottom. At that point, it might have been 10000s of logs. I am not sure how that would perform. It would stream all logs passed the scroll and also trim the top. That might be funky.

🤔 Let me know @flarco if I am making it more complicated than it needs to be.

flarco commented 2 years ago

🤔 @amir20

I'm trying to understand why it keeps trimming the top if the user is not at the bottom. I mean, if the user is at the middle, the stream should stop. And it should only trim (either on top or bottom) if the buffer is nearing full. If the stream stops, you're not loading any new lines, so the buffer doesn't get exceeded, right? If fact, the buffer shouldn't change at all if user stays within bounds of loaded log lines in memory.

Ideally, if user is at middle and then later scrolls to top, it should load previous lines, and trim at bottom (if necessary, if the buffer is full). Same way, if user later scrolls to the bottom, it should load subsequent lines, and trim at top (if necessary, if buffer is full). Makes sense? Not sure what I'm missing. And if user keeps scrolling down, it should keep loading incrementally, until reaching the very bottom (the live point), at which point the stream should turn back on.

So I think it should perform just fine, even if it's been a while since the user scrolled, and then user scrolls down to bottom. It doesn't need to load all the new lines since, just a batch at a time, like 500 lines, and trim the top as needed. And maybe have a button (like that green one on the bottom right that is already there), if the user clicks that button, go to live mode (maybe just reset the view, trim all, and load from live stream again).

What am I missing? I may be making it too complicated, though, at least for now. Cause that would be a fully featured docker log scanning tool, very nifty. 👍

amir20 commented 2 years ago

I think the part you missed was comment https://github.com/amir20/dozzle/issues/1852#issuecomment-1221602705 which was to implement just the top trimming. But that obviously broke :(

Here is a couple things I have noticed with my prototype. Which when it is as a good place I can let you test it.

The trimming at top needs to only happen when auto scroll at the bottom is on. Basically no threshold. Why? Because if the browser isn't scrolling automatically, then top of the buffer will be removed and slowly the user will scroll up as the top is trimmed. I hope that makes sense, but it took me a while to understand what is going on...This wouldn't work as the user will jump around.

Scrolling down is also a problem. Imagine this:

  1. The buffer is full and the top is being trimmed
  2. I am at the bottom and auto scroll is happening
  3. I scroll up a little and pause the stream
  4. I wait 10 seconds which could be 100s of logs
  5. I scroll down and immediately the scroll will jump adding a bunch of new items.
  6. This feels unnatural as the user is jumping around in the scroll.

I feel like it should just resume and show a marker or something. That way the user (similar to live idea) can click if they want to load more between the gaps. 🤔

flarco commented 2 years ago

Ah Ok, yes I did miss that :).

Ok, so just trimming the top. All right, well the pausing/resuming needs to work right. Cause if you pause, auto-scoll should disable and user is looking at lines in buffer, there should not be a need to trim since it's paused. Cause yes, if it keeps trimming, and you're eating the beginning and the user will slowly scroll up until the beginning of the log.

Now later, when user scrolls down, and it shouldn't need to load all the lines at once until live. Just in small batches? I think that's where you'd need a new endpoint right? to load lines from tsStart to tsStart + 60s? Until the live point is reached, where the stream would come back.

The trimming at top needs to only happen when auto scroll at the bottom is on

Agreed

I scroll down and immediately the scroll will jump adding a bunch of new items. This feels unnatural as the user is jumping around in the scroll.

I'm not getting that part, why would it jump around? It would be as if new <li> lines are being appended to the element. Does that cause a jump effect?

amir20 commented 2 years ago

Let me see if I can connect the dots...

Coming back to the use case:

  1. Dozzle has been running for a while and the buffer is full
  2. User is at bottom edge and top is being trimmed
  3. User scrolls up a little and wait 10 or 20 seconds. (At 10 logs/sec that can be 200+ items)
  4. User scrolls a little lower and nothing happens because they are still not at the edge
  5. User scrolls all the way to the bottom and Dozzle notices it should load more data
  6. Now possibly three things can happen

Options 1

  1. Load X items and append to the buffer. Delete the X items from above
  2. The scroll jumps by X items and user sees data loaded
  3. If there are more logs to load, then step 1 can happen again until it's complete
  4. Meanwhile, the stream is still live so new logs can be queueing up
  5. User's scroll might have jumped two or three screens of logs

This solution is probably the most complicated because just imagine all the work I would have to do load the data incrementally while keeping tracks of the stream. :/ And even worst what if the user scrolls back up while this is happening.

Option 2

  1. Load all items and append it to the end. Trim the top by that much
  2. User's scroll will jump by a great deal and most likely confused as how they got there
  3. The stream can resume right after which might have additional logs

This is simpler but still complicated for the user as they just saw a bunch of logs fly through the screen. They have lost their original position.

Option 3

  1. While waiting Dozzle can know how many logs were skipped while paused. Let's say 145 logs were skipped.
  2. Dozzle resumes live from Date.now() and displays a message like "Skipped 145 logs. Click to show"
  3. As new messages come the top is trimmed again.
  4. Clicking on the button to load more will just load the extra logs while still scrolling at the edge

I think option 3 is the easiest and least confusing to the user. And note this is only true for really high volume logs.

Of course, none of this stuff might matter once I prototype anything. It might be just be confusing to do option 3. I am just trying to save myself work 🤔 by picking the least complex path

amir20 commented 2 years ago

Also, I haven't played with Paper Trail for a while. Maybe worth checking what they do.

flarco commented 2 years ago

Ah ok I see now. Thank you for explaining. I didn't know that you could not maintain the same view , or position in the huge log list while trimming and/or loading, so it would jump. Yes, less than ideal UX. Hmm, interesting.

But at the moment, when the user goes to the very top, dozzle loads the previous log lines and the view jumps a little while sort of maintaining the same position, doesn't? I did it today and it didn't stay on top after load. How do you do that now?

Overall, agreed that option #3 is that easier way to go ATM. And if it says that it skipped ### lines, you could have a button to reload the whole thing (similar to choosing a new container) so that the user can just scroll up to see those skipped lines.

amir20 commented 2 years ago

But at the moment, when the user goes to the very top, dozzle loads the previous log lines and the view jumps a little while sort of maintaining the same position, doesn't? I did it today and it didn't stay on top after load. How do you do that now?

That took a lot of math and calculation to get right. I had to save the position and restore. However, that's for scrolling up.

What we are talking about is different. Because now I have to trim the top and load data at the bottom while remembering the position. That's just complicated.

I'll keep playing with a prototype. I am only getting less than an hour on it each day.

flarco commented 2 years ago

👍 thanks man I'd help out but my Vue days are over. I'm happy in the React camp now 😄 . JSX ftw. I can help on the Go side though, let me know. Frontend with good UX can be so hard.

amir20 commented 2 years ago

Well ain't that funny. Papertrail does exactly option 3.

Screen Shot 2022-08-24 at 6 39 40 PM
amir20 commented 2 years ago

@flarco did you see the other message on #1860 ? Let's move the conversation there.

flarco commented 2 years ago

Yes, sorry I'm out of town till Monday. Will get to it then.

amir20 commented 2 years ago

Alright closing this since no one has complained about JSON. Looks like it is working for all.