amir20 / dozzle

Realtime log viewer for docker containers.
https://dozzle.dev/
MIT License
5.7k stars, 287 forks

Large amount of logs blocks the UI #3225

Closed amir20 closed 2 weeks ago

amir20 commented 3 weeks ago
Hello, I just want to avoid opening a new issue, but this issue is much worse when the log burst comes from a live container that you are following in real time. To give a real case: I'm using sftpgo with logs at the debug level. When a client connects to the SFTP server and downloads folders, a lot of logs are generated and the Dozzle UI freezes completely... I was going crazy because Dozzle kept freezing "randomly", then I realized that this was happening only on that container, which has somewhat complex JSON logs.

I'm pretty sure the culprit is in the parseMessage func, but idk how it can be optimized to avoid locking the main presentation thread, which freezes the whole browser tab while parsing.

Originally posted by @FrancYescO in https://github.com/amir20/dozzle/issues/3122#issuecomment-2305159542

amir20 commented 3 weeks ago

@FrancYescO I created a new issue.

I first tried to reproduce this. I was able to reproduce it with my own custom script. I produced logs every 100ms and then every 2 seconds, I produced 10000 logs. I see the UI freezing a lot.
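The custom script itself isn't shown in this thread. As a rough sketch of the load pattern described above (one line every 100ms, plus a 10000-line burst every 2 seconds), something like the following could generate the schedule. The names `burstSchedule`, `Tick`, and `formatLine` are made up for illustration and the log line text is an assumption.

```typescript
// Hypothetical reproduction schedule, not the script used in this issue.
// One line every stepMs, and a burst of burstSize lines every burstEveryMs.
interface Tick {
  atMs: number;  // offset from the start, in milliseconds
  lines: number; // how many log lines to emit at that offset
}

function burstSchedule(
  durationMs: number,
  stepMs: number = 100,
  burstEveryMs: number = 2000,
  burstSize: number = 10000,
): Tick[] {
  const ticks: Tick[] = [];
  for (let t = 0; t < durationMs; t += stepMs) {
    // The very first tick is a normal line; bursts start at burstEveryMs.
    const isBurst = t > 0 && t % burstEveryMs === 0;
    ticks.push({ atMs: t, lines: isBurst ? burstSize : 1 });
  }
  return ticks;
}

// Formats one line with an ISO timestamp prefix (the message text is invented).
function formatLine(startIso: string, atMs: number, seq: number): string {
  const ts = new Date(Date.parse(startIso) + atMs).toISOString();
  return `${ts} test log line ${seq}`;
}
```

A driver would walk the schedule, sleep until each `atMs`, and print `lines` lines to stdout so Docker picks them up as container logs.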

I am not sure what can be improved. I see some people talking about moving the parsing to a web worker. I am going to see if that's even feasible. I am not aware of any way to see what is freezing the UI.

But being able to reproduce it is half the problem. :)

FrancYescO commented 3 weeks ago

You can still take Portainer as inspiration. I haven't looked at exactly how they present the logs, but even with a lot of logs to present I have never seen the UI freeze.

amir20 commented 3 weeks ago

I'll check out Portainer. But I have a feeling Portainer uses a textbox, which causes fewer DOM updates.

I did try something interesting. I replaced parseMessage with just a dummy implementation that always returns the same log. So I removed all JSON parsing. In my test, I did still see the browser freeze. Which means the problem isn't in the JSON parsing but in actually adding those HTML elements to the DOM.

I am not sure if there is a way around it. I imagine adding 4000+ items to the DOM is just intensive.

Even using a virtual scroller wouldn't work because while the user is tailing, it would all need to be refreshed. I think it doesn't matter how much I optimize, the load is just so high that appending this many logs to the DOM is CPU intensive.

amir20 commented 3 weeks ago

I checked out Portainer. It doesn't use a textbox. But it only shows the last 100 lines, which isn't very helpful.

Maybe having a virtual scroll would help. I have tried adding virtual scroll before but it didn't go really well.

amir20 commented 3 weeks ago

I quickly prototyped a virtual scroller. It's definitely better. But there are so many other bugs with spacing and height that I think it's beyond the time I can spend on it. I'll keep this open.

To summarize, the issue isn't with the parsing of JSON, but rather that the DOM is getting so large in bursts that it is causing a lot of blocking.

FrancYescO commented 3 weeks ago

You can change the number of rows to load in Portainer at the top. It is slow to load, but it does not freeze the UI even with 100,000.

Are you adding to the DOM line by line? Maybe some sort of batch process before adding could help (something like delaying the add to the DOM for 100ms while waiting for another message to come).

amir20 commented 3 weeks ago

You can change the number of rows to load in Portainer at the top. It is slow to load, but it does not freeze the UI even with 100,000.

That does freeze the UI for me. Scrolling somehow works, but none of the buttons are interactable for me. Clicking the timestamp also freezes for a few seconds.

Are you adding to the DOM line by line? Maybe some sort of batch process before adding could help (something like delaying the add to the DOM for 100ms while waiting for another message to come).

That's what it does. It buffers up to 1 second and then adds to the DOM as a batch. The adding part is done via Vue.js though. I believe it does it all in one go and it is fairly performant. The reason why virtual scrolling probably helps is that if the user is only looking at the last 20 rows, then there is no reason to append the other (x - 20) rows to the DOM.

I just don't think making the scroll virtual is worth it, as so many other things broke when I did it. And it's not like it's a list of equally sized components.
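To make the "(x - 20) rows" point concrete, here is a minimal sketch of picking which rows to render while tailing when rows have different heights. It is not Dozzle's code or the prototype mentioned above. `tailWindow` is a hypothetical helper, and it assumes each row's pixel height is already known, which is exactly the hard part with rows that are not equally sized.

```typescript
// Hypothetical helper: given per-row heights (in px) and the viewport height,
// return the index range [start, end) of the newest rows that fill the viewport.
// Everything before `start` would not need to be in the DOM while tailing.
function tailWindow(
  rowHeights: number[],
  viewportPx: number,
): { start: number; end: number } {
  let used = 0;
  let start = rowHeights.length;
  // Walk backwards from the newest row until the viewport is covered.
  while (start > 0 && used < viewportPx) {
    start -= 1;
    used += rowHeights[start] ?? 0;
  }
  return { start, end: rowHeights.length };
}
```

With 1000 rows of 20px each and a 400px viewport this keeps only the last 20 rows. A real virtual scroller would also need spacer elements and measured heights, which is likely where the spacing and height bugs come from.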

I wish someone was an expert in DOM or Vue and could help. 🤷🏼‍♂️

amir20 commented 3 weeks ago

@FrancYescO When you get a chance can you try https://github.com/amir20/dozzle/pull/3227? I just did random optimizations. Not sure if it would make a difference but I tried trimming the buffer sooner. You can try it with amir20/dozzle:pr-3227. Based on my tests I saw browser freezing less.

I think there might be a sweet spot where it's good enough.

FrancYescO commented 3 weeks ago

It is surely looking better. To be clear, I'm still having freezes of about 15s when loading previous messages and 5s in live mode, but this surely depends on the volume of messages coming in (or maybe on the amount that is visualized/loaded in the UI?). At least it seems I never ended up with a totally unresponsive tab that had to be closed, like before...

PS: a small indication in the UI that we are in "live mode" would be useful. Pretty often I scroll up a little, and after a while of seeing no messages coming I go back down and get the "Skipped xxx entries" notice, which causes me to lose those logs, as the only way to retrieve them is a refresh (and makes me realize that I was ignoring new messages).

PS2: have you tried removing the batch processing? Maybe a lot of smaller updates is better than a single one with a lot of content.

amir20 commented 3 weeks ago

The problem is that I can't really reproduce the 15s and 5s pauses. I have seen a few seconds. So anything I try is just a best guess. I am using Chrome on a MacBook Air with an M2.

For now, let's focus on the live mode since I think that's the most common use case for folks.

I have a test similar to your logs. I set up a test.log.

❯ cat test.log | cut -d. -f 1 | uniq -c
4620 2024-08-25T00:06:35
  10 2024-08-25T00:06:36
5010 2024-08-25T00:06:37
  10 2024-08-25T00:06:38
5010 2024-08-25T00:06:39
   9 2024-08-25T00:06:40
5010 2024-08-25T00:06:41
  10 2024-08-25T00:06:42
5010 2024-08-25T00:06:43
  10 2024-08-25T00:06:44
5010 2024-08-25T00:06:45
   9 2024-08-25T00:06:46
5010 2024-08-25T00:06:47
  10 2024-08-25T00:06:48
3505 2024-08-25T00:06:49
1515 2024-08-25T00:06:50
 393 2024-08-25T00:06:51
4626 2024-08-25T00:06:52
  10 2024-08-25T00:06:53
5010 2024-08-25T00:06:54
  10 2024-08-25T00:06:55
5010 2024-08-25T00:06:56
   9 2024-08-25T00:06:57
5010 2024-08-25T00:06:58
  10 2024-08-25T00:06:59
5010 2024-08-25T00:07:00
  10 2024-08-25T00:07:01
5009 2024-08-25T00:07:02
  10 2024-08-25T00:07:03
5010 2024-08-25T00:07:04
  10 2024-08-25T00:07:05
5010 2024-08-25T00:07:06
  10 2024-08-25T00:07:07
5010 2024-08-25T00:07:08
   9 2024-08-25T00:07:09
5010 2024-08-25T00:07:10
  10 2024-08-25T00:07:11
5010 2024-08-25T00:07:12
  10 2024-08-25T00:07:13
5006 2024-08-25T00:07:14
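For anyone reading the output above: the pipeline cuts each line at the first "." (dropping the fractional seconds) and then counts adjacent identical prefixes, so each row is "lines logged in that second". A rough TypeScript equivalent is sketched below. `countPerSecond` is a made-up name, and it assumes every line starts with a timestamp that has a fractional part.

```typescript
// Rough equivalent of `cut -d. -f 1 | uniq -c`.
// Like uniq, it only merges ADJACENT lines that share the same prefix.
function countPerSecond(lines: string[]): Array<[string, number]> {
  const counts: Array<[string, number]> = [];
  for (const line of lines) {
    // Keep everything before the first "."; keep the whole line if there is none.
    const dot = line.indexOf(".");
    const key = dot === -1 ? line : line.slice(0, dot);
    const last = counts[counts.length - 1];
    if (last !== undefined && last[0] === key) {
      last[1] += 1;
    } else {
      counts.push([key, 1]);
    }
  }
  return counts;
}
```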

I do get some unresponsiveness.

a small indication in the UI that we are in "live mode" would be useful

Agreed but let's come back to this later.

PS2: have you tried removing the batch processing? Maybe a lot of smaller updates is better than a single one with a lot of content.

I think you are right. There may be some improvement. The batching right now is done with a timer of up to 1 second. So theoretically it can have 10K items, which isn't great. I should probably also flush the batch based on the total number of logs.

Here is what I am going to try:

  1. Make the batch 500ms instead of 1 second
  2. Batch size should have a max size of 200
  3. I won't try virtual scroll anymore because I don't think it will matter if I control the size myself.
  4. And finally, I am going to try to do micro optimization by removing Vue where possible.
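Items 1 and 2 above could be sketched roughly like this: a batcher that flushes when either 500ms have passed since the first pending item or 200 items are waiting. This is only an illustration of the idea, not Dozzle's implementation. `LogBatcher` and its parameters are hypothetical, and the clock is passed in explicitly so the behavior is easy to check.

```typescript
// Hypothetical batcher: flushes on whichever limit is hit first,
// elapsed time since the first pending item or number of pending items.
class LogBatcher<T> {
  private pending: T[] = [];
  private windowStartMs = 0;
  private readonly maxWaitMs: number;
  private readonly maxSize: number;
  private readonly onFlush: (batch: T[]) => void;

  constructor(maxWaitMs: number, maxSize: number, onFlush: (batch: T[]) => void) {
    this.maxWaitMs = maxWaitMs;
    this.maxSize = maxSize;
    this.onFlush = onFlush;
  }

  // nowMs is supplied by the caller (for example Date.now()).
  push(item: T, nowMs: number): void {
    if (this.pending.length === 0) this.windowStartMs = nowMs;
    this.pending.push(item);
    const full = this.pending.length >= this.maxSize;
    const expired = nowMs - this.windowStartMs >= this.maxWaitMs;
    if (full || expired) this.flush();
  }

  // Hands the pending items to onFlush and starts a new window.
  flush(): void {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.onFlush(batch);
  }
}
```

In a browser a timer would also need to call `flush()` so that a stream that goes quiet does not hold on to its last few lines. Note that with a 5000-line burst, a 200-item cap means 25 flushes for that burst alone.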

I'll let you know when I have something to test.

Let me know if there is a better test for me to try to reproduce your scenario.

amir20 commented 3 weeks ago

So it turns out moving batch size to a smaller number actually makes performance worse. Because then it needs to flush 5x per second which isn't great.

I made more improvements but most notably, I just replace the buffer with the latest messages now which seems to improve a lot.
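The actual change isn't shown in this thread. One possible reading of "replace the buffer with the latest messages" is sketched below: when an incoming batch is at least as large as the row limit, drop what is currently shown and keep only the newest rows, and report how many were never displayed. `appendLatest` and `maxRows` are hypothetical names, and the `skipped` count is only an assumption about how a "Skipped xxx entries" notice might be fed.

```typescript
// Hypothetical sketch: keep at most maxRows rows, preferring the newest ones.
function appendLatest<T>(
  current: T[],
  incoming: T[],
  maxRows: number,
): { rows: T[]; skipped: number } {
  if (incoming.length >= maxRows) {
    // The burst alone fills the view: replace everything with its newest rows.
    const skipped = incoming.length - maxRows;
    return { rows: incoming.slice(skipped), skipped };
  }
  // Normal case: append, then trim the oldest rows that were already shown.
  const combined = current.concat(incoming);
  const overflow = Math.max(0, combined.length - maxRows);
  return { rows: combined.slice(overflow), skipped: 0 };
}
```

The point of the first branch is that a large burst never touches more than `maxRows` DOM rows, no matter how many lines arrived.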

Try the latest. Also look at cat test.log | cut -d. -f 1 | uniq -c above. I think that's pretty close to your setup. I guess the only difference is that I don't have JSON, just simple logs.

amir20 commented 3 weeks ago

Based on my testing, this looks really good so far. No freezes at all for me.

FrancYescO commented 3 weeks ago

Nice, going to do some tests in ~14h

FrancYescO commented 2 weeks ago

I can confirm that live mode is now a lot better. When the burst arrives I get a <1s freeze, but that is totally acceptable.