FrancYescO opened this issue 1 week ago
I reviewed the differences between versions 7.0.4 and 7.0.5 at https://github.com/amir20/dozzle/compare/v7.0.4...v7.0.5. One thing stood out: in v7.0.5, Dozzle follows containers and only streams logs since the container started, which prevents old logs from appearing when containers are restarted. I doubt this is the source of the problem.
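For context on that change: since v7.0.5 the logs are requested only from the container's last start time, rather than from the beginning of its log history. A rough sketch of that mechanism using the Docker Go SDK follows (this is illustrative, not Dozzle's actual code; the exact options type name varies across SDK versions):

package main

import (
	"context"
	"io"
	"os"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}

	id := os.Args[1] // ID of the container to follow

	// Ask the daemon when this container last started, then stream logs
	// only from that point on, so a restarted container does not replay old output.
	info, err := cli.ContainerInspect(ctx, id)
	if err != nil {
		panic(err)
	}

	logs, err := cli.ContainerLogs(ctx, id, container.LogsOptions{
		ShowStdout: true,
		ShowStderr: true,
		Follow:     true,
		Since:      info.State.StartedAt, // RFC3339 timestamp of the last start
	})
	if err != nil {
		panic(err)
	}
	defer logs.Close()

	io.Copy(os.Stdout, logs) // raw multiplexed stream; a real viewer would demux stdout/stderr
}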
I need a way to reproduce this issue on my end. What does /stream return? I can't think of anything else.
1) It's not a vanilla nginx; it's the one embedded in the frontend of this stack: https://github.com/penpot/penpot/blob/develop/docker/images/docker-compose.yaml, run via Portainer and with restart: unless-stopped instead of always
2) No. I have ~30 services on this host; only this container has the issue
3) Sure.
4) After ~6s, just :ping pings
I am stumped on this one. It seems like your frontend container is based off nginxinc/nginx-unprivileged. docker run nginxinc/nginx-unprivileged seems to work with Dozzle.
Not sure what can be done to debug. Maybe completely destroy the container and recreate it.
I've recreated the container and the bug disappeared (but so did the older logs :D), so now I have no way to replicate it. We don't know the culprit, but maybe we can just close this issue...
I'm reusing this issue, but I'm not sure this is the same bug... Today, with 8.8.0, I was trying to see the logs of a specific container but figured out that Dozzle was searching for the logs of the wrong, already-destroyed container ID:
I had my container e9fbd40f2b29 running, then deployed a new image via Portainer, so container e9fbd40f2b29 got stopped and recreated with a new ID, 20c05b0abb96. Dozzle did not switch to the new container, and even after refreshing the page and clearing the cache it still showed the old ID e9fbd40f2b29 as running (but with no logs).
Each click on the container in the sidebar results in the Dozzle logs saying that the container ID was not found, but the web page never shows the "Container not found" message (it just loads for a few seconds and then shows no logs, which is not the same as putting a non-existent ID in the URL).
Recreating the container did not help this time; instead, restarting the Dozzle container fixed the issue... Is there some sort of cache somewhere?
Hey @FrancYescO,
I haven't seen this issue personally. Someone else created it a while back, but I can't find it right now. I'll need your help to debug. If I can't reproduce the issue, it will be difficult to fix.
Yes, Dozzle has an internal "store" that caches and stores all statistics for containers. It usually works well. I listen for specific Docker events to update the internal store.
It seems like there might be an issue with Portainer, as the other case I remember was similar.
Can you try to reproduce this state again? If you can consistently get into this state, it would be helpful to enable DEBUG logs for Dozzle and also keep docker system events open while this is happening.
Most likely, some unusual event is occurring, and Dozzle isn't refreshing its own store.
Thanks!
I'll keep an eye on it to find a way to easily reproduce it... What I can surely say is that this seems to be the first time I've seen the issue, and my deployment has always been done the same way.
Regarding "listen for specific Docker events": which events are you listening to for destroy and recreate? Maybe there is some edge-case event that is getting ignored.
Dozzle listens to start, destroy, die, rename, and a few other ones.
You can find it at https://github.com/amir20/dozzle/blob/7ccc3121904e11b3230e1cd2fae9d2b08f863d52/internal/docker/container_store.go#L180-L234
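As a rough sketch of that pattern (not Dozzle's actual code, which lives in the file linked above; the options type name varies across Docker SDK versions), an event-driven container store built on the Docker Go SDK looks something like this:

package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}

	// Subscribe to the daemon's event stream (the same data shown by docker system events).
	msgs, errs := cli.Events(ctx, types.EventsOptions{})

	// Toy in-memory store of running containers, keyed by ID.
	running := map[string]bool{}

	for {
		select {
		case msg := <-msgs:
			if msg.Type != "container" {
				continue
			}
			switch msg.Action {
			case "start":
				running[msg.Actor.ID] = true // a new or replacement container becomes visible
				fmt.Println("started:", msg.Actor.ID)
			case "die", "destroy":
				delete(running, msg.Actor.ID) // the old container goes away
				fmt.Println("removed:", msg.Actor.ID)
			case "rename":
				fmt.Println("renamed:", msg.Actor.ID)
			}
		case err := <-errs:
			// If this loop ever stops draining events (an error here, or a blocked
			// send further downstream), the store silently goes stale, which matches
			// the symptom described in this issue.
			fmt.Println("event stream error:", err)
			return
		}
	}
}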
Btw, it just happened again...
I'll keep trying to isolate the issue, but I'm pretty sure it's something with the Portainer webhook redeploy (or recreate?) methods.
I'm not 100% sure, but this should be an extract from when the bug happened... Maybe we miss kill? Or does the rename after kill-stop-die-destroy do some magic?
...
2024-11-20T10:41:57.305370996+01:00 image pull registry.org/image:latest
2024-11-20T10:41:57.418400610+01:00 container create 6e5d2f7db6f5a8b2d1b0ae6a42aed70dbb7f039fb51a79599e274e2d4b5f98b4
2024-11-20T10:41:57.436166302+01:00 container kill 21a85032fee12773a36b060f769dc7b5c7ad72af0342333abb7642d2d9eef6a9
2024-11-20T10:41:58.120718193+01:00 container stop 21a85032fee12773a36b060f769dc7b5c7ad72af0342333abb7642d2d9eef6a9
2024-11-20T10:41:58.125265685+01:00 container die 21a85032fee12773a36b060f769dc7b5c7ad72af0342333abb7642d2d9eef6a9
2024-11-20T10:41:58.136485321+01:00 container destroy 21a85032fee12773a36b060f769dc7b5c7ad72af0342333abb7642d2d9eef6a9
2024-11-20T10:41:58.145454536+01:00 container rename 6e5d2f7db6f5a8b2d1b0ae6a42aed70dbb7f039fb51a79599e274e2d4b5f98b4
2024-11-20T10:41:58.804224198+01:00 container start 6e5d2f7db6f5a8b2d1b0ae6a42aed70dbb7f039fb51a79599e274e2d4b5f98b4
...
Hmmm....
I don't see anything wrong here. I see two containers: the one starting with 21 and the other with 6e.
6e gets created, which is an event Dozzle ignores, and then started, which should be handled correctly. The other container, 21, gets killed (ignored), stopped (ignored), dies (which should be picked up), and is destroyed, which should remove it.
One thing you can do is run Dozzle with --level trace. You should see messages like "event=.... received container event", which output all the events Dozzle sees. Perhaps compare those with docker system events. There might be something else happening.
How are you getting into this state? I tried setting up Portainer a while ago, but it just seemed more complicated, and I couldn't figure out how to use it properly.
It happened two times during a stack update via Portainer webhook (you would need to register at least a free account for the Business Edition).
But this happens various times during the day and everything keeps working correctly, so I'm still not able to find an easily replicable path...
I wonder if this is a race issue, as it occurs sometimes but not always.
1) CI pushes an image > calls the Portainer stack webhook > Portainer pulls and redeploys the stack (with the updated, freshly pushed image)
2) The container is shown in the sidebar but does not load logs, as it points to a wrong container ID
3) The ID (the one shown also in the browser URL)
I see. So wrong ID, correct name. I didn't realize this was part of a stack. I can try it out.
@FrancYescO, is dozzle part of your stack, or is it always running separately, and the stack is updated independently?
I have various stacks, and Dozzle is part of a different one.
@FrancYescO, no luck. I set up Portainer Community Edition and created a new stack like this:
services:
  echo:
    image: amir20/echo:latest
    command: -r
    deploy:
      replicas: 3 # Specifies three replicas
      update_config:
        parallelism: 1 # Update one container at a time
        delay: 10s # Wait 10s between updating each container
      restart_policy:
        condition: on-failure
        max_attempts: 3
Everything was working well in Dozzle. I redeployed the stack by changing amir20/echo:latest to amir20/echo:v2. I checked the logs, and everything looked fine. In Dozzle, all the IDs matched those in docker ps, and everything functioned properly.
I wonder if the issue is related to the webhooks. Unfortunately, that's beyond the time I have available. I can't seem to reproduce it easily. If you find a reliable way to reproduce it without setting up the entire pipeline, that would be helpful.
In my case the image always has the same "latest" tag... but I don't think that matters... This happened two more times today (while the stack was probably redeployed 20+ more times with no Dozzle issue), but I still wasn't tracing the events properly. I'll update here if I find something more.
Ah, what I surely should say is that during the Portainer redeploy you will NEVER be redirected to the new container (but this can be a different issue, related to https://github.com/amir20/dozzle/issues/3036).
Here we are again, and now I'm sure. Here is a result of the Dozzle log filtered with grep -E "e4c57f33f6d8|c8cabeb803ed|Dozzle":
e4c57f33f6d8 > the container that remained orphaned after the stack update
c8cabeb803ed > the container that replaced e4c57f33f6d8
Dozzle > just to show when I restarted Dozzle
"Error response from daemon: No such container: e4c57f33f6d8" is me clicking on the container in the sidebar and getting no logs...
It looks like Dozzle never received events from c8cabeb803ed; the only thing I see before the Dozzle restart is "starting to stream stats"...
2024-11-22T09:01:50.058746494Z {"level":"info","time":"2024-11-22T09:01:50Z","message":"Dozzle version v8.8.1"}
2024-11-22T11:04:52.017508387Z {"level":"info","time":"2024-11-22T11:04:52Z","message":"Dozzle version v8.8.1"}
2024-11-22T14:03:57.154746555Z {"level":"trace","event":"create","id":"e4c57f33f6d8","time":"2024-11-22T14:03:57Z","message":"received container event"}
2024-11-22T14:03:57.899557141Z {"level":"trace","event":"rename","id":"e4c57f33f6d8","time":"2024-11-22T14:03:57Z","message":"received container event"}
2024-11-22T14:03:57.899577988Z {"level":"debug","action":"rename","id":"e4c57f33f6d8","time":"2024-11-22T14:03:57Z","message":"container event"}
2024-11-22T14:03:57.899582791Z {"level":"debug","action":"rename","id":"e4c57f33f6d8","time":"2024-11-22T14:03:57Z","message":"container event"}
2024-11-22T14:03:58.380355811Z {"level":"debug","container":"e4c57f33f6d8","host":"sasasa","time":"2024-11-22T14:03:58Z","message":"starting to stream stats"}
2024-11-22T14:03:58.380398024Z {"level":"trace","event":"start","id":"e4c57f33f6d8","time":"2024-11-22T14:03:58Z","message":"received container event"}
2024-11-22T14:03:58.380429289Z {"level":"debug","id":"e4c57f33f6d8","time":"2024-11-22T14:03:58Z","message":"Finding container"}
2024-11-22T14:03:58.402801270Z {"level":"debug","id":"e4c57f33f6d8","time":"2024-11-22T14:03:58Z","message":"container started"}
2024-11-22T14:03:58.402823770Z {"level":"debug","action":"start","id":"e4c57f33f6d8","time":"2024-11-22T14:03:58Z","message":"container event"}
2024-11-22T14:03:58.402845560Z {"level":"debug","action":"start","id":"e4c57f33f6d8","time":"2024-11-22T14:03:58Z","message":"container event"}
2024-11-22T14:18:38.038759596Z {"level":"debug","id":"e4c57f33f6d8","since":"2024-11-22T14:03:58Z","stdType":"all","host":"sasasa","time":"2024-11-22T14:18:38Z","message":"Streaming logs for container"}
2024-11-22T14:28:53.477980308Z {"level":"debug","id":"e4c57f33f6d8","since":"2024-11-22T14:03:58Z","stdType":"all","host":"sasasa","time":"2024-11-22T14:28:53Z","message":"Streaming logs for container"}
2024-11-22T14:34:27.811490031Z {"level":"debug","id":"e4c57f33f6d8","since":"2024-11-22T14:03:58Z","stdType":"all","host":"sasasa","time":"2024-11-22T14:34:27Z","message":"Streaming logs for container"}
2024-11-22T14:35:18.459489416Z {"level":"debug","container":"e4c57f33f6d8","time":"2024-11-22T14:35:18Z","message":"streaming ended"}
2024-11-22T14:35:18.459538628Z {"level":"debug","event":"container-stopped","container":"e4c57f33f6d8","time":"2024-11-22T14:35:18Z","message":"received event"}
2024-11-22T14:35:19.570267074Z {"level":"debug","container":"c8cabeb803ed","host":"sasasa","time":"2024-11-22T14:35:19Z","message":"starting to stream stats"}
2024-11-22T14:38:30.085802175Z {"level":"debug","id":"e4c57f33f6d8","since":"2024-11-22T14:03:58Z","stdType":"all","host":"sasasa","time":"2024-11-22T14:38:30Z","message":"Streaming logs for container"}
2024-11-22T14:38:30.086887681Z {"level":"error","error":"Error response from daemon: No such container: e4c57f33f6d8","container":"e4c57f33f6d8","time":"2024-11-22T14:38:30Z","message":"unknown error while streaming logs"}
2024-11-22T14:39:16.178559669Z {"level":"debug","id":"e4c57f33f6d8","since":"2024-11-22T14:03:58Z","stdType":"all","host":"sasasa","time":"2024-11-22T14:39:16Z","message":"Streaming logs for container"}
2024-11-22T14:39:16.180725431Z {"level":"error","error":"Error response from daemon: No such container: e4c57f33f6d8","container":"e4c57f33f6d8","time":"2024-11-22T14:39:16Z","message":"unknown error while streaming logs"}
2024-11-22T14:39:26.350330671Z {"level":"debug","id":"e4c57f33f6d8","since":"2024-11-22T14:03:58Z","stdType":"all","host":"sasasa","time":"2024-11-22T14:39:26Z","message":"Streaming logs for container"}
2024-11-22T14:39:26.351109664Z {"level":"error","error":"Error response from daemon: No such container: e4c57f33f6d8","container":"e4c57f33f6d8","time":"2024-11-22T14:39:26Z","message":"unknown error while streaming logs"}
2024-11-22T14:42:49.928094353Z {"level":"debug","id":"e4c57f33f6d8","since":"2024-11-22T14:03:58Z","stdType":"all","host":"sasasa","time":"2024-11-22T14:42:49Z","message":"Streaming logs for container"}
2024-11-22T14:42:49.929305570Z {"level":"error","error":"Error response from daemon: No such container: e4c57f33f6d8","container":"e4c57f33f6d8","time":"2024-11-22T14:42:49Z","message":"unknown error while streaming logs"}
2024-11-22T14:43:40.137630195Z {"level":"info","time":"2024-11-22T14:43:40Z","message":"Dozzle version v8.8.1"}
2024-11-22T14:43:40.209060944Z {"level":"debug","id":"c8cabeb803ed","time":"2024-11-22T14:43:40Z","message":"Finding container"}
2024-11-22T14:43:43.596250130Z {"level":"debug","container":"c8cabeb803ed","host":"sasasa","time":"2024-11-22T14:43:43Z","message":"starting to stream stats"}
2024-11-22T14:43:55.880136993Z {"level":"debug","id":"c8cabeb803ed","since":"2024-11-22T14:35:19Z","stdType":"all","host":"sasasa","time":"2024-11-22T14:43:55Z","message":"Streaming logs for container"}
Can you attach the whole logs? I am looking for other errors such as "docker store unexpectedly disconnected from docker events". It doesn't make sense that there are a bunch of "received container event" entries and then suddenly they stop.
OK it's not what I thought. But I have a wild guess. Give me a little time. Let me check something and provide a PR.
https://github.com/amir20/dozzle/pull/3417. I am just taking a guess here. I noticed that after e4c57f33f6d8 started, no other events were received, so I thought maybe it's blocking for some reason. I don't know why it would be, but this would be a fix. Can you try pr-3417?
OK, I'll leave this PR running; I'll update if it happens again...
If it works, it will drive me crazy figuring out why, because currently it shouldn't block.
By the way, how many people are using Dozzle in a browser? Is it just you or is it a few?
A few, but often non-concurrent.
Hmm, okay. That could explain what I am seeing. I'll add more debugging logs later. If my intuition is correct, a deadlock occurs after a new container is found because multiple sessions are trying to write to a channel.
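To illustrate the kind of blocking being suspected (a toy sketch, not Dozzle's code; every name here is hypothetical): with unbuffered channels a send blocks until someone receives, so if the goroutine that is supposed to drain a channel stops reading, every session that tries to write to it hangs, and no further container events get processed.

package main

import (
	"fmt"
	"time"
)

func main() {
	// Hypothetical setup: sessions notify a central store about newly found containers.
	newContainers := make(chan string) // unbuffered: a send blocks until it is received

	// The store goroutine drains the channel... until it gets stuck elsewhere.
	go func() {
		first := <-newContainers
		fmt.Println("store picked up:", first)
		select {} // simulate the store blocking on something else and never reading again
	}()

	// Two sessions discover the replacement container and try to report it.
	for i := 1; i <= 2; i++ {
		go func(session int) {
			newContainers <- "c8cabeb803ed" // hangs forever once the store stops reading
			fmt.Println("session", session, "delivered the new container")
		}(i)
	}

	time.Sleep(time.Second)
	// At most one "delivered" line is printed; the other session is stuck on the
	// channel send, which is exactly what a goroutine dump would reveal.
	fmt.Println("done waiting; any session still blocked is deadlocked on the send")
}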
I am guessing that didn't solve your problem. I didn't realize you had originally grepped the logs, so only lines with e4c57f33f6d8 were being displayed. Looking at the full log, it looks to be fine. It's crazy that I don't see c8cabeb803ed at all in the logs except for stats. I am pretty stumped right now.
Alright, I think we need to do some deeper investigation. I have enabled a stack trace dump in https://github.com/amir20/dozzle/pull/3418. You can deploy with pr-3418.
Next time this happens, go to https://<path-to-dozzle>/debug. You should see a page with a link to goroutines. Send me that page. I want to understand why the container store is not receiving any events. This will tell me if that thread is stuck somewhere. For a good state, we should see something like /dozzle/internal/docker/container_store.go:176 and container_store.go:56.
Let me know!
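For reference, this is the kind of goroutine dump Go's standard tooling provides. A minimal sketch of how such an endpoint is typically exposed (not necessarily what pr-3418 does):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // importing for side effects registers handlers under /debug/pprof/
)

func main() {
	// /debug/pprof/ serves an index page with a "goroutine" link;
	// /debug/pprof/goroutine?debug=2 dumps the full stack trace of every goroutine,
	// showing exactly where each one is blocked (for example a channel send, or the
	// event loop in container_store.go when things are healthy).
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}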
I am going to merge the debug API to master. I felt adding an extra 500KB wasn't worth it, but whatever. If it had been merged before, we could have troubleshooted this issue sooner. So you can try master if you want.
I've deleted my comments; they were nothing related to this issue... Still waiting for this to happen again.
So has this not happened yet?
Now that I'm waiting for it, no 😅
Unfortunately, the stack updates are now fewer per day, so there are fewer chances to retrigger it... I was just wondering if this can only happen when there are concurrent users using Dozzle on the same container.
It would increase the chances, since Dozzle has a complicated backend with all kinds of multithreading and semaphores. Once it happens, the stack trace will help me see if any of the routines are stuck anywhere. The worst thing that could happen is that you say it didn't work but the stack trace looks right, because then I wouldn't know how to go about investigating.
Let's see how next few days go.
How is Dozzle deployed?
Standalone Deployment
📦 Dozzle version
8.7.5
✅ Command used to run Dozzle
As documented.
🐛 Describe the bug / provide steps to reproduce it
I have a specific container that does not load logs. It's the same infinite loading as when you set a log filter that has no results, or when a container has 0 B of logs (can we also fix these two cases and show the loading bar instead of the spinner when the /stream endpoint connection is fulfilled? :D)... but in this case the logs are populated; the /stream API just returns empty.
Going backward through versions, I noticed the bug pops up in >=v7.0.5, while v7.0.4 correctly loads the logs of this container.
For reference, this is what I posted here: https://github.com/amir20/dozzle/issues/3222#issuecomment-2471564111
All other containers on the same host show no issue, so I assume it's something in the content of the logs that breaks things (it's a pretty standard nginx log output)... This is how it ends, actually:
💻 Environment
-
📸 If applicable, add screenshots to help explain your bug
No response
📜 If applicable, attach your Dozzle logs. You may need to enable debug mode. See https://dozzle.dev/guide/debugging.
No response