monaka closed this issue 8 years ago
I've never seen that before and I have trouble imagining how it happened. Are there more log entries that follow that?
In the best case, this is fatal to Nginx, liveness probes begin failing, and k8s restarts the router pod. But I don't think a failed reload stops Nginx from serving requests using its previous configuration. (I'd have to research a bit to verify that.)
In the worst case (believe it or not), Nginx continues serving requests...
See following lines:
https://github.com/deis/router/blob/b33c18c768a6adebc0a41a5444852391d2bf4407/router.go#L53-L54
Because we're not capturing whatever errors are returned from nginx.Reload(), the (new) computed config model becomes the "known" config model regardless of whether the reload worked or not. This would mean the router program and Nginx would no longer agree on the current state. On subsequent builds of the model (at least until something else changes), the router would believe the desired state is the current state, even though Nginx is still running the old configuration. This is clearly a bug.
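To make the failure mode concrete, here is a minimal sketch of the kind of fix described above: the "known" model is only replaced when the reload reports success. This is not the router's actual code. `RouterConfig`, `reloadFunc`, `syncOnce`, and `errSimulated` are hypothetical names invented for illustration, and the reload is passed in as a function so the example runs without Nginx.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// RouterConfig is a hypothetical stand-in for the router's computed config model.
type RouterConfig struct {
	WorkerProcesses string
	Domains         []string
}

// reloadFunc abstracts the Nginx reload (assumption: the real reload returns an error).
type reloadFunc func() error

// errSimulated is a made-up error used to simulate a failed reload.
var errSimulated = errors.New("simulated reload failure")

// syncOnce returns the model that should be treated as "known" after one pass.
// The known model only advances to the desired model if the reload succeeds.
func syncOnce(known, desired *RouterConfig, reload reloadFunc) (*RouterConfig, error) {
	if reflect.DeepEqual(known, desired) {
		return known, nil // nothing changed, so no reload is needed
	}
	// (Writing the new config file would happen here.)
	if err := reload(); err != nil {
		// Keep the old known model so the next pass still sees a difference and retries.
		return known, fmt.Errorf("nginx reload failed: %w", err)
	}
	return desired, nil
}

func main() {
	known := &RouterConfig{WorkerProcesses: "auto"}
	desired := &RouterConfig{WorkerProcesses: "auto", Domains: []string{"example.com"}}

	failing := func() error { return errSimulated }
	succeeding := func() error { return nil }

	// First pass: reload fails, so the known model must not change.
	known, err := syncOnce(known, desired, failing)
	fmt.Println(err != nil, reflect.DeepEqual(known, desired)) // prints "true false"

	// Second pass: the difference is still visible, so the reload is retried.
	known, err = syncOnce(known, desired, succeeding)
	fmt.Println(err == nil, reflect.DeepEqual(known, desired)) // prints "true true"
}
```

With this shape, a failed reload leaves the router and Nginx in agreement about what is actually running, and the next build of the model naturally triggers another attempt.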
Note that the bug doesn't explain your problem, nor will fixing it fix your problem, but it will at least prevent the worst-case scenario. I will open an issue for this.
It will be reproduced on restart, when Router detects the K8s event before Nginx has booted up. I suspect this is a rare case. It differs from #272, but it is a similar issue around reload failure.
It will be reproduced on restart, when Router detects the K8s event before Nginx has booted up.
I'm not sure I understand here. The sequence of events is always the same at startup.
There is no scenario where these things happen in a different sequence.
I'm not sure either, but it may not reproduce every time (that's what I mean by "rare case"). This is just a guess, but I think the sequence is like this.
@monaka your theory about what causes this relies on some false assumptions about how router works. Router does not (currently) watch the k8s event stream (although #274 proposes we start doing that).
Are you able to articulate precise steps for reproducing this issue? So far, I have not found this to be reproducible, which makes it really hard to troubleshoot.
I think it's too hard to reproduce, as this must be a timing issue. It may also already be fixed by #279. (The issue may still occur, but Router will retry in order to recover.)
I'll close this issue for now and reopen it if my assumption about #279 turns out to be false.
I found an error like this on quay.io/deis/router:v2.4.0. Is this harmless?