ReinerNippes / nextcloud_on_docker

Run Nextcloud in Docker Container on various Linux Hosts
MIT License

Nginx forwards OnlyOffice requests to wrong IP #89

Open RST-J opened 3 years ago

RST-J commented 3 years ago

Today I wanted to edit a file in Nextcloud with OnlyOffice and got a blank page and a hint saying OnlyOffice is currently not reachable (and I should contact myself [the admin]).

Issue: The final result of my research was that nginx forwarded/routed the request to the wrong IP.

Mitigation: Restart the nginx container.

Question: Is there a way to ensure that the nginx container gets restarted after every update, or some other way to permanently fix this issue/glitch?

Explanation/Research

I looked into the browser console for further insight when I got that blank page and found an error message:

The resource from “https://example.com/ds-vpath/web-apps/apps/api/documents/api.js” was blocked due to MIME type (“text/html”) mismatch (X-Content-Type-Options: nosniff)

I then tried to access that path directly with my browser and got a 502 Bad Gateway.

Inspecting the logs of the nginx container revealed that it tries to forward the request but cannot establish a connection:

2021/02/03 20:00:22 [error] 27#27: *23170 connect() failed (111: Connection refused) while connecting to upstream, client: a.b.c.d, server: , request: "GET /ds-vpath/web-apps/apps/api/documents/api.js HTTP/1.1", upstream: "http://192.168.160.6:80/web-apps/apps/api/documents/api.js", host: "example.com"

(Note the IP being 192.168.160.6.)
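To compare the address nginx actually used against what the container name resolves to, the upstream field can be pulled out of such a log line. The following is only a small sketch: the function name `upstream_address` and the regular expression are my own illustration and are not part of the playbook or of nginx.

```python
import re

# Matches the upstream field of an nginx error-log line, e.g.
#   upstream: "http://192.168.160.6:80/web-apps/..."
# Assumption: the upstream is logged as an IPv4 address, as in the line above.
UPSTREAM_RE = re.compile(
    r'upstream: "https?://(?P<ip>\d{1,3}(?:\.\d{1,3}){3})(?::(?P<port>\d+))?'
)


def upstream_address(log_line):
    """Return (ip, port) of the upstream nginx tried to reach, or None.

    port is None when the log line does not contain an explicit port.
    """
    match = UPSTREAM_RE.search(log_line)
    if match is None:
        return None
    port = match.group("port")
    return match.group("ip"), int(port) if port else None


line = (
    '2021/02/03 20:00:22 [error] 27#27: *23170 connect() failed '
    '(111: Connection refused) while connecting to upstream, client: a.b.c.d, '
    'server: , request: "GET /ds-vpath/web-apps/apps/api/documents/api.js HTTP/1.1", '
    'upstream: "http://192.168.160.6:80/web-apps/apps/api/documents/api.js", '
    'host: "example.com"'
)

print(upstream_address(line))
```

If the returned IP differs from what a lookup of the container name gives inside the nginx container, nginx is talking to a stale address.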

I opened a shell in the nginx container and tried curl onlyoffice_documentserver because of the proxy_pass http://onlyoffice_documentserver; line in the nginx config. This gave me a valid response, so I ran ping onlyoffice_documentserver, which reported replies from 192.168.160.7. Since the curl and ping output show that name resolution itself works, my conclusion was that nginx probably caches host name resolutions and never refreshes them. I therefore expected that restarting the nginx container would fix the problem, at least for the moment.
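The suspected behaviour can be illustrated with a toy model. This sketch does not use nginx or real DNS: the dictionary `dns` stands in for Docker's name resolution, and both proxy classes are hypothetical names made up for the illustration. It assumes the container had 192.168.160.6 before it was recreated, which is what the log line suggests.

```python
class ResolveOnceProxy:
    """Resolves the upstream name a single time at start-up and keeps the result."""

    def __init__(self, name, dns):
        self.address = dns[name]  # looked up once, never refreshed

    def target(self, dns):
        return self.address  # later changes in dns are ignored


class ResolvePerRequestProxy:
    """Resolves the upstream name again on every request."""

    def __init__(self, name):
        self.name = name

    def target(self, dns):
        return dns[self.name]


# Stand-in for name resolution on the Docker network (assumed starting state).
dns = {"onlyoffice_documentserver": "192.168.160.6"}

cached = ResolveOnceProxy("onlyoffice_documentserver", dns)
fresh = ResolvePerRequestProxy("onlyoffice_documentserver")

# The container is recreated and comes back with a new address.
dns["onlyoffice_documentserver"] = "192.168.160.7"

print(cached.target(dns))  # still the old address
print(fresh.target(dns))   # the new address
```

The first proxy keeps sending requests to the old address until it is restarted, which matches the symptom: curl and ping resolve the name freshly and succeed, while the long-running proxy does not.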

Inspecting the Docker containers, I found that onlyoffice_documentserver and nextcloud-db had been recreated and (re)started 4 days ago, while nginx had been up for 10 days. Inspecting the container IPs revealed that 192.168.160.6 is now assigned to nextcloud-db, and nginx never picked up the change.

If restarting nginx is the solution, is there a way to ensure that watchtower always restarts the nginx container after any update? Or could nginx be told not to cache its host name resolutions (although I think restarting the container is better, as it saves a lot of unnecessary lookups)?

wget commented 3 years ago

Currently subscribing to this bug as I do have the same IP issue but with Collabora Online :)

ReinerNippes commented 3 years ago

I hope I fixed it in a0c8e77.

You may try.

RST-J commented 3 years ago

@ReinerNippes I pulled master after your last message and reran the Ansible script/recipe with unchanged settings over the existing installation (I hope/think that is how updating the config works?). Today the same issue occurred for the nextcloud container itself. I cannot really tell from looking at the commit whether it should have fixed the issue in general or only for the reported containers. The situation was that nextcloud was rebuilt and hence restarted but nginx not. Was that a glitch, or is that case not covered yet? And if not, would it be an option to generally restart nginx whenever anything else gets updated?

(P.S.: And thanks and thanks again for that cool playbook)
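The restart rule asked about here (restart nginx whenever something it proxies to is updated) can be modelled as a one-directional dependency graph. This is only a sketch of the idea, not how watchtower is implemented: the function `containers_to_restart` and the `depends_on` mapping are hypothetical. The container names are the ones from this thread, and the dependency edges are assumed.

```python
from collections import deque


def containers_to_restart(updated, depends_on):
    """Return the updated container plus everything that depends on it,
    directly or transitively."""
    # Invert "X depends on Y" into "Y is depended on by X".
    dependents = {}
    for container, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(container)

    result = {updated}
    queue = deque([updated])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in result:
                result.add(dependent)
                queue.append(dependent)
    return result


# Assumed dependency edges: nginx proxies to nextcloud and OnlyOffice,
# nextcloud needs its database.
depends_on = {
    "nginx": ["nextcloud", "onlyoffice_documentserver"],
    "nextcloud": ["nextcloud-db"],
}

print(sorted(containers_to_restart("nextcloud", depends_on)))
print(sorted(containers_to_restart("nginx", depends_on)))
```

In this model the dependency is not symmetric: updating nextcloud pulls nginx along, while updating nginx restarts only nginx. So nginx has to declare its own dependencies for the case described above to be covered.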

ReinerNippes commented 3 years ago

@RST-J "I hope/think that is how updating the config works?" More or less, as long as the difference between the commits is only some container labels. Last year I updated from Traefik v1.7 to v2. I never tested whether an existing installation can be updated in that case by just pulling a new version of the playbook. An upgrade of Postgres from n to n+1 certainly wouldn't work.

"The situation was that nextcloud was rebuilt and hence restarted but nginx not." I thought that the "container links" are bidirectional, that is to say nginx and nextcloud are both restarted if one of them is changed. If that is not the case, I would have to add a "watchtower.depends-on" label to the nginx container as well.