ItsEcholot / ContainerNursery

Puts Docker Containers to sleep and wakes them back up when they're needed
MIT License

Not proxying properly once the container is active [ARM/*arr] #37

Closed · kn-f closed this 1 year ago

kn-f commented 1 year ago

Hello,

When a *arr (Prowlarr, Sonarr, ...) container is "revived" for the second time, ContainerNursery does not redirect to the application even though it's up and running properly. The debug log shows that the application returns a 401 Unauthorized to ContainerNursery, and as a result the proxy does not recognize the application as up and running.

In my opinion (open to discussion), as long as ContainerNursery receives a "valid" HTTP response from the web server (from 100 to 499?), the proxy should recognize the application as up and running and hand control over to the application itself.

I did a small test, changing this line in ProxyHost.ts from `if (res.status === 200 || (res.status >= 300 && res.status <= 399)) {` to `if (res.status === 200 || (res.status >= 300 && res.status <= 499)) {`

and it works like a charm.

One option would be to make this range configurable by the user, either globally at the ContainerNursery level or per individual container, in case widening the default range is not in line with the design of the application.
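A minimal sketch of what such a configurable check could look like, assuming a hypothetical per-host list of accepted status ranges (the option shape and all names here are mine, not part of ContainerNursery):

```typescript
// Hypothetical per-host option: which status codes mark the app as ready.
interface StatusRange {
  min: number;
  max: number;
}

// Default mirrors the current behaviour: 200 plus 3xx redirects.
const defaultRanges: StatusRange[] = [
  { min: 200, max: 200 },
  { min: 300, max: 399 }
];

// True if the probed status falls inside any configured range.
function isReady(status: number, ranges: StatusRange[] = defaultRanges): boolean {
  return ranges.some((r) => status >= r.min && status <= r.max);
}

// kn-f's widened check, expressed as configuration instead of a code change:
const widened: StatusRange[] = [
  { min: 200, max: 200 },
  { min: 300, max: 499 }
];
console.log(isReady(401, widened)); // true: Prowlarr's 401 now counts as ready
console.log(isReady(502, widened)); // false: gateway errors still mean "not up"
```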

ItsEcholot commented 1 year ago

This is a tough one; I was never 100% happy with the current implementation. The aim was to make sure the redirect only happens after the application is truly ready (so the user doesn't get a Gateway Timeout or similar and has to reload by hand after a few seconds), which in most cases means a 200 or a 300-399 redirect. I'm open to adding more status codes to the list, so maybe we should gather a list of ranges/codes which (if the application is designed correctly) correspond to a running and ready application.

After looking through the HTTP status codes list I came up with a candidate set (see my follow-up below for the specific codes).

Other suggestions?

kn-f commented 1 year ago

Based on the HTTP status codes, I'd advise extending the accepted set into the 4xx range as well.

That said, maybe the project could keep the original list of valid status codes, with the possibility to specify an endpoint in the config file to check whether the container is properly up and running. For example, Prowlarr has an endpoint to check server health: /api/v1/health
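A rough sketch of such an endpoint check, assuming a hypothetical healthCheckPath config value and Node 18's global fetch (note that Prowlarr's /api/v1/health normally requires an API key, so a real implementation would also have to handle auth):

```typescript
// Hypothetical: probe a user-configured health endpoint instead of the app root.
async function probeHealthEndpoint(
  host: string,
  port: number,
  healthCheckPath: string // e.g. '/api/v1/health' from the host's config entry
): Promise<boolean> {
  try {
    const res = await fetch(`http://${host}:${port}${healthCheckPath}`);
    // Treat any 2xx as up and running; everything else keeps the loading page.
    return res.status >= 200 && res.status <= 299;
  } catch {
    // Connection refused, timeout, etc.: the container isn't ready yet.
    return false;
  }
}
```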

ItsEcholot commented 1 year ago

I'm hesitant to add the 4xx range since, as you say, these should be client errors. But some single-page applications that build on launch might return 404 while index.html hasn't been built yet (dashy comes to mind), because the nginx/apache webserver is already running. Also, 2xx codes like 204 (No Content) should probably not count as success?

So I think the following: 100-199, 200, 201, 300-399, 401
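Expressed as a predicate, that proposal would look roughly like this (a sketch, not the actual ProxyHost.ts code):

```typescript
// Readiness per the proposal above: 100-199, 200, 201, 300-399 and 401.
// 401 is included so apps with built-in auth (Sonarr, Prowlarr, ...) count as
// ready, while other 4xx codes (like dashy's transient 404) stay excluded.
function isReadyStatus(status: number): boolean {
  return (
    (status >= 100 && status <= 199) ||
    status === 200 ||
    status === 201 ||
    (status >= 300 && status <= 399) ||
    status === 401
  );
}
```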

ItsEcholot commented 1 year ago

Regarding the health endpoint: maybe we should be thinking about adding Docker healthcheck support instead. That way we don't need to build our own solution, and it increases compatibility as well.
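A minimal sketch of what reading Docker's healthcheck state could look like with dockerode (whether ContainerNursery already depends on dockerode is my assumption, as is the fallback behaviour):

```typescript
import Docker from 'dockerode';

const docker = new Docker({ socketPath: '/var/run/docker.sock' });

// If the image defines a HEALTHCHECK, Docker reports its result under
// State.Health.Status as 'starting', 'healthy' or 'unhealthy'.
// Returns null when no healthcheck exists, so the caller can fall back
// to the existing HTTP probe.
async function dockerHealth(containerName: string): Promise<boolean | null> {
  const info = await docker.getContainer(containerName).inspect();
  const health = info.State.Health;
  if (!health) return null; // no HEALTHCHECK defined for this container
  return health.Status === 'healthy';
}
```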

kn-f commented 1 year ago

> I'm hesitant to add the 4xx range since, as you say, these should be client errors. But some single-page applications that build on launch might return 404 while index.html hasn't been built yet (dashy comes to mind), because the nginx/apache webserver is already running. Also, 2xx codes like 204 (No Content) should probably not count as success?
>
> So I think the following: 100-199, 200, 201, 300-399, 401

I hadn't considered that, so extending to all 4xx codes is definitely a no-go. Regarding your next comment: not all Docker containers implement health checks, just as not all applications expose an API for checking service health. Whichever of the two solutions is chosen, there will be a drawback. Personally, I'd vote for using Docker's HEALTHCHECK functionality.

As a quick fix I'd keep the codes you mentioned above for now, and continue the discussion/feature request to work out the best long-term approach.

nopper commented 1 year ago

I am also affected by the following check:

https://github.com/ItsEcholot/ContainerNursery/blob/97b329c6d2f3886967dec2093a1369556104f69c/src/ProxyHost.ts#L154

In my case the check fails because some headers are missing from the request made here:

https://github.com/ItsEcholot/ContainerNursery/blob/97b329c6d2f3886967dec2093a1369556104f69c/src/ProxyHost.ts#L144

Would it be possible to expose a boolean flag to skip this additional verification and just proxy the request, such as useHttpProbeCheck set to false? Or maybe an integer option called treatHealthyAfterNSeconds, or something similar, that assumes the container is healthy if no errors have occurred after n seconds.
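A sketch of how those two proposed options could gate the probe; the names useHttpProbeCheck and treatHealthyAfterNSeconds come from the comment above, everything else here is assumed:

```typescript
// Hypothetical per-host settings, as proposed above.
interface ProbeSettings {
  useHttpProbeCheck: boolean;         // false = skip the HTTP probe entirely
  treatHealthyAfterNSeconds?: number; // assume healthy n seconds after start
}

function shouldProxy(
  settings: ProbeSettings,
  startedAtMs: number,  // when the container was woken up
  probePassed: boolean  // result of the existing HTTP status check
): boolean {
  if (!settings.useHttpProbeCheck) return true; // trust the container blindly
  const graceMs = (settings.treatHealthyAfterNSeconds ?? Infinity) * 1000;
  if (Date.now() - startedAtMs >= graceMs) return true; // grace period elapsed
  return probePassed;
}
```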

ItsEcholot commented 1 year ago

@nopper Would this be fixed in your case if we added a config option to choose between GET and HEAD requests?
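For illustration, a probe with a selectable method might look like this (the option is only being discussed here, not implemented; Node 18's global fetch assumed):

```typescript
// Hypothetical: let the user choose the probe method, since some apps
// reject HEAD requests or expect headers a bare probe doesn't send.
async function probeStatus(
  url: string,
  method: 'HEAD' | 'GET' = 'HEAD'
): Promise<number> {
  const res = await fetch(url, { method });
  return res.status;
}
```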

nopper commented 1 year ago

I think it might, but I'm not sure I'll have the time to check, to be honest. I've moved away from ContainerNursery and now simply use Grafana monitoring to identify and kill idle containers. For restarting, I have a catch-all route in Traefik that triggers a script to restart the container in question.

ItsEcholot commented 1 year ago

Thanks for the feedback. In that case I'll close this for the time being. Please feel free to reopen or create a new issue if needed.