ItsEcholot / ContainerNursery

Puts Docker Containers to sleep and wakes them back up when they're needed
MIT License

Not proxying properly once the container is active [ARM/*arr] #37

Closed · kn-f closed this 1 year ago

kn-f commented 1 year ago

Hello,

When a *arr (Prowlarr, Sonarr, ...) container is "revived" for the second time, ContainerNursery does not redirect to the application even though it's up and running properly. The debug log shows that the application returns a 401 Unauthorized to ContainerNursery, and as a result the proxy does not recognize the application as up and running.

In my opinion (open to discussion), as long as ContainerNursery receives a "valid" HTTP response from the web server (from 100 to 499?), the proxy should recognize the application as up and running and hand control over to the application itself.

I did a small test, changing this line in ProxyHost.ts from `if (res.status === 200 || (res.status >= 300 && res.status <= 399)) {` to `if (res.status === 200 || (res.status >= 300 && res.status <= 499)) {`

and it works like a charm.

One option would be to make this range configurable by the user, either globally at the ContainerNursery level or per individual container, in case widening the default range is not in line with the design of the application.
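A minimal sketch of what such a configurable check could look like, assuming a hypothetical per-host list of accepted status ranges (the option shape and all names here are mine, not part of ContainerNursery):

```typescript
// Hypothetical per-host option: which status codes mark the app as ready.
interface StatusRange {
  min: number;
  max: number;
}

// Default mirrors the current behaviour: 200 plus 3xx redirects.
const defaultRanges: StatusRange[] = [
  { min: 200, max: 200 },
  { min: 300, max: 399 }
];

// True if the probed status falls inside any configured range.
function isReady(status: number, ranges: StatusRange[] = defaultRanges): boolean {
  return ranges.some((r) => status >= r.min && status <= r.max);
}

// kn-f's widened check, expressed as configuration instead of a code change:
const widened: StatusRange[] = [
  { min: 200, max: 200 },
  { min: 300, max: 499 }
];
console.log(isReady(401, widened)); // true: Prowlarr's 401 now counts as ready
console.log(isReady(502, widened)); // false: gateway errors still mean "not up"
```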

ItsEcholot commented 1 year ago

This is a tough one; I was never 100% happy with the current implementation. The aim was to make sure the redirect only happens after the application is truly ready (so the user doesn't get a Gateway Timeout or similar and has to reload by hand after a few seconds), which in most cases means a 200 or a 300-399 redirect. I'm open to adding more status codes to the list, so maybe we should gather a list of ranges/codes which (if the application is designed correctly) correspond to a running and ready application.

After looking through the HTTP status codes list I came up with a candidate set (see my follow-up below for the specific codes).

Other suggestions?

kn-f commented 1 year ago

Based on the HTTP status codes, I'd advise extending the accepted set into the 4xx range as well.

That said, maybe the project could keep the original list of valid status codes, with the possibility to specify an endpoint in the config file to check whether the container is properly up and running. For example, Prowlarr has an endpoint to check server health: /api/v1/health
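A rough sketch of such an endpoint check, assuming a hypothetical healthCheckPath config value and Node 18's global fetch (note that Prowlarr's /api/v1/health normally requires an API key, so a real implementation would also have to handle auth):

```typescript
// Hypothetical: probe a user-configured health endpoint instead of the app root.
async function probeHealthEndpoint(
  host: string,
  port: number,
  healthCheckPath: string // e.g. '/api/v1/health' from the host's config entry
): Promise<boolean> {
  try {
    const res = await fetch(`http://${host}:${port}${healthCheckPath}`);
    // Treat any 2xx as up and running; everything else keeps the loading page.
    return res.status >= 200 && res.status <= 299;
  } catch {
    // Connection refused, timeout, etc.: the container isn't ready yet.
    return false;
  }
}
```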

ItsEcholot commented 1 year ago

I'm hesitant to add the 4xx range since, as you say, these should be client errors. But some single-page applications that build on launch might return 404 while index.html hasn't been built yet (dashy comes to mind), because the nginx/apache webserver is already running. Also, 2xx codes like 204 (No Content) should probably not count as success?

So I think the following: 100-199, 200, 201, 300-399, 401
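Expressed as a predicate, that proposal would look roughly like this (a sketch, not the actual ProxyHost.ts code):

```typescript
// Readiness per the proposal above: 100-199, 200, 201, 300-399 and 401.
// 401 is included so apps with built-in auth (Sonarr, Prowlarr, ...) count as
// ready, while other 4xx codes (like dashy's transient 404) stay excluded.
function isReadyStatus(status: number): boolean {
  return (
    (status >= 100 && status <= 199) ||
    status === 200 ||
    status === 201 ||
    (status >= 300 && status <= 399) ||
    status === 401
  );
}
```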

ItsEcholot commented 1 year ago

Regarding the health endpoint: maybe we should be thinking about adding Docker healthcheck support instead. That way we don't need to build our own solution, and it increases compatibility as well.
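A minimal sketch of what reading Docker's healthcheck state could look like with dockerode (whether ContainerNursery already depends on dockerode is my assumption, as is the fallback behaviour):

```typescript
import Docker from 'dockerode';

const docker = new Docker({ socketPath: '/var/run/docker.sock' });

// If the image defines a HEALTHCHECK, Docker reports its result under
// State.Health.Status as 'starting', 'healthy' or 'unhealthy'.
// Returns null when no healthcheck exists, so the caller can fall back
// to the existing HTTP probe.
async function dockerHealth(containerName: string): Promise<boolean | null> {
  const info = await docker.getContainer(containerName).inspect();
  const health = info.State.Health;
  if (!health) return null; // no HEALTHCHECK defined for this container
  return health.Status === 'healthy';
}
```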

kn-f commented 1 year ago

> I'm hesitant to add the 4xx range since, as you say, these should be client errors. But some single-page applications that build on launch might return 404 while index.html hasn't been built yet (dashy comes to mind), because the nginx/apache webserver is already running. Also, 2xx codes like 204 (No Content) should probably not count as success?
>
> So I think the following: 100-199, 200, 201, 300-399, 401

I hadn't considered that, so extending to all 4xx codes is definitely a no-go. Regarding your next comment: not all Docker containers implement health checks, just as not all applications expose an API for checking service health. Whichever of the two solutions is chosen, there will be a drawback. Personally, I'd vote for using Docker's HEALTHCHECK functionality.

As a quick fix I'd keep the codes you mentioned above for now, and continue the discussion/feature request to work out the best long-term approach.

nopper commented 1 year ago

I am also affected by the following check:

https://github.com/ItsEcholot/ContainerNursery/blob/97b329c6d2f3886967dec2093a1369556104f69c/src/ProxyHost.ts#L154

In my case the check fails because some headers are missing from the request made here:

https://github.com/ItsEcholot/ContainerNursery/blob/97b329c6d2f3886967dec2093a1369556104f69c/src/ProxyHost.ts#L144

Would it be possible to expose a boolean flag to skip this additional verification and just proxy the request, such as useHttpProbeCheck set to false? Or maybe an integer option called treatHealthyAfterNSeconds, or something similar, that assumes the container is healthy if no errors have occurred after n seconds.
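A sketch of how those two proposed options could gate the probe; the names useHttpProbeCheck and treatHealthyAfterNSeconds come from the comment above, everything else here is assumed:

```typescript
// Hypothetical per-host settings, as proposed above.
interface ProbeSettings {
  useHttpProbeCheck: boolean;         // false = skip the HTTP probe entirely
  treatHealthyAfterNSeconds?: number; // assume healthy n seconds after start
}

function shouldProxy(
  settings: ProbeSettings,
  startedAtMs: number,  // when the container was woken up
  probePassed: boolean  // result of the existing HTTP status check
): boolean {
  if (!settings.useHttpProbeCheck) return true; // trust the container blindly
  const graceMs = (settings.treatHealthyAfterNSeconds ?? Infinity) * 1000;
  if (Date.now() - startedAtMs >= graceMs) return true; // grace period elapsed
  return probePassed;
}
```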

ItsEcholot commented 1 year ago

@nopper Would this be fixed in your case if we added a config option to choose between GET and HEAD requests?
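For illustration, a probe with a selectable method might look like this (the option is only being discussed here, not implemented; Node 18's global fetch assumed):

```typescript
// Hypothetical: let the user choose the probe method, since some apps
// reject HEAD requests or expect headers a bare probe doesn't send.
async function probeStatus(
  url: string,
  method: 'HEAD' | 'GET' = 'HEAD'
): Promise<number> {
  const res = await fetch(url, { method });
  return res.status;
}
```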

nopper commented 1 year ago

I think it might, but I'm not sure I'll have the time to check, to be honest. I've moved away from ContainerNursery and now simply use Grafana monitoring to identify and kill idle containers. For restarting, I have a catch-all route in Traefik that triggers a script to restart the container in question.

ItsEcholot commented 1 year ago

Thanks for the feedback. In that case I'll close this for the time being. Please feel free to reopen or create a new issue if needed.