FusionAuth / fusionauth-containers

Container definitions for docker, kubernetes, helm, and whatever containers come next!
https://fusionauth.io/

FA time outs #83

Closed nick-kostov closed 2 years ago

nick-kostov commented 2 years ago

Hi everybody, we have FusionAuth installed on EKS, with an LB pointing towards CF. We are experiencing timeouts from time to time on the admin page. I was wondering whether this could be a Tomcat problem, since I noticed that Tomcat is its web server. The logs are telling us nothing.

This is the version that we use: fusionauth/fusionauth-app:1.30.1

Looking forward to your replies.

mooreds commented 2 years ago

That's a pretty old version of FusionAuth (almost a year old).

Is there anything happening in the FusionAuth or LB logs when the timeouts occur?

nick-kostov commented 2 years ago

Hello and thank you for your reply: These are the errors that I get:

2022-08-02 2:49:33.982 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-02 2:52:51.243 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-02 2:54:31.879 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-02 2:55:01.782 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-02 6:42:25.570 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-03 12:00:34.816 AM INFO org.apache.coyote.http11.Http11Processor - The host [${ip}] is not valid Note: further occurrences of request parsing errors will be logged at DEBUG level.
2022-08-03 6:24:09.221 AM WARN org.primeframework.mvc.action.DefaultActionMappingWorkflow - The action class [io.fusionauth.app.action.IndexAction] does not have a valid execute method for the HTTP method [POST]
2022-08-03 6:49:19.964 AM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-03 7:06:41.297 AM INFO com.inversoft.search.ElasticRestClientHelper - Connecting to Elasticsearch at [http://fusionauth-search:9200]
2022-08-03 7:31:43.783 AM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-03 9:26:13.191 AM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-03 9:47:25.077 AM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-03 12:56:37.712 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-03 2:02:06.737 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-03 2:05:14.146 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-03 2:05:44.458 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-03 3:04:14.384 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-03 3:10:22.335 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-03 3:10:22.444 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-03 3:12:26.327 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-03 3:31:43.958 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-03 3:34:42.396 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-03 6:34:01.496 PM INFO org.apache.coyote.http11.Http11Processor - The host [${ip}] is not valid Note: further occurrences of request parsing errors will be logged at DEBUG level.
2022-08-03 10:07:41.428 PM WARN org.primeframework.mvc.action.DefaultActionMappingWorkflow - The action class [io.fusionauth.app.action.IndexAction] does not have a valid execute method for the HTTP method [POST]
2022-08-03 10:10:57.155 PM WARN org.primeframework.mvc.action.DefaultActionMappingWorkflow - The action class [io.fusionauth.app.action.IndexAction] does not have a valid execute method for the HTTP method [POST]
2022-08-03 10:28:59.573 PM WARN org.primeframework.mvc.action.DefaultActionMappingWorkflow - The action class [io.fusionauth.app.action.IndexAction] does not have a valid execute method for the HTTP method [POST]
2022-08-04 7:51:29.118 AM WARN org.primeframework.mvc.action.DefaultActionMappingWorkflow - The action class [io.fusionauth.app.action.IndexAction] does not have a valid execute method for the HTTP method [OPTIONS]
2022-08-04 7:55:32.453 AM WARN org.primeframework.mvc.action.DefaultActionMappingWorkflow - The action class [io.fusionauth.app.action.IndexAction] does not have a valid execute method for the HTTP method [OPTIONS]
2022-08-04 8:12:17.486 AM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-04 9:12:37.435 AM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-04 9:46:00.743 AM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-04 1:21:26.103 PM INFO org.apache.tomcat.util.http.Parameters - Character decoding failed. Parameter [q] with value [%plate%] has been ignored. Note that the name and value quoted here may be corrupted due to the failed decoding. Use debug level logging to see the original, non-corrupted values. Note: further occurrences of Parameter errors will be logged at DEBUG level.
2022-08-04 1:46:18.498 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-04 1:51:07.394 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-04 1:51:26.017 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-04 2:29:19.360 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-04 2:31:38.234 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-04 2:40:08.554 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-04 2:41:13.419 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-04 3:09:34.285 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-05 12:16:40.897 AM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-05 12:56:30.262 AM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-05 4:22:59.271 AM INFO org.apache.coyote.http11.Http11Processor - The host [${ip}] is not valid Note: further occurrences of request parsing errors will be logged at DEBUG level.
2022-08-05 4:51:39.340 AM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-05 8:07:08.977 AM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-05 9:03:28.833 AM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-05 11:32:18.398 AM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
2022-08-05 12:54:38.308 PM ERROR io.fusionauth.api.service.cache.DistributedCacheNotifier - Failed to request a cache reload for [http://10.0.20.227:9011]. Exception message: Read timed out.
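
A long paste like this is easier to read once it is tallied by level and logger. The following is a minimal sketch in Python, not a FusionAuth tool: it assumes every entry starts with the `<date> <time> <AM|PM> <LEVEL> <logger> - ` header seen in the log above, and the names `ENTRY_START`, `tally`, and `sample` are made up for illustration. Because it scans for entry headers rather than splitting on newlines, it should also cope with entries that were pasted run together.

```python
import re
from collections import Counter

# Header of one entry, as it appears in the pasted log (assumed format):
# "<date> <time> <AM|PM> <LEVEL> <logger> - <message>"
ENTRY_START = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}\.\d{3} [AP]M) "
    r"(ERROR|WARN|INFO|DEBUG) (\S+) - "
)


def tally(log_text):
    """Count entries per (level, short logger name) in a pasted log blob."""
    counts = Counter()
    for match in ENTRY_START.finditer(log_text):
        _timestamp, level, logger = match.groups()
        # Keep only the class name, e.g. "DistributedCacheNotifier".
        counts[(level, logger.rsplit(".", 1)[-1])] += 1
    return counts


# Three entries copied from the log above, deliberately run together.
sample = (
    "2022-08-02 2:49:33.982 PM ERROR io.fusionauth.api.service.cache."
    "DistributedCacheNotifier - Failed to request a cache reload for "
    "[http://10.0.20.227:9011]. Exception message: Read timed out. "
    "2022-08-02 2:52:51.243 PM ERROR io.fusionauth.api.service.cache."
    "DistributedCacheNotifier - Failed to request a cache reload for "
    "[http://10.0.20.227:9011]. Exception message: Read timed out. "
    "2022-08-03 6:24:09.221 AM WARN org.primeframework.mvc.action."
    "DefaultActionMappingWorkflow - The action class "
    "[io.fusionauth.app.action.IndexAction] does not have a valid "
    "execute method for the HTTP method [POST]"
)

for (level, logger), count in tally(sample).most_common():
    print(f"{count:3d} {level:5s} {logger}")
```

Running `tally` over the full paste instead of `sample` gives a quick view of which logger dominates the errors.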

mooreds commented 2 years ago

Thanks! Can you replicate the issue with 1.36.8?

nick-kostov commented 2 years ago

I do not think that this is possible at the moment, since we would have to upgrade it in a production EKS cluster. I can get back to you on that in a week. Can we keep this open until I am able to schedule such a change in the systems? BR, Nicky Kostov

mooreds commented 2 years ago

Sure. It would be best done in a testbed, because, well, production. I'm not positive it will fix the issue, but it'd be helpful information. FYI, fixes for issues like this are not typically backported to old releases.

nick-kostov commented 2 years ago

Hi @mooreds, I have spoken with the DEV team regarding the problem. They asked me to explain to you that only the dashboard is timing out, and only when there are XXXX users and YYYY apps in the system. The timeouts do not look that random. Also, is there a way to tune the timeouts in Tomcat so that we can adjust them properly?

nick-kostov commented 2 years ago

I will be starting the upgrades this week and will give you feedback regarding 1.36.8.

mooreds commented 2 years ago

Hi @nick-kostov

Just FYI, it may make sense to go directly to 1.37.2, which is the latest released version: https://fusionauth.io/docs/v1/tech/release-notes

(1.37 switched to netty from tomcat, fyi.)

You can look at the configuration options for any timeout settings: https://fusionauth.io/docs/v1/tech/reference/configuration

It's worth noting that how the timeouts occur depends on how you are using FusionAuth. For example, if you have a connector set up, the response time for that connector will impact FusionAuth's response time.
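
One way to check whether slow responses line up with a particular page or dependency is to time requests from outside the cluster. Here is a minimal Python sketch under stated assumptions: `time_request` and its `fetch` parameter are hypothetical helpers written for this example, not part of FusionAuth, and the URL is a placeholder. The `fetch` hook exists only so the timing logic can be exercised without a network.

```python
import time
import urllib.request


def time_request(url, timeout_s=10.0, fetch=None):
    """Return (ok, elapsed_seconds) for a single GET against url."""
    if fetch is None:
        # Default fetch: a plain GET using the standard library.
        def fetch(target, timeout):
            with urllib.request.urlopen(target, timeout=timeout) as resp:
                resp.read()

    start = time.monotonic()
    try:
        fetch(url, timeout_s)
        ok = True
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        ok = False
    return ok, time.monotonic() - start


# Demonstration with a stub fetch, so no real request is sent.
# The URL is a placeholder, not a real FusionAuth instance.
ok, elapsed = time_request(
    "http://fusionauth.example:9011/", fetch=lambda target, timeout: None
)
print(f"ok={ok} elapsed={elapsed:.3f}s")
```

Dropping the `fetch` argument and pointing the URL at your own instance would issue real requests; calling it in a loop and logging the slow ones can show whether the timeouts cluster around specific times or pages.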

If you have a support contract, it'd be worth opening a ticket as well: https://account.fusionauth.io/account/support

nick-kostov commented 2 years ago

Hi @mooreds, thanks for your help and suggestions. We have upgraded FA to 1.38.1. Everything is working as expected; we would even say it works better than before.

Just one last question: when running FA in Amazon Kubernetes Service, or in Kubernetes in general, do you suggest we use 2 replicas for production workloads, and should it go on a separate worker group?

mooreds commented 2 years ago

Glad you figured out the issue and things are running smoothly!

Just one last question: when running FA in Amazon Kubernetes Service, or in Kubernetes in general, do you suggest we use 2 replicas for production workloads, and should it go on a separate worker group?

This feels like a support question :) . For technical support, you have two options:

More on our technical support options: https://fusionauth.io/docs/v1/tech/admin-guide/technical-support

Finally, please close the ticket if you aren't seeing the timeouts any more.