Open dantebarba opened 1 month ago
The configuration seems to omit a possibly relevant dependency (GRPC_POLL_STRATEGY, libffi-dev, and what's up with the EoL Python & pip version?). It is probably worth bisecting the dependencies to rule out a misbehaving loaded C module.
The GRPC configuration is due to the following issue with grpcio: https://github.com/grpc/grpc/issues/29044. We use GRPC to connect to GCP services. Libffi was added to support cffi and cryptography packages.
The main issue is that the Docker image runs perfectly fine on all my local environments. My first assumption was some kind of firewall issue with Cloudflare or our load balancer, but that was quickly ruled out: during the stress test, if I log in to the VM and run a simple curl localhost, the application does not respond. So there is nothing blocking the requests. We also have an external Redis instance running on GCP, but that shouldn't be an issue since the test call doesn't even interact with the cache.
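The local check described above can be scripted so that a hang (connection accepted, no response) is told apart from a refused connection. This is only a rough sketch using the standard library; the function name probe, the URL in the usage note and the 5-second timeout are assumptions, not part of the actual setup.

```python
import socket
import urllib.error
import urllib.request

# Ignore any proxy environment variables so the request goes straight
# to the local port, like a plain `curl localhost` on the VM.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=5.0):
    """Classify one request as 'ok:<status>', 'http-error:<status>',
    'timeout' (accepted but no answer in time) or 'unreachable'."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return f"ok:{resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a 2xx/3xx status.
        return f"http-error:{exc.code}"
    except urllib.error.URLError as exc:
        # Connect-phase failures are wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return "unreachable"
    except (socket.timeout, TimeoutError):
        # Read-phase timeouts are raised unwrapped.
        return "timeout"
    except OSError:
        return "unreachable"
```

Running something like probe("http://localhost:8000/") from inside the VM during a stress test (the port is a placeholder) and getting "timeout" rather than "unreachable" would point at the application itself not answering, rather than at anything in front of it.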
This is a sample from my current local machine. The same results were achieved (but with lower performance) on an M1 laptop. The hangs only occur on the VM.
VM memory when non-responsive (I can still log in via SSH without any issues, and even get into the container):
              total        used        free      shared  buff/cache   available
Mem:           1982        1165         317           2         499         673
Swap:             0           0           0
Update: I switched back to the Flask development server and ran a couple of stress tests. Aside from a rate limit ban, I didn't have any request or performance issues.
Fun fact: since Flask can process as much as 3 times more requests than gunicorn, it caused the load balancer's rate limiter to kick in.
Hi
I've been dealing with this issue since we moved our application from the Flask development server to a WSGI server, and I'm unable to find a solution to it.
Runtime environment
Dockerfile
Description
We started experiencing some random hangs on the application. We noticed because our uptime monitor would alert us. Downtime usually lasts about 3-5 minutes. We analyzed the logs and found that usually these hanging events are preceded by a request spike.
Our first attempt was to change the worker and threads configuration. We tested various combinations from 1 worker and 1 thread to 8 workers and 2 threads, all of them reported similar issues when doing stress tests. The one that was configured with 1 worker and 1 thread was the fastest to freeze, after only 10 requests.
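A minimal sketch of how such worker/thread combinations could be swept from a gunicorn config file, not the exact configuration used here. The environment variable names GUNICORN_WORKERS and GUNICORN_THREADS and the bind address are assumptions; the setting names themselves (workers, threads, worker_class, bind, timeout, loglevel) are standard gunicorn settings.

```python
# gunicorn.conf.py: illustrative sketch, values are assumptions
import os

# Hypothetical environment variables, so each worker/thread combination
# can be tried without rebuilding the image.
workers = int(os.environ.get("GUNICORN_WORKERS", "1"))
threads = int(os.environ.get("GUNICORN_THREADS", "1"))

# Threaded worker class; gunicorn also switches to it on its own
# when the sync worker is combined with threads > 1.
worker_class = "gthread"

bind = "0.0.0.0:8000"  # assumed port
timeout = 30           # seconds; this is gunicorn's default
loglevel = "debug"     # debug level is what surfaces the per-connection log entries
```

It could be started with something like GUNICORN_WORKERS=8 GUNICORN_THREADS=2 gunicorn -c gunicorn.conf.py app:app, where app:app is a placeholder for the real module and application object.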
One of the things we noticed was that the application would come back to life after emitting a burst of [DEBUG] Closing connection. log entries.
This issue only happens when deploying to a VM. On my local environment (MacBook Air M1) it does not happen: the application can serve multiple requests and all stress tests were successful.
Here is a stress test sample
Any thoughts?