monumahali opened this issue 4 years ago
Hi @monumahali !
Can you provide more information about your setup? Is your code open source? Do you see anything in the logs? Are you sure that the FMD can find the configuration file at startup?
Cheers, M.
P.S. Did you see the survey? We'd be thankful if you filled it in :) https://docs.google.com/forms/d/10sOStXH13FoH45_xZaEwHEFWzRCDsr5gTgPWCdsdxp0/edit
Hi @mircealungu
No, it's not open source, but think of it as a simple Flask application. The only problem is that the dashboard doesn't update when the Flask app is started using supervisor, but it works fine if the app starts with gunicorn. I have not checked the supervisor logs but will check and revert. No, I am not sure whether FMD finds the configuration file; I also don't understand how and where to check this, but you can definitely help me with that.
Thanks, Monu
Dear @monumahali ,
One thing that might be relevant is that FMD tries to load its configuration from the file specified in the environment variable FLASK_MONITORING_DASHBOARD_CONFIG.
However, unless you provide steps to help replicate your problem on my machine, I can't help (I don't know what server you're using instead of gunicorn, nor how it is configured).
Maybe you can create a toy GitHub project that illustrates the problem? Then I'll be happy to help!
Cheers, And good luck, M.
Hi @mircealungu,
You can just create a simple Flask app with FMD and try to run it with supervisor on top of gunicorn. supervisor is essentially used to automatically restart the server whenever it goes down.
If your Flask app runs under supervisor and FMD still updates and works fine, then I'll know that supervisor is not the culprit.
I've never used Gunicorn, nor Supervisor. So unless you can share step by step what I should do to replicate the problem (e.g. what config I should use with the two of them, etc.), I don't think I can replicate your error.
I am facing the same issue. I am also running a Flask app using supervisor and gunicorn.
Steps to replicate the issue with gunicorn and supervisor:
Create a file wsgi.py:

```python
from app import create_app

application = create_app('production')

if __name__ == "__main__":
    application.run(debug=True)
```
Then install gunicorn using `pip install gunicorn` and try to run the app with gunicorn using the command:

```shell
gunicorn -b 0.0.0.0:5000 wsgi:application
```
After that, use some of the endpoints and then go to http://localhost:5000/dashboard/login. After you log in to the FMD and click on "Median request duration (ms)", it shows 0 for all endpoints.
Attaching screenshot
It is not capturing the duration of the requests, though it does capture the number of requests.
@FlyingBird95 - Congratulations on the new 3.1.0 release! Please add support for gunicorn too. We are not able to monitor the APIs, since the production code uses gunicorn, and this is the case for most Flask production apps.
@mircealungu Please check my comments.
@RohitShende -- Thanks for the heads up Rohit. We're aware of the gunicorn issues, but at the moment we don't have the engineering power to move quickly in that direction. I'll ping you when it's solved. If there were a paid version of FMD, maybe we could afford to move faster. (BTW, would {you | your company} {be interested in | afford} paying a small {monthly | yearly} fee for a service like FMD?)
I created a basic Flask + gunicorn + Flask Monitoring Dashboard application: https://github.com/alexander34ro/flask-monitoring-dashboard-gunicorn
I deployed it on Heroku: https://flask-monitoring-dash-gunicorn.herokuapp.com/
Checking the dashboard, the Median Request Duration is definitely not updating, even though the Number of Hits tab functions correctly.
Median Request Duration
Number of Hits
Hi everyone! I finally got the chance to look into the bug. The duration is recorded in the database, but the frontend doesn't display it when it's too small a number. Quick ways to verify this:
- Create a new endpoint with a `time.sleep(1)` in it. The median duration will be displayed correctly.
- Run the following query against the FMD database:

```sql
SELECT e.name, r.time_requested, r.duration
FROM Request AS r, Endpoint AS e
WHERE r.endpoint_id = e.id AND e.name = 'your_endpoint_name';
```

This should show some very small non-zero values for the duration.
So the bug is real, but it seems to be a frontend issue and not a gunicorn/supervisor/data collection issue.
Disclaimer: I didn't use supervisor in my setup, but I'd be very surprised if it caused any issues, since supervisor is only a Unix tool that restarts the process if it crashes.
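For completeness, a supervisor program section for such a setup typically looks like the following (a sketch; the program name and paths are made up):

```ini
[program:flaskapp]
command=gunicorn -b 0.0.0.0:5000 wsgi:application
directory=/data/flask-app
autostart=true
autorestart=true
stderr_logfile=/var/log/flaskapp.err.log
```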
@bogdanp05
We looked a bit at this problem yesterday with @alexander34ro. One observation is that, if you enable monitoring level 3 for the hello-world endpoint, this is what you get:
It seems to me that gunicorn does some program transformation magic and moves the entire body of the method into a separate thread. And since FMD doesn't handle parallel threads, could this be why the times are always 0 or close to zero?
@mircealungu I see. My previous comment was strictly about the overview page.
For the profiler page, are we sure gunicorn is the cause? Does running the app via the Flask development server produce a different output? Looking at the screenshots of the hello_world endpoint, it looks like the total duration is close to 0 ms, which is less than the profiler's sampling time, so nothing would be displayed regardless of the web server used.
I do not think the low duration is the cause. In my case the endpoints take around 6 seconds, i.e. 6000 ms, and the dashboard still shows 0 when the app runs with gunicorn; if we use `python run.py`, it shows the durations. Check the images attached in my previous comments.
Attaching images of production vs local app
```shell
python run.py
```

```shell
gunicorn -k gevent -w 10 --worker-connections 1000 --certfile=/etc/ssl/certs/xyz.cer --keyfile=/etc/ssl/certs/xyz.key -b localhost:8080 --log-file gunicorn.log --log-level debug wsgi directory=/data/flask-app
```
Thanks for the extra screenshots and especially for the gunicorn command @RohitShende! I was running `gunicorn -w 4 localhost:5000 main:app` and I just couldn't replicate it. I'll try again later using your command.
I updated the toy app with a 5-second delay and tried to use a command as similar as possible to what @RohitShende is using:

```
web: gunicorn -k gevent -w 3 --worker-connections 1000 --log-file gunicorn.log --log-level debug wsgi:app
```
I redeployed the app and the median time is correct.
Here you can find the repo for the toy app: https://github.com/alexander34ro/flask-monitoring-dashboard-gunicorn And here is the dashboard running on Heroku: https://flask-monitoring-dash-gunicorn.herokuapp.com/dashboard
Hi,
After the above comment, I too tried to replicate the issue using this app: https://github.com/alexander34ro/flask-monitoring-dashboard-gunicorn
But I failed to replicate the issue.
Then I made this app the same as my app by using the same config file.
The config file, flask-monitoring-config.cfg:

```ini
[dashboard]
APP_VERSION=1.0
CUSTOM_LINK=dashboard
MONITOR_LEVEL=3
OUTLIER_DETECTION_CONSTANT=2.0
ENABLE_LOGGING=True

[authentication]
USERNAME=admin
PASSWORD=admin
GUEST_USERNAME=guest
GUEST_PASSWORD=['dashboardguest!', 'second_pw!']
SECURITY_TOKEN=cc83733cb0af8b884ff6577086b87909

[database]
DATABASE=sqlite:///flask_monitoringdashboard.db

[visualization]
COLORS={'main':'[0,97,255]', 'static':'[255,153,0]'}
```
I modified the code in app.py like below:

```python
app = Flask(__name__)
dashboard.config.init_from(file='flask-monitoring-config.cfg')
dashboard.bind(app)
```
And I was able to replicate the issue in hello_world app
With logging enabled, I see these logs as well:

```
Can't get the stacktrace of the main thread. Stopping StacktraceProfiler
Thread to monitor: 139829990007368
Running threads: dict_keys([139830678750976])
Can't get the stacktrace of the main thread.
```
The command I used to run the app with gunicorn was:

```shell
gunicorn -k gevent -w 2 --worker-connections 10 -b 0.0.0.0:4000 --log-file gunicorn.log --log-level debug wsgi:app
```
Surprisingly, when I use the flask command below to run the same hello app with the same config file, it works fine and captures the results. It means there is some issue with gunicorn plus some setting I am passing in the above config:

```shell
export FLASK_APP=wsgi.py
flask run -h 0.0.0.0 -p 4444
```
On debugging more, I found out the problematic combination is MONITOR_LEVEL=3 + gunicorn. I changed it to MONITOR_LEVEL=2 and the median time was captured correctly, though I still saw the error log:

```
Can't get the stacktrace of the main thread.
```

I also observed that MONITOR_LEVEL=1 + gunicorn works nicely, and there is no error in the logs.
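In other words, the workaround amounts to a one-line change in the [dashboard] section of the config file shown earlier (at the cost of losing the level-2/3 profiling data; all other keys unchanged):

```ini
[dashboard]
MONITOR_LEVEL=1
```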
This confirms why it was working for @alexander34ro: the default MONITOR_LEVEL is 1, as can also be seen in the screenshot he attached.
Hey all,
First of all, thanks @RohitShende for your analysis. This has helped me to better identify the exact problem. Let me first explain what we do in FMD. We start by wrapping the endpoint with monitoring-level-specific functions. We use the following classes for this:

- BaseProfiler
- PerformanceProfiler
- OutlierProfiler
- StacktraceProfiler

The first two are not actually profilers, because they only run to update the data in the database. The last two (levels 2 and 3) are started before the request is handled and monitor the stacktrace of the endpoint. Thus, when processing a request (that is profiled by either level 2 or 3), we're always dealing with two threads. And this is exactly what goes wrong with gunicorn/supervisor.
I've taken a look at the architecture of Gunicorn. They mention a "pre-fork worker model". Since I didn't know exactly what this is, I looked at Stack Overflow. One of the answers explains that a fork is a completely separate *nix process.
Thus, the exact problem is that our profilers cannot get the stacktrace of the main thread. They do this using the following snippet:

```python
try:
    frame = sys._current_frames()[self._thread_to_monitor]
except KeyError:
    log('Can\'t get the stacktrace of the main thread. Stopping StacktraceProfiler')
    log('Thread to monitor: %s' % self._thread_to_monitor)
    log('Running threads: %s' % sys._current_frames().keys())
    return
traceback.extract_stack(frame)
```
My guess is that sys._current_frames() can only get the frames (i.e. threads) in the same *nix process. Operating systems do this for security reasons; otherwise a process could monitor any other process on the same computer. N.B. I'm not 100% sure about the claims in this paragraph, but it seems reasonable.
My second claim would be that when a thread is spawned in gunicorn, it is handled in a different *nix process. I'm actually not entirely sure about this claim, but it would connect all the info mentioned above.
I would like to ask someone to verify my two claims that I've made above. When this is indeed the case, we can think about how to overcome this issue (maybe by using a different profiler, that doesn't run into a different thread).
Kind regards,
Patrick
I'm trying to implement a patch; here's what ChatGPT said about the implementation:
"When Gunicorn starts, it initializes the main process and then forks worker processes. Because of the nature of process forking, the worker processes don't share memory with the parent process or with each other, so anything initialized in the main process isn't available in the worker processes.
Gunicorn provides hooks that you can use to run code during certain parts of the server's lifecycle, and these hooks can be useful for solving problems related to worker initialization. The post_fork hook in particular is run after a worker process has been forked, and you can use it to reinitialize your module in each worker process.
Here's an example of how to do this. Suppose the module you want to initialize is named mymodule and has an initialize() function:

```python
def post_fork(server, worker):
    import mymodule
    mymodule.initialize()
```
You would add this function to your Gunicorn configuration file. If you don't already have a configuration file, you can create one. For example, you could create a file named gunicorn_config.py:

```python
def post_fork(server, worker):
    import mymodule
    mymodule.initialize()

bind = '127.0.0.1:8000'
workers = 4
```

In this configuration file, bind specifies the IP address and port that Gunicorn should bind to, and workers specifies the number of worker processes.
Then you would start Gunicorn with the --config or -c option to specify the configuration file:

```shell
gunicorn -c gunicorn_config.py myapp:app
```

In this command, myapp:app would be replaced with the import path to your actual WSGI application.

This way, each worker process will run the post_fork function after it's forked, and the initialize() function in your module will be called in each worker process, allowing the module to function correctly in the worker processes."
More ChatGPT insight:

"When Gunicorn forks worker processes, each worker gets its own separate instance of the Flask application. These application instances are not created until after the workers have been forked, so they aren't directly available in the post_fork function. However, you can access the Flask application in each worker process once it starts handling requests.
If you need to perform some initialization on the Flask application in each worker process, one way to do this would be to use a middleware that performs the initialization the first time each worker handles a request.
Here's an example of what this could look like:

```python
class InitializeAppMiddleware:
    def __init__(self, app):
        self.app = app
        self.initialized = False

    def __call__(self, environ, start_response):
        if not self.initialized:
            # Perform your initialization here.
            # This could involve calling a method on self.app,
            # or modifying self.app in some way.
            self.initialized = True
        return self.app(environ, start_response)
```
You would then wrap your Flask application with this middleware:

```python
from flask import Flask
from mymodule import InitializeAppMiddleware  # Import the middleware

app = Flask(__name__)
app.wsgi_app = InitializeAppMiddleware(app.wsgi_app)
```
With this setup, the first time each worker process handles a request, it will run the initialization code in the middleware. This allows you to perform per-worker initialization that involves the Flask application.
Please remember to take thread-safety into account if you're using the gthread worker type, as the middleware could be accessed by multiple threads at the same time. In this case, you might need to use a lock to ensure that the initialization only happens once."
I am using the Flask Monitoring Dashboard for monitoring purposes. The issue is that when I run the Flask app through Gunicorn, the Flask Monitoring Dashboard works fine, but when I run it with supervisor, the Flask Monitoring Dashboard stops updating, i.e. it is not monitoring anymore.