Capture timings of requests

gwarmstrong commented 4 years ago

See discussion in #26 .

It would be nice to register some hooks with flask before_request and after_request that register timing information about the request:

before_request:

request ID (likely using uuid.uuid4)
start time
- Take into consideration time-zones etc...
url for request (difference between endpoint, and parameters)
number of samples being requested (maybe not applicable everywhere)
API version

after_request:

end time
- timing considerations should be the same
maybe some job information, such as number of samples, that is not known until complete

See this helpful SO post for a quick overview on the flask.request module, which should provide a lot of utilities for getting the above information.

Also see the flask.Request docs.

wasade commented 4 years ago

the request should already have some type of unique identifier associated with it. I'm not sure if a session ID or an etag or what would be best here -- @dhakim87 almost certainly knows though :)

gwarmstrong commented 4 years ago

After searching around the flask docs and SO/other blogs for a bit, I have seen no indication that flask would generate unique identifiers for requests on its own, and have seen several indicators of the contrary 1. 2. 3..

I am not sure that a session ID is appropriate if we want a unique ID for the request (though may be useful additional information), as AFAIK the session ID would be shared across all transactions in a single session.

I am also not super familiar with etag's but they seem to correspond to resources, more so than requests. Not sure if that would be appropriate here.

I would probably be tempted to take an approach most similar to 2. from above.

gwarmstrong commented 4 years ago

Happy to hear @dhakim87 's input.

wasade commented 4 years ago

Ah interesting, thanks for the investigation. I think the second link there looks like it may have some meat to adapt from. But, again, useful but I think lower prio at the moment

dhakim87 commented 4 years ago

I'll be a little disappointed if we have to roll our own profiling tools. I expected Flask would have some decent profiler support. I believe Werkzeug has a ProfilerMiddleware module we can use as a one liner https://werkzeug.palletsprojects.com/en/1.0.x/middleware/profiler/

There's also someone asking turning flask profiling into call graphs, which may or may not use the same set of tools https://stackoverflow.com/questions/21828672/profiling-and-finding-bottleneck-of-a-flask-application-current-respond-time

I think this ticket spawned in part from wanting to trace down a particular deadlock we thought was related to not having async support. For that use case, knowing which requests have not been responded to at the time the deadlock occurs is sufficient (whereas delaying writing to the profiler file until after_request is called is insufficient to track this use case).

In terms of what we need, I agree with everything on your list, but this feels like the type of task that slowly grows over time rather than something we'll have a perfect design for up front. Having a uuid is nice, as is knowing the the url path (I'm torn on whether to include query parameters or not - Could be helpful for debugging, but this is the type of vacuum user credentials get sucked into and accidentally leaked later, maybe that problem doesn't exist while this is restricted to public api though), as well as the HTTP Status code returned.

In terms of tools I've seen in the past: Live monitoring of # of http status codes by time (Usually split as 2XX, 3XX, 4XX, 5XX) CPU usage/Memory usage graph on the server Per request start time, end time, elapsed cpu time, elapsed wall clock time - This type of table is usually restricted to database layer commands though, which admittedly is where the bottleneck should be in your standard web app.

But to reiterate, I'd really prefer we use something off the shelf before diving into what could turn into a very large rabbit hole (In the best case, we never write any bugs and this never gets used!)

wasade commented 4 years ago

Good spot on the profiler, I think that's certainly worth looking at. And agree, better to use something that already exists here if possible

biocore / microsetta-public-api

Capture timings of requests #28