DFO-Ocean-Navigator / Ocean-Data-Map-Project

The Ocean Navigator is an online tool used to help visualise scientific research data. A user's guide is available at https://dfo-ocean-navigator.github.io/Ocean-Navigator-Manual/ and the tool is live at
http://navigator.oceansdata.ca

Stuff that might be slow #921

Open htmlboss opened 2 years ago

htmlboss commented 2 years ago

First of all, we should set up instrumentation to help diagnose what's slow and where. Flamegraphs and API metrics come to mind.

Stuff that could be causing slowness:

  • Network routing (do we need an edge CDN? I don't think so, but it's worth checking).
  • Plotting code that is "over-processing" the data: doing too many transformations or useless manipulations. We need a flamegraph to really see what's slow.
  • open_dataset not being cached correctly. Justin already identified this.
  • Data access: the disks in hank can't spin fast enough to serve the files. Should hank have some kind of bcache setup where hot data is moved to flash memory?
  • Caching not present at the request/nginx level. If a request hits nginx that matches a previous request that returned HTTP status 200, nginx should just return the cached response in most circumstances (plotting, tiles). TTLs should differ by request type: successful plots should have the longest TTL in the cache (7 days, perhaps), depths could also have 7 days since the depth values are static per dataset, and timestamps should have the shortest (< 24 hours, since they change daily). See https://www.nginx.com/blog/nginx-caching-guide/. nginx caching should be instrumented for debugging purposes, and the cache must live in flash memory. (A rough config sketch follows at the end of this comment.)
  • Scaling horizontally. This should be a last-ditch effort, since it's just a band-aid and doesn't fix the actual underlying problems.

I don't believe the number of datasets we offer is a bottleneck. At least it shouldn't be; if it is, that means hank's hard-drive configuration isn't working right.

Accessing the SQLite databases takes tens of milliseconds with nearly a million rows in one table, so that's clearly not the problem.

I think the main issue is that we're not caching things correctly: a properly configured nginx cache should easily handle dozens of identical concurrent requests. And the default view is hit by every user when they first load the website, yet it's always slow, so something fishy is going on.
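To make the nginx idea concrete, here's a minimal sketch of a cached proxy config. Everything in it is illustrative, not taken from our deployment: the cache path, zone name, upstream port, and the /api/v1.0/plot/ and /api/v1.0/timestamps/ routes are all assumptions.

# Cache zone on flash storage; name, sizes, and paths are hypothetical.
proxy_cache_path /mnt/flash/nginx-cache levels=1:2 keys_zone=navcache:50m
                 max_size=10g inactive=7d use_temp_path=off;

server {
    listen 80;

    # Long TTL for plots: cache only successful (200) responses, for 7 days.
    location /api/v1.0/plot/ {
        proxy_pass http://127.0.0.1:5000;
        proxy_cache navcache;
        proxy_cache_key $request_uri;
        proxy_cache_valid 200 7d;
        # Reports HIT/MISS/EXPIRED per response, for debugging the cache.
        add_header X-Cache-Status $upstream_cache_status;
    }

    # Short TTL for timestamps, since those change daily.
    location /api/v1.0/timestamps/ {
        proxy_pass http://127.0.0.1:5000;
        proxy_cache navcache;
        proxy_cache_key $request_uri;
        proxy_cache_valid 200 12h;
        add_header X-Cache-Status $upstream_cache_status;
    }
}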

dwayne-hart commented 2 years ago

Worth further discussing...

https://forum.level1techs.com/t/a-bcached-zfs-pool-weekend-project-what-how-and-benches/173141


JustinElms commented 2 years ago

Setting up a profiler is pretty straightforward. Flask/Werkzeug has a WSGI application profiler built in (docs here) that will profile every API call. I added it to runserver.py (below), but we could also add it directly to the create_app function so that it'll work with the launch-web-service script.

#!/usr/bin/env python
import os

from oceannavigator import create_app
# werkzeug.contrib was removed in Werkzeug 1.0; ProfilerMiddleware now lives
# in werkzeug.middleware.profiler.
from werkzeug.middleware.profiler import ProfilerMiddleware

app = create_app(testing=False)

# ProfilerMiddleware expects the output directory to already exist.
os.makedirs("profile_results", exist_ok=True)
app.wsgi_app = ProfilerMiddleware(app.wsgi_app, profile_dir="profile_results")

app.run(host="0.0.0.0", port=5000, processes=4)

Once we have a bunch of profiles we can visualize them as flame/icicle graphs with SnakeViz or Tuna. SnakeViz is supposed to integrate nicely with Jupyter Lab, but I'm not having much luck with that. Tuna was pretty easy to get going with, though. Here's a screenshot of a tile call in Tuna:

[image: Tuna icicle graph of a tile call] https://user-images.githubusercontent.com/79917349/140928175-995e41d3-aad5-497e-b3ba-afd144ab5b7f.png
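For reference, both viewers can be pointed at the .prof files the middleware writes into profile_results/; the filename below is a placeholder, since ProfilerMiddleware names each file after the request's method, path, and timing.

pip install tuna snakeviz
tuna profile_results/<some-profile>.prof      # opens an icicle graph in the browser
snakeviz profile_results/<some-profile>.prof  # alternative flame-graph-style viewer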

dwayne-hart commented 2 years ago

Let's set this up on the staging container and go through one of the testing scenarios done for a release.


JustinElms commented 2 years ago

I just created the Performance-Testing branch with the WSGI Profiler added to the create_app method. Now the API requests should be profiled regardless of how you launch the Navigator. You can set that up in a staging container to test with if you'd like.
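The change on that branch presumably looks something like this hypothetical skeleton; the real factory in the oceannavigator package does much more setup.

from flask import Flask
from werkzeug.middleware.profiler import ProfilerMiddleware

def create_app(testing=False):
    # Hypothetical sketch of the factory, not the actual implementation.
    app = Flask(__name__)

    # Wrapping the WSGI callable here means every request gets profiled no
    # matter how the app is launched (runserver.py or launch-web-service).
    app.wsgi_app = ProfilerMiddleware(app.wsgi_app, profile_dir="profile_results")
    return app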

If we could install SnakeViz or Tuna in the staging container and create a Jupyter Lab notebook that we could access remotely, we might be able to quickly view the profile results as they're created.
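If the SnakeViz/Jupyter integration stays uncooperative, the standard library's pstats module can at least rank hotspots from the same files inside a notebook (the path below is a placeholder):

import pstats

# Load one profile written by ProfilerMiddleware and list the 20 calls
# with the highest cumulative time.
stats = pstats.Stats("profile_results/<some-profile>.prof")
stats.sort_stats("cumulative").print_stats(20)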
