ODM2 / ODM2DataSharingPortal

A Python-Django web application enabling users to upload, share, and display data from their environmental monitoring sites via the app's ODM2 database. Data can either be automatically streamed from Internet of Things (IoT) devices, manually uploaded via CSV files, or manually entered into forms.
BSD 3-Clause "New" or "Revised" License

Extremely long pageload times #642

Closed: HeatherBrooks closed this issue 11 months ago

HeatherBrooks commented 1 year ago

When navigating from the homepage to Browse Sites or the login page, pages are taking 30-45 seconds to load each time. The same delay occurs when trying to reach an individual site page from the Browse Sites page. The issue was first reported at 14:30 today by Tara Muenz, who is in Delaware trying to demonstrate how to use the site. She tried multiple devices and browsers and encountered the slow page loads on all of them. I've been able to replicate the issue at the Stroud Center in Firefox and Chrome.

HeatherBrooks commented 1 year ago

Still seeing these extremely slow page loads today. @aufdenkampe, any ideas?

ptomasula commented 1 year ago

After reviewing monitoring metrics, we noticed an uptick in CPU utilization over the course of several months. It looks like it crossed a critical threshold about 3 weeks ago. In response to this issue, we rebooted the production data server on 02/24/2023, and both CPU utilization and site performance seem to have improved.

We've not yet identified a source for the gradual CPU utilization uptick, though we are still looking into it. We are also looking into automated performance monitoring. We have uptime monitoring but currently do not receive automated alerts for slow performance. Such alerts would allow us to take quicker action when site performance is degraded.

ptomasula commented 1 year ago

@HeatherBrooks, just providing an update on this issue.

LimnoTech swapped over our monitoring service, so we now have page response time as an optional component of our monitoring. I set up a response time monitor, so we are now collecting that data. It might take a little experimentation to determine appropriate alert thresholds, but I've presently set an alert notification for a 50% increase in response time over the baseline.
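
For anyone curious how a rule like "alert on a 50% increase from the baseline" could work, here is a minimal sketch in Python. It is not the monitoring service's actual logic or configuration. The function names, the use of a median as the baseline, and the sample values are all assumptions made for illustration.

```python
from statistics import median


def baseline_response_time(samples):
    """Return a baseline (seconds) as the median of past response times.

    Using the median is an assumption; a real monitoring service may
    compute its baseline differently.
    """
    if not samples:
        raise ValueError("need at least one sample to compute a baseline")
    return median(samples)


def should_alert(current, baseline, threshold=0.5):
    """Return True when `current` exceeds `baseline` by more than `threshold`.

    threshold=0.5 corresponds to a 50% increase over the baseline.
    """
    return current > baseline * (1 + threshold)


# Hypothetical response times in seconds, not real measurements.
history = [1.0, 1.2, 0.8]
baseline = baseline_response_time(history)

print(should_alert(1.4, baseline))   # within 50% of the baseline
print(should_alert(35.0, baseline))  # a 30-45 second page load would trigger
```

The threshold is a parameter so it can be tuned once enough response time data has been collected.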

HeatherBrooks commented 1 year ago

This is great, thank you!

neilh10 commented 11 months ago

@HeatherBrooks I see the page load times have greatly improved. Seems to me this could be closed. Usually it's the originator who closes the issue if they think it's working :)

HeatherBrooks commented 11 months ago

Thanks!