featheredtoast / satisfactory-monitoring

progressively increasing and unsustainable RAM usage #4

Closed: ivansrbulov closed this 1 day ago

ivansrbulov commented 3 days ago

Using sysstat's sar I have been able to plot usage of RAM of this tool on my Ubuntu server dedicated to only hosting this dashboard. As can be seen in the chart below, progressively more RAM is used until the server eventually becomes unresponsive and requires a restart:

[chart: Figure 2024-11-09 230202, RAM utilisation rising over time]

From 3pm to 11pm local time, system RAM utilisation has gone up from 51% to 71.5%
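
For anyone wanting to reproduce the measurement, sysstat's sar can report memory utilisation along these lines (a sketch; the interval and the data-file path are assumptions, the path being the typical Ubuntu location):

```
# sample memory utilisation (%memused) every 60 seconds until interrupted
sar -r 60

# or read back today's history from the sysstat data files
sar -r -f /var/log/sysstat/sa$(date +%d)
```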

At this level of utilisation, I have also noticed weird things starting to happen on the dashboard. Trains are particularly affected: the charts showing travel times are pretty much broken, showing every journey and station travel time except two as 0 minutes:

[screenshot: travel time charts showing 0 minutes for all but two entries]

The only processes running on the server are this dashboard, sar, and nginx (which I am using only to redirect port 80 to port 3000), and the latter two use less than 1% of RAM combined. Screenshot of the server's htop showing the significant usage:

[screenshot: htop output]
featheredtoast commented 3 days ago

Hey Ivan, thanks for the report. I'll look into it. Looks like the SQL delete query may be suspicious as well.


ivansrbulov commented 3 days ago

No problem at all, thanks for looking into it! Let me know if there's anything I can do to help. And yes, well spotted on the DELETE process.

featheredtoast commented 2 days ago

Hi there, I've inspected it and it looks like there were some unclosed HTTP requests, which are now handled; I'm no longer seeing a sawtooth on the golang apps. Can you pull the latest and try again?
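
For context, the classic Go pattern behind this kind of leak is a response body that never gets closed. A minimal generic sketch of the leak and the fix (not the exact code in this repo; the URL and interval are placeholders):

```go
package main

import (
	"io"
	"net/http"
	"time"
)

// pollOnce fetches a URL and fully drains and closes the response body,
// so the underlying connection can be reused instead of piling up.
// Without the Close, each poll can leak buffers and connections and
// memory climbs steadily over time.
func pollOnce(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close() // the crucial part of the fix
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

func main() {
	// Placeholder endpoint; the real apps poll the game's API.
	for {
		_ = pollOnce("http://localhost:8080/metrics")
		time.Sleep(5 * time.Second)
	}
}
```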

ivansrbulov commented 2 days ago

Have done, will come back in a couple of hours with the same info as above to see how things are looking!

ivansrbulov commented 2 days ago

After nearly 12 hours of running, it seems the fixes have worked for RAM: [chart: RAM utilisation after the fix]

But it also seems the delete query may still be an issue. It's not possible to see from this screenshot, but the cores appear to sit at 100% utilisation, either alternating between them or in some combination, with the delete query always at the top.

[screenshot: htop output]

If helpful, I can reset and track CPU usage.

featheredtoast commented 2 days ago

Wow, thanks for confirming the memory fixes so thoroughly, at least that part is sorted. I'll see if I can't dig to the bottom of the big delete query; any little hint helps here. Nothing obvious jumps out at me so far, but I'll keep checking.

ivansrbulov commented 1 day ago

No problem, was very easy to do!

On the CPU side, something very weird is happening. The chart below tracks CPU% utilisation for the top 10 processes: [chart: per-process CPU% utilisation over time for the top 10 processes]

The Grafana spikes are when I access the dashboard, so that is not concerning / surprising.

htop:

[screenshot: htop output]

Full .log attached in case you want to see it yourself; it goes crazy at 13:30: cpu_usage_top10.log
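
In case anyone wants to collect something similar, a simple loop over ps does the job (a generic sketch, not necessarily the exact command behind my log):

```
# append a timestamp and the 10 biggest CPU consumers once a minute
while true; do
  date >> cpu_usage_top10.log
  ps -eo pid,comm,%cpu --sort=-%cpu | head -n 11 >> cpu_usage_top10.log
  sleep 60
done
```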

featheredtoast commented 1 day ago

Alrighty, I've dug into the queries and I believe I found the issue: I was using a dumb method to truncate history metrics. Would you mind giving the latest another go?

Sidenote: I'm also now building Docker images, so you may also have to run docker compose pull in addition to docker compose down to get the latest.
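
So the update sequence looks roughly like this (a sketch; the final docker compose up -d step is assumed rather than stated above):

```
docker compose down      # stop the running stack
docker compose pull      # fetch the latest published images
docker compose up -d     # start again in the background (assumed step)
```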

ivansrbulov commented 1 day ago

Thanks for the sidenote. I followed that process and it looks like everything works, so I'll close this. Thanks for the quick fix! [screenshot: everything looking healthy after the update]