A couple of thoughts:
Noting that I have mitigated this on the Adoptium TRSS server by rate-limiting requests on the nginx front end, but that should be considered a temporary workaround for the underlying issues in TRSS.
A change in architecture to use a single query would definitely be preferable if possible, or at least combining them somehow so as not to overload the database. Ref: https://github.com/adoptium/infrastructure/issues/3354
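For what it's worth, the "single query" idea could look something like the sketch below: instead of issuing one query per monitored build, the server asks MongoDB for the latest result of every build in a single aggregation round-trip. The collection and field names (`buildName`, `timestamp`) are assumptions for illustration, not TRSS's actual schema.

```javascript
// Hypothetical pipeline builder: collapses N per-build queries into one
// aggregation. Field names are illustrative, not the real TRSS schema.
function latestPerBuildPipeline() {
  return [
    // Newest documents first within each build name.
    { $sort: { buildName: 1, timestamp: -1 } },
    // Keep only the newest document for each build.
    { $group: { _id: "$buildName", latest: { $first: "$$ROOT" } } },
  ];
}

// Usage (one round-trip instead of N):
//   const rows = await db.collection("testResults")
//     .aggregate(latestPerBuildPipeline())
//     .toArray();

module.exports = { latestPerBuildPipeline };
```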
This is not a database overload issue. All of the changes have been delivered, and performance has been boosted by approximately 35 times. This issue will be closed.
Rate-limiting requests on nginx is not a way to fix a performance issue. Rate limiting restricts the number of requests a client can make to the server within a given time period. It is good for mitigating problems such as brute-force attacks, but it can also block legitimate users or API calls if the limit is set too low, so it requires careful tuning and monitoring to ensure that legitimate traffic is not inadvertently blocked. If you have a specific problem, please open a new issue.
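For context, this kind of nginx mitigation is typically done with the `limit_req` module. A minimal sketch is below - the zone name, rate, path, and backend port are assumptions for illustration, not the actual Adoptium configuration:

```nginx
# Shared zone keyed by client IP: 10 MB of state,
# allowing at most 10 requests/second per address.
limit_req_zone $binary_remote_addr zone=trss_limit:10m rate=10r/s;

server {
    listen 80;
    server_name trss.example.org;  # placeholder, not the real host

    location /api/ {
        # Allow short bursts of 20 requests and reject the excess
        # immediately (with a 503 by default) instead of queueing it.
        limit_req zone=trss_limit burst=20 nodelay;
        proxy_pass http://127.0.0.1:3000;  # assumed TRSS backend port
    }
}
```

As noted above, if the rate or burst is set too low, legitimate dashboard users hit the limit too, which is why this is a stopgap rather than a fix.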
> This is not a database overload issue. All changes are delivered.
Does that mean the problem you've screenshotted in the original description has been resolved, and we just need to get the update onto the Adoptium TRSS instance?
> Rate-limiting requests on nginx is not a way to fix a performance issue.
I completely agree, but I wasn't aware that anyone had been working on the issue - I'd be delighted if the performance issue has been fixed and I can remove the limit again :-)
Perhaps I failed to explain clearly enough in a recent scrum or on Slack that my intention/priority is to update the sync job (https://github.com/adoptium/aqa-test-tools/issues/856) so I can pull in the 3 recent performance improvements that Lan committed to aqa-test-tools.
I am working on it now, but it has taken longer than expected due to the recent removal of local Docker tools and my wanting to test locally. I've finally resolved that barrier and will hopefully be able to test my updates shortly.
Noting that we had 2 different issues: 1) TRSS performance and 2) MongoDB container bloat.
Lan has vastly improved 1) TRSS performance, but we have not pulled the changes into our prod server yet. For 2), I am not certain I understand the bloat, but I believe that regularly running the sync job will help, and adding a step to the sync job to clean things up if needed is certainly possible.
> 1) TRSS perf, but we have not pulled the changes in to our prod server yet.
Thanks - I knew you were working on getting the sync job working again, but I wasn't aware until now that it was because some of the underlying issues we'd been seeing here - the ones temporarily mitigated with the nginx "hack" - had been resolved. That's great to hear, so thanks Lan!
I think for (2) we still need to understand what can be done to reduce the output (although that's separate from this issue). It would be good to know whether other TRSS instances see this with a default configuration, to indicate whether it's something we've done. A cleanup on sync might be adequate, but it's more of a sticking plaster (similar to what I did with nginx!).
As we monitor more and more test builds, we need to look into TRSS query efficiency. I have seen cases where TRSS uses between 100% and 600% CPU when loading the page.
Also, depending on the number of builds being monitored, loading the main page can take a long time.