Talent-Catalog / talentcatalog

https://tctalent.org
GNU Affero General Public License v3.0
9 stars 4 forks source link

Stats response times - high cpu + memory consumptions #1054

Closed sadatmalik closed 4 hours ago

sadatmalik commented 1 month ago

Note that this went into production as a Hot Fix.

Note also that there is still a bug to be fixed. That bug has changed the priority of issue #55 which I have moved out of the IceBox into ProductBacklog

Running stats for this search results in out-of-memory errors:

https://tctalent.org/admin-portal/search/2976

camerojo commented 3 weeks ago

Problem is that very big search loads all candidates into entity cache then extracts ids. Once ids have been extracted, cached candidate entities are no longer needed and can be cleared.

We also have to replace simple call to Set[Long] searchCandidates with a paged search so that we load candidates a page at a time, extracting their ids, and entityManager.clear. That way we don't consume too much memory building up the list of candidate ids.

Best approach is to rewrite Set[Long] searchCandidates so that it uses paged normal search code (eliminating duplicated code) - conserving memory by doing sequence of paged updates - with warning on its doc that it clears persistence cache.

camerojo commented 2 weeks ago

Putting in the above change fixed the out of memory error, but the reports on searches returning large numbers of candidates still don't work because of the way we do stats on the results of specific searches (as opposed to stats on the whole database).

We hit up against an internal Postgres database limitation described here: https://stackoverflow.com/questions/49274390/postgresql-and-hibernate-java-io-ioexception-tried-to-send-an-out-of-range-inte

Bottom line, without changing the way we implement this, this means a limitation of reporting on searches with results of more than around 32,000 candidates.

I have put a HotFix change into production which avoids the memory overflow error. It also throws an exception which displays to the user on the stats page explaining this limit - and noting that we are working on a solution.

Screenshot 2024-06-13 at 3 22 04 PM
camerojo commented 2 weeks ago

Next steps

The root of the problem is that we do our stats using standard SQL strings (as @Query statements in the CandidateRepository). However, we do our searches using JPA Specifications and the Criteria API.

See #55

If we used SQL strings to construct WHERE clauses of our searches instead of Specifications, we could append those WHERE strings to the Stats SQL - so that the stats are limited by the same filters.

One way forward might be to extract the SQL string generated by the Specification and then use that in the Stats SQL - but I fear that could get messy.

The other way forward is to move away from Specifications and just constructing SQL strings from our search filters. We lose the object type safety that comes with Specifications but I question whether that is worth the complexity that Specifications introduce. I also note that with the upgrade to Spring that the JPA Specification interface is changing and needs to be reviewed and tested anyway.

fyi @sadatmalik