France-ioi / AlgoreaBackend

Backend for the new Algorea platform
MIT License
1 stars 2 forks source link

Fix slow trigger before_delete_items_items #1066

Closed GeoffreyHuck closed 2 months ago

GeoffreyHuck commented 2 months ago

The trigger before_delete_items_items can be very slow. It insert an entry in table results_propagate for each entry in results of a given item_id. This means the insertion of 50k rows when we remove a child of 694914435881177216.

This PR extract the inserts in table results_propagate out of the before_delete_items_items, and put it at the beginning of the process of results propagation.

Instead of doing the insert in the trigger, we add the item_id in question in a new table (results_propagate_items). Then, when we do the results propagation, we first retrieve the item_id in the new table, and do the inserts at this moment.

We do one transaction for each item_id, because we know it can take a long time, and we want to keep transactions as small as possible.

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 100.00%. Comparing base (a5b3c35) to head (e8ae6a1). Report is 3 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #1066 +/- ## ========================================= Coverage 100.00% 100.00% ========================================= Files 236 236 Lines 14260 14302 +42 ========================================= + Hits 14260 14302 +42 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

smadbe commented 2 months ago

We do one transaction for each item_id, because we know it can take a long time, and we want to keep transactions as small as possible.

The "problem" is that it can be very small for many... and not small enough for others.

Does the insert locks anything (even just results_propagate) ?

If no, there is no reason to do one transaction per item, I suppose it should even be possible to run it in a single query (which would be faster than n times a transaction+a query) If yes, and always takes <0.5s, that's ok. If yes and it may take >0.5s, it should also split in such a way that each transaction, only adds ..let's say 5k entries... so that other sync real-time (web) queries are not slowed down.

smadbe commented 2 months ago

And so could, you, instead of what you currently do in setResultsPropagationFromResultsPropagateItems loop, run the old INSERT INTO `results_propagate` SELECT `participant_id`, `attempt_id`, `item_id`, 'to_be_propagated' AS `state` FROM `results` WHERE `item_id` = OLD.`parent_item_id` ON DUPLICATE KEY UPDATE `state` = 'to_be_recomputed'; request (which was used in the trigger) but adding a LIMIT 5000 on the select and track the number of row changed (so until there is no row change) which is returned by the sql engine? It would allow being fast on items with very few results and cut the transaction even more when there are many of them.

GeoffreyHuck commented 2 months ago

Added a LIMIT...OFFSET.... The rowsAffected returned is sometimes > offset, I don't know why. To be safe, we execute the query until we get 0. This means the query will be execute one time for nothing, if that's a problem I can investigate further what happens with the number of rows affected.

smadbe commented 2 months ago

as discussed:

GeoffreyHuck commented 2 months ago

Also commented a test the was failing 2 every 3 runs on CI. With 11min for failing test, it makes me lose 20min every time I push.

I can look into this when the current emergency is over. I checked a little bit and didn't find anything wrong with it. My guess: it's just too slow on CI and it times out (it tests a lock that has 10s timeout, executing the results propagation 30 times).

Created an issue: https://github.com/France-ioi/AlgoreaBackend/issues/1067