Closed xavierfav closed 6 years ago
As already discussed a bit, it is important to avoid too high a number of queries.
Django records the queries it has performed in the django.db.connection.queries list (only when DEBUG is enabled).
Adding a select_related when querying the sound instances (to get their duration) already halves the number of queries.
candidate_annotations = dataset.candidate_annotations.filter(ground_truth=None)\
.select_related('sound_dataset__sound')
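To illustrate why select_related reduces the query count, here is a minimal sketch in plain SQL using Python's built-in sqlite3 module rather than the Django ORM. The table names, columns, and the CountingDB helper are hypothetical and much simpler than the real models (the real relation goes through sound_dataset to sound). It only shows the general idea: fetching the related row for each annotation costs one extra query per row, while a JOIN fetches everything at once.

```python
import sqlite3


class CountingDB:
    """Hypothetical helper: wraps a connection and counts executed queries."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def make_db():
    # Toy schema (assumption): one sound per annotation.
    db = CountingDB()
    db.conn.executescript("""
        CREATE TABLE sound (id INTEGER PRIMARY KEY, duration REAL);
        CREATE TABLE annotation (
            id INTEGER PRIMARY KEY,
            sound_id INTEGER REFERENCES sound(id)
        );
        INSERT INTO sound VALUES (1, 2.5), (2, 10.0), (3, 0.8);
        INSERT INTO annotation VALUES (1, 1), (2, 2), (3, 3);
    """)
    return db


def durations_one_query_per_row(db):
    # One query for the annotations, then one more query per annotation.
    rows = db.query("SELECT id, sound_id FROM annotation ORDER BY id")
    return [
        db.query("SELECT duration FROM sound WHERE id = ?", (sound_id,))[0][0]
        for _, sound_id in rows
    ]


def durations_with_join(db):
    # A single query: the related sound row is fetched through a JOIN.
    rows = db.query("""
        SELECT a.id, s.duration
        FROM annotation a JOIN sound s ON s.id = a.sound_id
        ORDER BY a.id
    """)
    return [duration for _, duration in rows]
```

With three annotations, the first function issues four queries and the second issues one; both return the same durations.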
An aggregation was also added so that the number of present votes is calculated in the first query. This removes the queries that were previously issued for each individual candidate annotation.
Now the task seems to be faster and puts less load on the DB.
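The effect of the aggregation can be sketched in the same way. This is plain SQL through sqlite3, not the code from the commits; in the Django ORM the equivalent tool would be annotate with Count. The vote table and its is_present column are assumptions made for the example.

```python
import sqlite3


def make_db():
    # Toy schema (assumption): votes point to annotations, is_present is 0 or 1.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE annotation (id INTEGER PRIMARY KEY);
        CREATE TABLE vote (
            id INTEGER PRIMARY KEY,
            annotation_id INTEGER REFERENCES annotation(id),
            is_present INTEGER
        );
        INSERT INTO annotation (id) VALUES (1), (2), (3);
        INSERT INTO vote (annotation_id, is_present)
        VALUES (1, 1), (1, 1), (1, 0), (2, 1);
    """)
    return conn


def present_votes_per_row(conn):
    """Count present votes with one extra query per annotation."""
    executed = 1
    ids = [row[0] for row in conn.execute("SELECT id FROM annotation ORDER BY id")]
    counts = {}
    for annotation_id in ids:
        counts[annotation_id] = conn.execute(
            "SELECT COUNT(*) FROM vote WHERE annotation_id = ? AND is_present = 1",
            (annotation_id,),
        ).fetchone()[0]
        executed += 1
    return counts, executed


def present_votes_aggregated(conn):
    """Count present votes for all annotations in a single grouped query."""
    rows = conn.execute("""
        SELECT a.id, COUNT(v.id)
        FROM annotation a
        LEFT JOIN vote v ON v.annotation_id = a.id AND v.is_present = 1
        GROUP BY a.id
        ORDER BY a.id
    """).fetchall()
    return dict(rows), 1
```

The LEFT JOIN keeps annotations that have no votes, so they get a count of 0 instead of disappearing from the result.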
It would be nice to include issue numbers in commit messages (or open a pull request) so that we can track the improvements made to this problem.
Since the issue number was not provided in the merge commit message, I am adding the commits here: a15a83a63bfb193c610febf9a64b1a6e09d54397 a73e28d502cf399fb62d012cf328d2fc0c614e28 06c7330e14158142836c858b695260a70de71801 8d509d8e2196c4f11e24cde14ea1501d0f56e3ff
Over the last few days we observed that running the management command _compute_priority_score_candidateannotations from the datasets Django app was the straw that broke the camel's back and made the asplab-web1 server unreachable.
The command runs a celery task that basically iterates through all the candidate annotations in the FSD dataset (~700k), calculates their priority score, and updates the score stored in the database. You can find details about the priority score calculation in #133.
Here is the code of the celery task (datasets/tasks.py):

Candidate annotation method for the calculation of the priority score (datasets/models.py):

@alastair, do you think there is something wrong, or something that we could improve?