django-daiquiri / daiquiri

A framework for the publication of scientific databases
https://escience.aip.de/daiquiri
Apache License 2.0
26 stars 8 forks source link

BUG: very slow web interface by large number of query jobs #178

Closed kimakan closed 1 year ago

kimakan commented 1 year ago

A very large number of not archived query jobs leads to a de facto freeze of the query app. It's especially an issue for the anonymous user. The solution could be a limit on the number of query jobs. After reaching the limit (app settings), the oldest query jobs could be automatically archived. Another viable solution is to implement a pagination feature. In this case, not all query jobs will be rendered at once which will improve the overall performance.

jochenklar commented 1 year ago

Hi @kimakan @agy-why ! (still reading this :) If the API for a list is slow, it can be that database queries are performed for every row. Then you have 10000 queries for 10000 rows! Django offers 2 functions to optimize db access select_related and prefetch_related. Also there is the django-debug-toolbar which can show all database queries for a request. @triole might now a bit about this since we use it heavily in RDMO.

kimakan commented 1 year ago

The database query seem not to be the culprit of the problem. The API request to .../query/api/jobs/tables/ results in a single query that looks like this

Executed SQL
    SELECT "daiquiri_jobs_job"."id",
           "daiquiri_jobs_job"."owner_id",
           "daiquiri_jobs_job"."client_ip",
           ...
           "daiquiri_query_queryjob"."metadata",
           "daiquiri_query_queryjob"."uploads",
           "daiquiri_query_queryjob"."pid"
      FROM "daiquiri_query_queryjob"
     INNER JOIN "daiquiri_jobs_job"
        ON ("daiquiri_query_queryjob"."job_ptr_id" = "daiquiri_jobs_job"."id")
     WHERE ("daiquiri_jobs_job"."owner_id" = 1 AND "daiquiri_jobs_job"."phase" = 'COMPLETED')
     ORDER BY "daiquiri_jobs_job"."creation_time" DESC
     LIMIT 100

The problem arise if there are a lot of metadata. That's why this problem was encountered in the GAIA archive first. Some of the tables there contain more than 150 columns, hence the amount of metadata in the user tables adds up quiet fast. API request alone is fast but the rendering seems to be the issue.

kimakan commented 1 year ago

Indeed, it seems that the issue arises from the rendering of too many columns in the dropdown Columns of the query app. On my local system, daiquiri can render ~3000 columns without any issues, but at 10000 the drop in the performance is so significant that the archive becomes unusable. https://github.com/django-daiquiri/daiquiri/blob/4eab550c9e00745b5538bdb236fe93f0c346f33b/daiquiri/query/static/query/js/services/query.js#L149-L156

kimakan commented 1 year ago

The simplest solution for now is to remove the user schema from the column search, see commit 8f9acf223f73e02f230ed70529d7fe9343bbe055. I will leave some time for discussion before I make a PR for it.