django-daiquiri / daiquiri

A framework for the publication of scientific databases
https://escience.aip.de/daiquiri
Apache License 2.0

BUG: scrub valid user tables if tablename used previously #247

Open kimakan opened 2 months ago

kimakan commented 2 months ago

The main purpose of the management command scrub_user_tables is to find user tables in the database whose corresponding query job is not in the phase COMPLETED or EXECUTING. The tables found can be deleted by passing the flag --delete.

The selection of tables happens here https://github.com/django-daiquiri/daiquiri/blob/f980bc9e47e4ded6e70374a83c5cbd6fd504b8d5/daiquiri/query/management/commands/scrub_user_tables.py#L33-L40

The issue occurs if a table name is reused after earlier tables with that name have been archived. In that case several jobs share the same table name, but only one of them is in a valid phase. The filter does not guarantee that first() returns the job with the valid phase, so a valid table can be selected for scrubbing.

    job = QueryJob.objects.filter(
        schema_name=schema_name,
        table_name=table['name']
    ).first()

Proposed solution

I think that ordering the result by creation date before calling first() might solve the issue. Only the most recent query job using the table name needs to be in a valid phase.
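
To illustrate the idea without a database, here is a minimal plain-Python sketch of the selection logic. The Job class, its creation_time field, the ARCHIVED phase name, and the schema and table names are assumptions made for the example, not the actual daiquiri model. In the ORM, the equivalent change would presumably be an order_by() on the job's creation timestamp (newest first) before first(), assuming QueryJob has such a field.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Phases in which the user table is considered valid (taken from the issue text).
VALID_PHASES = {"COMPLETED", "EXECUTING"}


@dataclass
class Job:
    """Hypothetical stand-in for QueryJob; the field names are assumptions."""
    schema_name: str
    table_name: str
    phase: str
    creation_time: datetime


def matching_jobs(jobs: List[Job], schema_name: str, table_name: str) -> List[Job]:
    """Return all jobs that refer to the given schema and table name."""
    return [
        j for j in jobs
        if j.schema_name == schema_name and j.table_name == table_name
    ]


def first_match(jobs: List[Job], schema_name: str, table_name: str) -> Optional[Job]:
    """Current behaviour: take whichever matching job happens to come first."""
    return next(iter(matching_jobs(jobs, schema_name, table_name)), None)


def latest_match(jobs: List[Job], schema_name: str, table_name: str) -> Optional[Job]:
    """Proposed behaviour: take the most recently created matching job."""
    return max(
        matching_jobs(jobs, schema_name, table_name),
        key=lambda j: j.creation_time,
        default=None,
    )


def should_scrub(job: Optional[Job]) -> bool:
    """A table is a scrub candidate if it has no job or the job's phase is invalid."""
    return job is None or job.phase not in VALID_PHASES


# The table name "results" was used twice: the older job was archived,
# the newer one completed and owns the table that currently exists.
jobs = [
    Job("daiquiri_user_alice", "results", "ARCHIVED", datetime(2024, 1, 10)),
    Job("daiquiri_user_alice", "results", "COMPLETED", datetime(2024, 3, 5)),
]

scrub_unordered = should_scrub(first_match(jobs, "daiquiri_user_alice", "results"))
scrub_latest = should_scrub(latest_match(jobs, "daiquiri_user_alice", "results"))
```

With this data, the unordered lookup picks the archived job and flags the valid table for scrubbing, while the lookup ordered by creation time picks the completed job and leaves the table alone.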