New
Database: add Database.commit_context() for easier bulk transactions
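As a rough illustration of the idea, here is a minimal sketch of a commit context built on the standard-library `sqlite3` module rather than the project's ORM layer. The `Database` class below is a hypothetical stand-in, and its method body is an assumption about the behaviour (one commit on success, rollback on error), not the code added by this PR.

```python
import sqlite3
from contextlib import contextmanager


class Database:
    """Hypothetical stand-in for the project's Database class."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)

    @contextmanager
    def commit_context(self):
        """Run every statement of the `with` body in a single transaction."""
        try:
            yield self.conn
            self.conn.commit()    # one commit for the whole batch
        except Exception:
            self.conn.rollback()  # discard the partial batch on error
            raise


db = Database()
db.conn.execute("CREATE TABLE file (id INTEGER PRIMARY KEY, name TEXT)")

# All 1000 inserts are committed together when the block exits.
with db.commit_context() as conn:
    conn.executemany(
        "INSERT INTO file (name) VALUES (?)",
        [(f"file_{i}.nc",) for i in range(1000)],
    )

count = db.conn.execute("SELECT COUNT(*) FROM file").fetchone()[0]
print(count)
```

Committing once per batch avoids paying the cost of a transaction for every row, which is typically what dominates many small writes in SQLite.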
Changed
Database: add a few sqlite PRAGMAs for more aggressive performance
cli.update: directly insert & delete into query_file table, instead of relying on the ORM
cli.update: bulk inserts for each query instead of one commit per file
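The changes above can be sketched with the standard-library `sqlite3` module. The PRAGMA names and values below are assumptions chosen for illustration, since this description does not list which ones were added. The `file` table and both helper functions are hypothetical as well; they only contrast one commit per file with one commit per query.

```python
import os
import sqlite3
import tempfile

# Assumed settings: the PR does not say which PRAGMAs it sets.
PRAGMAS = {
    "journal_mode": "WAL",    # write-ahead log instead of a rollback journal
    "synchronous": "NORMAL",  # fewer fsync calls than the default FULL
    "temp_store": "MEMORY",   # keep temporary tables and indices in RAM
    "cache_size": -64000,     # a negative value is a size in KiB
}


def connect(path: str) -> sqlite3.Connection:
    """Open a connection and apply the tuning PRAGMAs."""
    conn = sqlite3.connect(path)
    for name, value in PRAGMAS.items():
        conn.execute(f"PRAGMA {name} = {value}")
    return conn


def insert_per_file(conn: sqlite3.Connection, names: list[str]) -> None:
    """Old pattern: one transaction (and one commit) per file."""
    for name in names:
        conn.execute("INSERT INTO file (name) VALUES (?)", (name,))
        conn.commit()


def insert_per_query(conn: sqlite3.Connection, names: list[str]) -> None:
    """New pattern: all files of a query inserted in one transaction."""
    conn.executemany(
        "INSERT INTO file (name) VALUES (?)", [(n,) for n in names]
    )
    conn.commit()


with tempfile.TemporaryDirectory() as tmp:
    conn = connect(os.path.join(tmp, "demo.db"))
    conn.execute("CREATE TABLE file (id INTEGER PRIMARY KEY, name TEXT)")
    synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
    insert_per_file(conn, [f"a_{i}.nc" for i in range(200)])
    insert_per_query(conn, [f"b_{i}.nc" for i in range(200)])
    total = conn.execute("SELECT COUNT(*) FROM file").fetchone()[0]
    conn.close()
```

Both helpers store the same rows; the difference is only in how many transactions SQLite has to open and flush along the way.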
Example
I created and filled this database in a few minutes with large queries that previously took hours to run. A good chunk of the time is now spent on fetching metadata from index nodes.
Running a similar test over a non-empty database (~1.4GB) produces no significant difference:
$ esgpull show cc2c -c
<cc2c09>
├── distrib: True
│   latest: True
│   replica: None
│   retracted: False
│   frequency: day
│   variable_id: tas, tasmax
│   variant_label: r1i*
└── <ef4f6f>
    └── distrib: True
        latest: True
        replica: None
        retracted: False
        experiment_id: ssp245
$ time esgpull update cc2c -c -y
<cc2c09> -> 227623 files.
<ef4f6f> -> 6724 files.
234347 files found.
<cc2c09> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:01:49
<ef4f6f> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
esgpull update cc2c -c -y 170.65s user 4.87s system 87% cpu 3:21.09 total
Before this PR, a non-empty database took longer to update. For several reasons, adding a new relation between a query and a file that already had relations to other queries generated very inefficient SQL. This is now a single insert in all cases, so it no longer matters whether the database is empty.
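To make the "single insert" point concrete, here is a minimal sketch using the standard-library `sqlite3` module. The schema, the column names, and the `link` and `unlink` helpers are hypothetical, and `INSERT OR IGNORE` is an assumption about how duplicates might be handled; only the `query_file` table name comes from this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a plain association table between queries and files.
conn.executescript(
    """
    CREATE TABLE query (sha TEXT PRIMARY KEY);
    CREATE TABLE file (sha TEXT PRIMARY KEY);
    CREATE TABLE query_file (
        query_sha TEXT NOT NULL REFERENCES query (sha),
        file_sha  TEXT NOT NULL REFERENCES file (sha),
        PRIMARY KEY (query_sha, file_sha)
    );
    """
)


def link(conn, query_sha, file_shas):
    """Attach files to a query with one insert per relation, in one commit."""
    conn.executemany(
        "INSERT OR IGNORE INTO query_file (query_sha, file_sha) VALUES (?, ?)",
        [(query_sha, f) for f in file_shas],
    )
    conn.commit()


def unlink(conn, query_sha, file_shas):
    """Detach files from a query without touching their other relations."""
    conn.executemany(
        "DELETE FROM query_file WHERE query_sha = ? AND file_sha = ?",
        [(query_sha, f) for f in file_shas],
    )
    conn.commit()


def n_links(conn):
    return conn.execute("SELECT COUNT(*) FROM query_file").fetchone()[0]


conn.executemany("INSERT INTO query VALUES (?)", [("q1",), ("q2",)])
conn.executemany("INSERT INTO file VALUES (?)", [("f1",), ("f2",), ("f3",)])
conn.commit()

link(conn, "q1", ["f1", "f2"])
# "f2" already belongs to q1; linking it to q2 is still a plain insert.
link(conn, "q2", ["f2", "f3"])
links_after_insert = n_links(conn)

unlink(conn, "q1", ["f1"])
links_after_delete = n_links(conn)
```

Because the statement never needs to load the file's existing relations, its cost does not depend on how many queries already reference the file.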