biothings / biothings.api

BioThings API framework - Making high-performance API for biological annotation data
https://biothings.io
Apache License 2.0

Add more idiomatic sqlite multiple document insert #335

Closed: ctrl-schaff closed this 4 months ago

ctrl-schaff commented 5 months ago

Here are some performance metrics from a relatively small plugin of ~50k documents:

<new_branch>
(biothings) jschaff@tsri-ubuntu:~/workspace/biothings/pending.api/plugins/fda_drugs$ biothings-cli dataplugin upload
[15:58:38] INFO     Registering 'fda_drugs' to dump/upload managers 
           INFO     Uploading to the DB...
[15:58:39] INFO     insert time: 0.22862836300009803
           INFO     insert time: 0.06624280699907104
           INFO     insert time: 0.06812734000050114
           INFO     insert time: 0.07389820600110397
           INFO     insert time: 0.058063160000529024
           INFO     Done[0.92s] with 48022 docs
           INFO     Renaming collection 'fda_drugs' to 'fda_drugs_archive_20240506_ukQtnR1S' for archiving purpose.
           INFO     Renaming collection 'fda_drugs_temp_KACOAEWD' to 'fda_drugs'
           INFO     Cleaning old archive/temp collection 'fda_drugs_archive_20240429_Hf5y8MGK'
           INFO     Success! 🚀
           INFO     No manifest file discovered

<old_branch>
(biothings) jschaff@tsri-ubuntu:~/workspace/biothings/pending.api/plugins/fda_drugs$ biothings-cli dataplugin upload
[16:07:46] INFO     Registering 'fda_drugs' to dump/upload managers
           INFO     Uploading to the DB...
[16:07:59] INFO     insert time: 12.146238160999928
[16:08:11] INFO     insert time: 12.159127867998905
[16:08:23] INFO     insert time: 12.154233698998723
[16:08:35] INFO     insert time: 12.135395965999123
[16:08:45] INFO     insert time: 9.772147027000756
           INFO     Done[58.93s] with 48022 docs
           INFO     Renaming collection 'fda_drugs' to 'fda_drugs_archive_20240506_YtAAjEfK' for archiving purpose.
           INFO     Renaming collection 'fda_drugs_temp_fxM8B1uu' to 'fda_drugs'
           INFO     Cleaning old archive/temp collection 'fda_drugs_archive_20240503_1qn2LgNk'
           INFO     Success! 🚀
           INFO     No manifest file discovered
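To put a number on the difference, the per-batch "insert time" values from the two logs above can simply be totaled. The snippet below only redoes that arithmetic with the values copied from the logs; it makes no claim about where the time goes inside biothings.

```python
# Per-batch "insert time" values (seconds), copied from the two logs above.
new_branch = [0.22862836300009803, 0.06624280699907104, 0.06812734000050114,
              0.07389820600110397, 0.058063160000529024]
old_branch = [12.146238160999928, 12.159127867998905, 12.154233698998723,
              12.135395965999123, 9.772147027000756]

new_total = sum(new_branch)
old_total = sum(old_branch)
print(f"insert time: old {old_total:.2f}s, new {new_total:.2f}s, "
      f"speedup {old_total / new_total:.0f}x")
# prints "insert time: old 58.37s, new 0.49s, speedup 118x"

# Whole upload, taken from the "Done[...]" lines: 58.93s vs 0.92s for 48022 docs.
print(f"end-to-end speedup: {58.93 / 0.92:.0f}x")
# prints "end-to-end speedup: 64x"
```

The end-to-end ratio is lower than the insert-only ratio because the work outside the insert calls (registering the plugin, renaming and cleaning collections) takes a similar amount of time on both branches, roughly 0.56 s on the old one and 0.43 s on the new one.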

I've also uploaded a dump of the generated database table to show that the data is identical on both branches.
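The thread doesn't include the diff, so the following is only a sketch of what a "more idiomatic" multiple-document insert usually looks like with Python's standard `sqlite3` module, assuming the slower path issued one statement and one commit per document. The `documents` table, its `_id`/`document` columns, and the helpers `insert_one_by_one`, `insert_many`, and `load` are hypothetical, not biothings' actual schema or API. In the spirit of the uploaded dump, it also compares `Connection.iterdump()` output to check that both paths leave identical table contents.

```python
import json
import sqlite3
import time

# Hypothetical schema for illustration: one row per document, keyed by _id,
# with the whole document stored as JSON text (not biothings' real schema).
SCHEMA = "CREATE TABLE documents (_id TEXT PRIMARY KEY, document TEXT)"
INSERT = "INSERT OR REPLACE INTO documents (_id, document) VALUES (?, ?)"


def insert_one_by_one(conn: sqlite3.Connection, docs: list) -> None:
    """Per-document pattern: one INSERT plus one commit for every document."""
    for doc in docs:
        conn.execute(INSERT, (doc["_id"], json.dumps(doc)))
        # Ends a transaction per document; on a file-backed DB each commit
        # typically syncs to disk, which is where most of the time goes.
        conn.commit()


def insert_many(conn: sqlite3.Connection, docs: list) -> None:
    """Batch pattern: a single executemany() inside one transaction."""
    rows = [(doc["_id"], json.dumps(doc)) for doc in docs]
    with conn:  # commits once on success, rolls back if an exception is raised
        conn.executemany(INSERT, rows)


def load(insert_fn, docs: list) -> tuple:
    """Load docs into a fresh in-memory DB; return (seconds, SQL dump lines)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    start = time.perf_counter()
    insert_fn(conn, docs)
    elapsed = time.perf_counter() - start
    dump = list(conn.iterdump())  # SQL text dump, used to compare contents
    conn.close()
    return elapsed, dump


if __name__ == "__main__":
    docs = [{"_id": f"drug{i}", "name": f"example drug {i}"} for i in range(10_000)]
    slow, slow_dump = load(insert_one_by_one, docs)
    fast, fast_dump = load(insert_many, docs)
    print(f"one-by-one: {slow:.3f}s  executemany: {fast:.3f}s")
    print("identical contents:", slow_dump == fast_dump)
```

Because this uses an in-memory database, commits are cheap and the printed gap will be much narrower than in the logs above; passing a file path to `sqlite3.connect` instead should make the per-commit overhead visible.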