PoonLab / covizu

Rapid analysis and visualization of coronavirus genome variation
https://filogeneti.ca/CoVizu/
MIT License

Restore local database for extracted features #485

Closed ArtPoon closed 7 months ago

ArtPoon commented 1 year ago

We've been burning a lot of CPU re-aligning millions of sequences every time we process a new provision file. Even though this is pretty fast with minimap2, there are a lot of sequences! For the sake of our hardware, we should think about bringing back the database scheme implemented by @ewong347 ages ago. This is the general idea:

This should be implemented in an experimental branch.
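
As a rough illustration of the caching idea (a sketch only, not the scheme @ewong347 implemented and not code from the covizu repository): look up each record by its accession, and only run the expensive feature extraction on a miss. The sketch uses Python's built-in `sqlite3` as a stand-in for whatever database is chosen. The table name `features`, the toy `REFERENCE`, and the functions `extract_features`, `open_cache` and `get_features` are all hypothetical; `extract_features` simply compares strings in place of a real minimap2 alignment.

```python
import json
import sqlite3

# Toy reference sequence (hypothetical; stands in for the real reference genome).
REFERENCE = "ACGTACGTAC"


def extract_features(sequence):
    """Stand-in for the expensive alignment step.

    Returns substitutions relative to REFERENCE as strings such as 'A5T'.
    """
    return [
        f"{ref}{pos + 1}{alt}"
        for pos, (ref, alt) in enumerate(zip(REFERENCE, sequence))
        if ref != alt
    ]


def open_cache(path=":memory:"):
    """Open (or create) the feature cache; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features ("
        "accession TEXT PRIMARY KEY, diffs TEXT NOT NULL)"
    )
    return conn


def get_features(conn, accession, sequence, stats):
    """Return cached features for accession, computing and storing them on a miss."""
    row = conn.execute(
        "SELECT diffs FROM features WHERE accession = ?", (accession,)
    ).fetchone()
    if row is not None:
        stats["hits"] += 1
        return json.loads(row[0])
    stats["misses"] += 1
    diffs = extract_features(sequence)
    conn.execute(
        "INSERT INTO features (accession, diffs) VALUES (?, ?)",
        (accession, json.dumps(diffs)),
    )
    conn.commit()
    return diffs


if __name__ == "__main__":
    conn = open_cache()
    stats = {"hits": 0, "misses": 0}
    # Two passes over the same record mimic two successive provision files.
    for _ in range(2):
        print(get_features(conn, "EPI_TEST_1", "ACGTTCGTAC", stats))
    print(stats)
```

With this layout, a second pass over the same records never reaches `extract_features`, which is the saving the proposal is after.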

SandeepThokala commented 9 months ago

Compared timing for PostgreSQL vs. a run without a database:

| Database | Scenario | 2k | 10k | 100k | 1m |
|---|---|---|---|---|---|
| Postgres | Empty | 4.12 | 15.65 | 156.25 | 1343.14 |
| Postgres | Populated | 2.13 | 5.0 | 45.72 | 385.24 |
| No DB | | 3.28 | 13.29 | 122.69 | 1044.63 |

[Plot: covizu-plot (4)]
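
For reference, the ratios implied by the timings above can be worked out with a few lines of Python. This is only arithmetic on the posted numbers; the variable names are illustrative, and since the thread does not state the timing units, only unitless ratios are computed.

```python
# Timings copied from the table above, keyed by input size.
sizes = ["2k", "10k", "100k", "1m"]
postgres_empty = [4.12, 15.65, 156.25, 1343.14]
postgres_populated = [2.13, 5.0, 45.72, 385.24]
no_db = [3.28, 13.29, 122.69, 1044.63]

# Speedup of a populated database over running without a database.
speedup = {s: n / p for s, n, p in zip(sizes, no_db, postgres_populated)}

# Relative cost of filling an empty database compared with no database.
overhead = {s: e / n for s, e, n in zip(sizes, postgres_empty, no_db)}

for s in sizes:
    print(f"{s}: populated is {speedup[s]:.2f}x faster, "
          f"empty costs {overhead[s]:.2f}x the no-DB time")
```

On these numbers, a populated database is roughly 1.5x faster than no database at 2k and about 2.7x faster from 10k upward, while the first run against an empty database costs roughly 1.2x to 1.3x the no-database time.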

ArtPoon commented 9 months ago

Thanks, that's much better. Looks like for the lookups that we are doing, we should be sticking to SQL.

SandeepThokala commented 8 months ago

File size for database vs. the number of records:

| Records | File size |
|---|---|
| 2k | 2904 kB |
| 10k | 14 MB |
| 100k | 158 MB |
| 1m | 1408 MB |

ArtPoon commented 8 months ago

Pending merge into dev