Closed ArtPoon closed 7 months ago
Compared timing for PostgreSQL vs. a run without a database
Database | Scenario | 2k | 10k | 100k | 1m |
---|---|---|---|---|---|
Postgres | Empty | 4.12 | 15.65 | 156.25 | 1343.14 |
Postgres | Populated | 2.13 | 5.0 | 45.72 | 385.24 |
No DB | | 3.28 | 13.29 | 122.69 | 1044.63 |
Thanks, that's much better. Looks like for the lookups that we are doing, we should be sticking to SQL.
iss485 branch and prepare a pull request to dev
File size for database vs the number of records
records | file size |
---|---|
2k | 2904 kB |
10k | 14 MB |
100k | 158 MB |
1m | 1408 MB |
Pending merge into dev
We've been burning a lot of CPU re-aligning millions of sequences every time we process a new provision file. Even though this is pretty fast with minimap2, there are a lot of sequences! For the sake of our hardware, we should think about bringing back the database scheme implemented by @ewong347 ages ago. This is the general idea:
This should be implemented in an experimental branch
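A minimal sketch of the caching idea, not the scheme @ewong347 implemented. Everything named here is an assumption for illustration: the `alignments` table and its columns, the `open_cache` and `align_with_cache` helpers, keying on an accession string, and the `align` callable that stands in for the minimap2 step. It uses Python's built-in `sqlite3` in place of PostgreSQL so it runs without a server; the same SQL lookup pattern should carry over.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) a hypothetical cache of alignment results."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alignments ("
        "accession TEXT PRIMARY KEY, result TEXT NOT NULL)"
    )
    return conn


def align_with_cache(conn, records, align):
    """Align only the records that are not already in the cache.

    records: iterable of (accession, sequence) tuples
    align:   callable taking a sequence and returning a result string
             (placeholder for the real alignment step)
    Returns (results, n_new), where results maps accession -> result and
    n_new counts how many sequences actually had to be aligned.
    """
    results = {}
    n_new = 0
    for accession, sequence in records:
        row = conn.execute(
            "SELECT result FROM alignments WHERE accession = ?",
            (accession,),
        ).fetchone()
        if row is None:
            # Cache miss: do the expensive work once and store it.
            result = align(sequence)
            conn.execute(
                "INSERT INTO alignments (accession, result) VALUES (?, ?)",
                (accession, result),
            )
            n_new += 1
        else:
            # Cache hit: reuse the stored result.
            result = row[0]
        results[accession] = result
    conn.commit()
    return results, n_new


if __name__ == "__main__":
    def fake_align(seq):
        # Toy stand-in for an aligner.
        return "len=%d" % len(seq)

    conn = open_cache()
    first = [("A1", "ACGT"), ("A2", "ACGTAC")]
    second = first + [("A3", "AC")]
    print(align_with_cache(conn, first, fake_align)[1])
    print(align_with_cache(conn, second, fake_align)[1])
```

On the second call only the previously unseen record is passed to `align`, which is the saving the "Populated" row in the timing table reflects.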