Closed by bwalsh 8 months ago.
Testing locally with 1000 Genomes (1000G) samples
Anecdotally, with ~100k variants, running pytest test_vcf_to_gnomad.py for the first time takes 9.5 s, versus 1 s for the second run once the persistent cache is populated, for the same sample (HG00096). Running the same command for a new sample (HG00099) is also noticeably faster: 4.5 s on the first run (presumably because variants shared with HG00096 are already cached) and 1.3 s on the second.
Use cases
As a vrs-anvil user, when I restart the system, add datasets, or run in separate processes, the cache that stores VRS objects or metakb keys should still be available, i.e., I should not have to start from a fresh, empty cache.
As a vrs-anvil user, when the underlying data changes (e.g., a new vrs-python schema or new metakb files), I need to be able to empty the cache.
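To make the two use cases concrete, here is a minimal sketch that uses only Python's standard-library sqlite3 module as a stand-in; it is not vrs-anvil code, and the class name PersistentCache, the data_version parameter, and the example key and value are hypothetical. Entries live in a SQLite file, so they survive restarts and can be read by other processes that open the same path. A stored data-version stamp empties the cache automatically when the underlying data changes. Values are assumed to be JSON-serializable dicts, so VRS objects would need to be converted to dicts before caching.

```python
import json
import os
import sqlite3
import tempfile
from typing import Optional


class PersistentCache:
    """Hypothetical SQLite-backed key/value cache (illustration only)."""

    def __init__(self, path: str, data_version: str) -> None:
        # timeout lets a writer wait if another process holds the lock
        self._conn = sqlite3.connect(path, timeout=30)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
        )
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS meta (name TEXT PRIMARY KEY, value TEXT)"
        )
        self._conn.commit()
        row = self._conn.execute(
            "SELECT value FROM meta WHERE name = 'data_version'"
        ).fetchone()
        if row is None or row[0] != data_version:
            # Underlying data changed (e.g. new schema): drop stale entries
            self.clear()
            self._conn.execute(
                "INSERT OR REPLACE INTO meta (name, value) VALUES ('data_version', ?)",
                (data_version,),
            )
            self._conn.commit()

    def get(self, key: str) -> Optional[dict]:
        # Return the cached dict, or None on a cache miss
        row = self._conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else json.loads(row[0])

    def set(self, key: str, value: dict) -> None:
        # Store the value as JSON, overwriting any existing entry
        self._conn.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self._conn.commit()

    def clear(self) -> None:
        # Empty the cache explicitly (second use case)
        self._conn.execute("DELETE FROM cache")
        self._conn.commit()

    def close(self) -> None:
        self._conn.close()


path = os.path.join(tempfile.mkdtemp(), "vrs_cache.sqlite")

cache = PersistentCache(path, data_version="schema-1")
cache.set("chr1-12345-A-G", {"type": "Allele"})  # hypothetical key/value
cache.close()

# A later run (or another process) opening the same file sees the entry
cache = PersistentCache(path, data_version="schema-1")
print(cache.get("chr1-12345-A-G"))  # {'type': 'Allele'}
cache.close()

# Bumping the data version empties the cache
cache = PersistentCache(path, data_version="schema-2")
print(cache.get("chr1-12345-A-G"))  # None
cache.close()
```

Keeping the version stamp inside the cache file means a stale cache cannot be silently reused after a schema change. The python-diskcache library listed under Potential solutions offers a persistent, process-safe store with its own clear() method, which could replace a hand-rolled layer like this one.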
Potential solutions
https://github.com/grantjenks/python-diskcache