biocommons / anyvar

[in development] Proof-of-Concept variation translation, validation, and registration service
https://github.com/biocommons/anyvar
Apache License 2.0

Investigate different NoSQL storage backends #25

Open jsstevenson opened 1 year ago

jsstevenson commented 1 year ago

The ClinGen team found that Redis wasn't cost-effective for caching at scale. They moved to RocksDB, and we may want to consider moving our NoSQL support efforts in that direction.

holtgrewe commented 1 year ago

In case it is useful: I'm using the following utility code when bulk-importing variants into RocksDB (in Rust). In particular, when you import all of gnomAD, you will run into memory usage issues unless you use hierarchical index data structures and Bloom filters.

https://github.com/bihealth/rocksdb-utils-lookup

HTH
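To illustrate why Bloom filters help here: a Bloom filter answers "definitely absent" or "possibly present" from a fixed-size bit array, so a store can skip disk reads for most keys it doesn't hold. RocksDB can attach such a filter to each SST file. The following is a minimal, stdlib-only Python sketch of the data structure itself. It is not RocksDB's filter format or the linked rocksdb-utils-lookup code, and the sizing (10 bits per key, 7 hash functions) and key names are illustrative assumptions.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Illustrative Bloom filter sketch; not RocksDB's on-disk filter format."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Double hashing: derive k bit positions as (h1 + i * h2) mod m
        # from two 64-bit slices of a single SHA-256 digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # keep the step odd
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        # Set every bit position derived from the key.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False: the key was definitely never added, so a store can skip the disk read.
        # True: the key is possibly present and the real lookup must still happen.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


# Hypothetical keys standing in for variant identifiers.
bf = BloomFilter(num_bits=10 * 1000, num_hashes=7)
for i in range(1000):
    bf.add(f"variant-{i}".encode())
print(bf.might_contain(b"variant-42"))
```

With about 10 bits per key and 7 hashes, the theoretical false-positive rate is under 1%. Memory grows linearly with the number of keys, which is one reason very large imports also benefit from the hierarchical index structures mentioned above.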

jsstevenson commented 1 year ago

@holtgrewe nice! We expect to focus on relational storage for now, but we do want to reevaluate that at some point. (This may be crazy, but we'd like to maintain support for different backends depending on whether users want to optimize for more complex searches or for pure key-value retrieval.)
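One way to keep several backends behind a single code path is a shared storage interface with a key-value implementation and a relational one. The sketch below is a hypothetical illustration of that idea, not anyvar's actual API. `VariationStore`, `KeyValueStore`, `RelationalStore`, and `ids_by_type` are invented names, a plain dict stands in for a key-value engine such as RocksDB, and in-memory SQLite stands in for a relational database.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class VariationStore(ABC):
    """Hypothetical common interface every backend implements."""

    @abstractmethod
    def put(self, object_id: str, obj: dict) -> None: ...

    @abstractmethod
    def get(self, object_id: str) -> Optional[dict]: ...


class KeyValueStore(VariationStore):
    """Pure key-value backend; a dict stands in for a KV engine like RocksDB."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, object_id: str, obj: dict) -> None:
        self._data[object_id] = json.dumps(obj)

    def get(self, object_id: str) -> Optional[dict]:
        raw = self._data.get(object_id)
        return None if raw is None else json.loads(raw)


class RelationalStore(VariationStore):
    """Relational backend (SQLite as a stand-in) that also supports richer queries."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS vrs_objects (id TEXT PRIMARY KEY, type TEXT, body TEXT)"
        )

    def put(self, object_id: str, obj: dict) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO vrs_objects VALUES (?, ?, ?)",
            (object_id, obj.get("type"), json.dumps(obj)),
        )
        self._conn.commit()

    def get(self, object_id: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT body FROM vrs_objects WHERE id = ?", (object_id,)
        ).fetchone()
        return None if row is None else json.loads(row[0])

    def ids_by_type(self, obj_type: str) -> List[str]:
        # A "more complex search": a plain KV store would need a full scan for this.
        rows = self._conn.execute(
            "SELECT id FROM vrs_objects WHERE type = ? ORDER BY id", (obj_type,)
        )
        return [r[0] for r in rows]
```

Callers that only need `put`/`get` can accept any `VariationStore`, while search-heavy features depend on the relational implementation explicitly. That split mirrors the trade-off between complex searches and pure key-value retrieval described above.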

theferrit32 commented 1 year ago

@holtgrewe this is useful, thanks. We have hit issues before with RocksDB memory growth when using the default config and had to make similar config changes.

https://github.com/clingen-data-model/genegraph/blob/e43086b7efb393013759cecb260eb705470191e6/src/genegraph/rocksdb.clj#L21-L28

I haven't tried tweaking the Bloom filter settings, but the use case above isn't trying to optimize for reads; it does a lot of writing too.
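The specific settings in the linked genegraph file aren't reproduced here. In general, RocksDB memory comes from several places: memtables (write buffers), the block cache, and index/filter blocks, which by default live outside the block cache unless `cache_index_and_filter_blocks` is enabled. Capping the block cache at a fixed byte capacity is one commonly adjusted knob. The stdlib-only Python sketch below illustrates that idea with a byte-bounded LRU cache. It is a conceptual illustration, not RocksDB's cache implementation, and `ByteBoundedLRUCache` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Optional


class ByteBoundedLRUCache:
    """Hypothetical LRU cache whose total value size never exceeds capacity_bytes."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0  # counts value sizes only, for simplicity
        self._entries: "OrderedDict[bytes, bytes]" = OrderedDict()

    def get(self, key: bytes) -> Optional[bytes]:
        value = self._entries.get(key)
        if value is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: bytes, value: bytes) -> None:
        # Replacing an existing entry releases its old size first.
        if key in self._entries:
            self.used_bytes -= len(self._entries.pop(key))
        if len(value) > self.capacity_bytes:
            return  # too large to cache at all; caller reads it from disk
        self._entries[key] = value
        self.used_bytes += len(value)
        # Evict least recently used entries until usage is back under the cap.
        while self.used_bytes > self.capacity_bytes:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)
```

Memory stays bounded no matter how many keys pass through, at the cost of more misses. For a write-heavy workload like the one described above, a larger read cache may matter less than bounding memtable memory.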

github-actions[bot] commented 10 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 6 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.