ga4gh / vrs-hackathons

Project tracking for GA4GH Variation Representation Specification hackathons
Apache License 2.0

Enhance AnyVar storage to allow for querying across variants #8

Open andreasprlic opened 2 years ago

andreasprlic commented 2 years ago

Submitter Name

Andreas Prlić

Submitter Affiliation

Invitae

Submitter Github Handle

andreasprlic

Additional Submitter Details

No response

Which event day would the project be offered?

Project Details

The AnyVar project does not currently have a persistent storage tier. It would be nice to add a solution that goes beyond just caching in memory. Some ideas for backend solutions are SQLite and RocksDB.

Required Skills

No response

reece commented 2 years ago

@andreasprlic: Although the default storage is a dictionary, the README shows how to use Redis for persistence with https://github.com/biocommons/anyvar/blob/master/src/anyvar/storage/redisobjectstore.py. Are you looking for something more than that?

andreasprlic commented 2 years ago

I was hoping to persist into something that is easier to search, e.g. with queries like "find all variants in a certain genomic region", "what is the percentage of substitutions vs indels", etc.
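
To make the two example queries concrete, here is a minimal sketch of how they might look against SQLite. The table name, column names, the `kind` labels, and the `va-N` ids are all hypothetical placeholders for illustration; none of them come from AnyVar or VRS, and a real schema would need to be worked out by the group.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions, not AnyVar's.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE variants (
        variant_id TEXT PRIMARY KEY,
        seq_id     TEXT NOT NULL,
        start_pos  INTEGER NOT NULL,
        end_pos    INTEGER NOT NULL,
        kind       TEXT NOT NULL
    )
    """
)
# Index to support lookups by sequence and position.
conn.execute("CREATE INDEX idx_region ON variants (seq_id, start_pos, end_pos)")

# Placeholder rows (made-up ids and coordinates, for the example only).
rows = [
    ("va-1", "seq1", 100, 101, "substitution"),
    ("va-2", "seq1", 150, 153, "indel"),
    ("va-3", "seq1", 500, 501, "substitution"),
    ("va-4", "seq2", 120, 121, "substitution"),
]
conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?)", rows)


def variants_in_region(conn, seq_id, start, end):
    """Return ids of variants on seq_id that overlap the interval [start, end)."""
    cur = conn.execute(
        "SELECT variant_id FROM variants "
        "WHERE seq_id = ? AND start_pos < ? AND end_pos > ? "
        "ORDER BY start_pos",
        (seq_id, end, start),
    )
    return [row[0] for row in cur]


def kind_percentages(conn):
    """Return the percentage of stored variants per kind."""
    total = conn.execute("SELECT COUNT(*) FROM variants").fetchone()[0]
    if total == 0:
        return {}
    cur = conn.execute("SELECT kind, COUNT(*) FROM variants GROUP BY kind")
    return {kind: 100.0 * count / total for kind, count in cur}


print(variants_in_region(conn, "seq1", 90, 200))
print(kind_percentages(conn))
```

Neither query is practical with a plain key-value lookup, since both need to filter or aggregate on fields inside the stored objects.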

larrybabb commented 2 years ago

@andreasprlic @ahwagner and @larrybabb assessed this ticket and determined the aim should be to work with others to determine a strategy for a configurable storage component (not just Redis). This will allow for more robust searching. Ideally the group would be able to demonstrate the implementation and post a Draft-PR by the end of the hackathon.

Q: Why is this "better" than the current version?

A: Because it enhances the current version, which is limited to a simple key-value lookup (as provided by Redis).
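
As a starting point for discussion, one possible shape for a configurable storage component is sketched below. This is only an illustration under stated assumptions: the `ObjectStore`, `MemoryStore`, `SqliteStore`, and `create_store` names and the URI-scheme convention are hypothetical and are not AnyVar's actual interface.

```python
import json
import sqlite3
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Hypothetical minimal interface that every backend would implement."""

    @abstractmethod
    def put(self, key: str, value: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> dict: ...


class MemoryStore(ObjectStore):
    """Dictionary-backed store; contents are lost when the process exits."""

    def __init__(self) -> None:
        self._data: dict = {}

    def put(self, key: str, value: dict) -> None:
        self._data[key] = value

    def get(self, key: str) -> dict:
        return self._data[key]


class SqliteStore(ObjectStore):
    """SQLite-backed store; objects are serialized as JSON text."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS objects "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def put(self, key: str, value: dict) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO objects VALUES (?, ?)",
                (key, json.dumps(value)),
            )

    def get(self, key: str) -> dict:
        row = self._conn.execute(
            "SELECT value FROM objects WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return json.loads(row[0])


def create_store(uri: str) -> ObjectStore:
    """Pick a backend from a URI such as 'memory://' or 'sqlite://:memory:'."""
    scheme, _, rest = uri.partition("://")
    if scheme == "memory":
        return MemoryStore()
    if scheme == "sqlite":
        return SqliteStore(rest or ":memory:")
    raise ValueError(f"unsupported storage scheme: {scheme!r}")


store = create_store("sqlite://:memory:")
store.put("example-key", {"type": "Allele"})
print(store.get("example-key"))
```

With this kind of split, search-oriented methods could be added to the interface later and implemented only by backends that can support them.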

reece commented 2 years ago

Got it. And FTR I agree with all of that.

larrybabb commented 2 years ago

@andreasprlic can you put together a 1-3 minute pitch on this topic to give at the start of the day, to try to entice folks to join in on this? We are planning on having a cutoff of 4 people minimum to officially work as a group on a topic. Every topic lead will pitch.