biocommons / anyvar

[in development] Proof-of-Concept variation translation, validation, and registration service
https://github.com/biocommons/anyvar
Apache License 2.0

Implement asynchronous request-reply for VCF annotation #107

Open ehclark opened 4 weeks ago

ehclark commented 4 weeks ago

Is your feature request related to a problem? Please describe.

Long-running requests do not play nicely with cloud-based API hosting services. In general, API response times are expected to be less than 30 seconds. Long-running requests also hold connections open and consume network resources unnecessarily.

Describe the solution you'd like

Implement an asynchronous request-reply pattern for annotating a VCF file with VRS IDs, using the Celery distributed task queue with Redis as both the broker and the result backend. The asynchronous request-reply pattern can be implemented in slightly different ways; for integration purposes, I intend to follow the semantics defined by Snowflake.
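To make the request-reply flow concrete, here is a minimal sketch of the pattern itself, not of the proposed implementation. It assumes a standard-library thread pool as a stand-in for Celery and Redis, and every name in it (`submit`, `poll`, `wait_for_result`, `annotate_vcf`, the `run_id` field, the status strings) is hypothetical. The status codes follow the common convention of replying 202 while work is in progress and 200 once the result is ready; the exact Snowflake semantics may differ.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor

# Stand-in for the task queue: a real deployment would enqueue to a broker
# (e.g. Celery with Redis) instead of using an in-process thread pool.
_executor = ThreadPoolExecutor(max_workers=2)
_runs: dict[str, Future] = {}


def annotate_vcf(vcf_text: str) -> str:
    """Placeholder for the long-running annotation step (hypothetical)."""
    out = []
    for line in vcf_text.splitlines():
        # Header lines pass through; data lines get a dummy annotation.
        out.append(line if line.startswith("#") else line + "\tVRS_ID=placeholder")
    return "\n".join(out)


def submit(vcf_text: str) -> tuple[int, dict]:
    """Accept the request and reply immediately with 202 and a run id."""
    run_id = str(uuid.uuid4())
    _runs[run_id] = _executor.submit(annotate_vcf, vcf_text)
    return 202, {"run_id": run_id, "status": "PENDING"}


def poll(run_id: str) -> tuple[int, dict]:
    """Report progress: 202 while running, 200 with the result when done."""
    future = _runs.get(run_id)
    if future is None:
        return 404, {"run_id": run_id, "status": "NOT_FOUND"}
    if not future.done():
        return 202, {"run_id": run_id, "status": "PENDING"}
    del _runs[run_id]  # the result is handed out once, then forgotten
    error = future.exception()
    if error is not None:
        return 500, {"run_id": run_id, "status": "FAILED", "error": str(error)}
    return 200, {"run_id": run_id, "status": "SUCCESS", "result": future.result()}


def wait_for_result(run_id: str, interval: float = 0.05, timeout: float = 5.0) -> tuple[int, dict]:
    """Client-side helper: poll until the run leaves the 202 state or time runs out."""
    deadline = time.monotonic() + timeout
    code, body = poll(run_id)
    while code == 202 and time.monotonic() < deadline:
        time.sleep(interval)
        code, body = poll(run_id)
    return code, body
```

The point of the split is that `submit` returns well inside any gateway timeout regardless of file size, and the client decides how often to call `poll`.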

AnyVar would remain deployable and functional without Redis and other async dependencies.

ehclark commented 4 weeks ago

@larrybabb @korikuzma @jsstevenson I would like to hear from you all about whether you feel this is an appropriate enhancement for AnyVar. If not, I can always implement it as a separate extension/wrapper.

jsstevenson commented 4 weeks ago

also tagging @theferrit32

👍 broadly in agreement that we need something, the current VCF annotator tie-in is really more of a toy example in its present state. I think as the first mover you get to make decisions about internals, and we are a ways (to put it lightly) away from having an alternative set of requirements, so I'm not too worried about being tightly coupled to Redis vs RabbitMQ or even to a particular large data handling pattern.

Could we define this in a queueing or processing module adjacent to the storage module so that it's usable along with a Postgres instance as well as Snowflake?

ehclark commented 3 weeks ago

@jsstevenson Yes, my intention would be to provide async VCF annotation support as a backwards-compatible, optional add-on. The queue/backend needed by Celery would be assumed to be separate from the storage backend used by AnyVar (though in theory the two could be the same, since Celery supports relational databases via SQLAlchemy). My view is that provisioning the Celery dependencies would be the responsibility of the deployer rather than something we would package into AnyVar.
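One way to picture the "optional add-on" arrangement is the sketch below. It is only an illustration under stated assumptions: the environment variable names (`ANYVAR_STORAGE_URI`, `CELERY_BROKER_URL`, `CELERY_RESULT_BACKEND`), the `Settings` class, and the helper functions are all hypothetical. The idea is that the queue settings live apart from the storage setting, and async support switches on only when the deployer has both installed Celery and configured a broker and result backend.

```python
import importlib.util
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Deployment settings; the queue URLs are independent of the storage URI."""

    storage_uri: Optional[str]
    broker_url: Optional[str]
    result_backend_url: Optional[str]


def load_settings(env: Mapping[str, str]) -> Settings:
    # Variable names are assumptions made for this sketch.
    return Settings(
        storage_uri=env.get("ANYVAR_STORAGE_URI"),
        broker_url=env.get("CELERY_BROKER_URL"),
        result_backend_url=env.get("CELERY_RESULT_BACKEND"),
    )


def celery_installed() -> bool:
    """True if the optional `celery` package can be imported in this environment."""
    return importlib.util.find_spec("celery") is not None


def async_enabled(settings: Settings, installed: Optional[bool] = None) -> bool:
    """Async annotation is offered only when fully provisioned by the deployer."""
    if installed is None:
        installed = celery_installed()
    configured = bool(settings.broker_url and settings.result_backend_url)
    return installed and configured
```

Calling `load_settings(os.environ)` at startup (after `import os`) and checking `async_enabled(...)` before registering any async endpoints would leave a deployment without Redis or Celery behaving exactly as it does today.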