DataBiosphere / data-store

AWS and GCP data storage system for genomic data.
https://dss.dev.ucsc-cgp-redwood.org

Create eventing documentation #78

Open melainalegaspi opened 4 years ago

melainalegaspi commented 4 years ago
chmreid commented 4 years ago

This issue tracks creating or updating the documentation on how to use eventing and flash-flood to search the data store.

Motivation: Once ElasticSearch goes away, there will be no documented way to search data in the data store. Users will instead need to use the eventing and journaling system (flash-flood), so documentation needs to be ready for them to refer to.

chmreid commented 4 years ago

Updating this ticket based on information gleaned from the meeting with @natanlao @amarjandu and @xbrianh yesterday.

The flash-flood library was intended to provide a journal of data transaction events in the data store, meant to feed downstream services like the query service and azul. It was not meant to provide search functionality, so treating flash-flood as a replacement for ElasticSearch was a mistake.

However, as per the comments in #2, ElasticSearch could be refactored into a downstream consumer of events from flash-flood, and could use those events to build its search index.
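
To make the "downstream consumer" idea concrete, here is a minimal sketch of a consumer that replays journal events in order and keeps a tiny in-memory search index up to date. Everything in it is an assumption made for illustration: the event shape (`type`, `key`, `metadata`), the `SearchIndex` class, and the sample events are hypothetical and are not taken from flash-flood or from the data store's ElasticSearch integration.

```python
from collections import defaultdict


class SearchIndex:
    """Toy stand-in for a search backend fed by an event journal (hypothetical)."""

    def __init__(self):
        self.docs = {}                 # key -> latest metadata for that key
        self.terms = defaultdict(set)  # lowercased value -> keys containing it

    def _remove(self, key):
        # Drop any previously indexed version of this key.
        old = self.docs.pop(key, None)
        if old:
            for value in old.values():
                self.terms[str(value).lower()].discard(key)

    def apply(self, event):
        # Replaying events in journal order keeps the index consistent:
        # CREATE/UPDATE replace the document, DELETE just removes it.
        self._remove(event["key"])
        if event["type"] != "DELETE":
            self.docs[event["key"]] = event["metadata"]
            for value in event["metadata"].values():
                self.terms[str(value).lower()].add(event["key"])

    def search(self, term):
        return sorted(self.terms.get(term.lower(), set()))


# Hypothetical journal contents, oldest event first.
events = [
    {"type": "CREATE", "key": "b1", "metadata": {"organ": "liver"}},
    {"type": "CREATE", "key": "b2", "metadata": {"organ": "brain"}},
    {"type": "UPDATE", "key": "b1", "metadata": {"organ": "brain"}},
    {"type": "DELETE", "key": "b2", "metadata": {}},
]

index = SearchIndex()
for event in events:
    index.apply(event)

print(index.search("brain"))
```

The point of the sketch is only that the index is derived state: it can be rebuilt at any time by replaying the journal from the start, which is what would let a search service sit downstream of the event stream.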

Alternatively, flash-flood can return large amounts of information about transactions and events in JSON format, which can be quickly searched using JMESPath. This is another avenue to explore, although it would only be practical for a limited range of events, and would not serve as an ElasticSearch replacement.
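
As a rough illustration of this avenue, the sketch below filters a small list of JSON events in plain Python and shows, in a comment, what the equivalent JMESPath expression might look like. The event fields (`event_type`, `bundle_uuid`, `files`) and the helper `created_bundles` are made up for the example; they are not flash-flood's actual event format or API.

```python
import json

# Hypothetical event journal. Field names are illustrative only and are
# not taken from flash-flood's real event documents.
journal = json.loads("""
[
  {"event_id": "e1", "event_type": "CREATE", "bundle_uuid": "aaa", "files": 3},
  {"event_id": "e2", "event_type": "DELETE", "bundle_uuid": "bbb", "files": 0},
  {"event_id": "e3", "event_type": "CREATE", "bundle_uuid": "ccc", "files": 12}
]
""")


def created_bundles(events, min_files=0):
    """Return bundle UUIDs of CREATE events that have at least min_files files."""
    return [
        e["bundle_uuid"]
        for e in events
        if e["event_type"] == "CREATE" and e["files"] >= min_files
    ]


# With the third-party `jmespath` package installed, the same filter could
# be written as a single expression, for example:
#   jmespath.search("[?event_type=='CREATE' && files>=`5`].bundle_uuid", journal)
result = created_bundles(journal, min_files=5)
print(result)
```

Either way, the filter scans every event it is given, which is why this only seems practical for a limited range of events rather than as a general search index.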

chmreid commented 4 years ago

For this ticket specifically, though, I think @natanlao can focus on two things: