cosmos / cosmos-sdk

CosmosSDK State Watching Features - KVStore reading and listening capabilities - Subscription/stream service(s) #7889

Closed i-norden closed 3 years ago

i-norden commented 4 years ago

Summary

State listening external streaming/pub-sub service

Problem Definition

Currently, KVStore data can be remotely accessed through Queries which proceed through Tendermint and the ABCI. In addition to these request/response queries, it would be beneficial to have a means of listening to state changes as they occur by some pub-sub or streaming mechanism.

The changes proposed here are the 2nd step towards achieving that, by exposing the state listeners introduced in #7888 to external consumers.

What problems may be addressed by introducing this feature?

Realtime data availability

What benefits does the SDK stand to gain by including this feature?

Tools for state listening should benefit many Cosmos applications, as a fundamental means of improving data availability and accessibility.

Are there any disadvantages of including this feature?

If an application developer opts to use these features to expose data, they need to be aware of the ramifications and risks of that exposure as they pertain to the specifics of their application.

Proposal

Question: How should we externally expose KVStore state changes?

In #7888 the idea is that the BaseApp can use io.Writers to listen to specific KVStores. The BaseApp can then pass the io handlers into a streaming/server interface that can route and read out the state changes.

It may be better to use a more concrete type than io.Writer in #7888, since the type will need to support streaming the data out, e.g.:

import "errors"

// ExampleHandler implements io.Writer by pushing each write onto a channel.
type ExampleHandler struct {
    Stream chan []byte // stream the writer output from here
}

func (eh *ExampleHandler) Write(p []byte) (n int, err error) {
    // Copy p first: io.Writer implementations must not retain the caller's buffer.
    buf := make([]byte, len(p))
    copy(buf, p)
    select {
    case eh.Stream <- buf:
        return len(p), nil
    default:
        return 0, errors.New("unable to write to channel")
    }
}
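
For illustration only, a hypothetical consumer on the service side could then drain the handler's channel and forward the bytes to whatever destination it manages. The consume function below is not part of the proposal; it assumes the same package as the handler above plus an io import:

// consume drains the handler's channel and forwards each message to dest.
// It returns when eh.Stream is closed or a write to dest fails.
func consume(eh *ExampleHandler, dest io.Writer) error {
    for msg := range eh.Stream {
        if _, err := dest.Write(msg); err != nil {
            return err
        }
    }
    return nil
}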

I've started to outline some of the potential mechanisms for streaming data out below; a minimal sketch of the simplest option (writing to a file) follows the list. I'm hoping to get some feedback on what the first approach should be.

Write to file

Pros:

Cons:

Simple Pub-Sub (gRPC Streaming, WebSockets)

Pros:

Cons:

Pub-Sub using embedded persistent queue

e.g. https://github.com/joncrlsn/dque

Pros:

Cons:

Pub-Sub using persistent Redis queue

e.g. https://github.com/adjust/redismq

Pros:

Cons:

GraphQL on top of a Postgres queue (postgraphile) using NOTIFY triggers

Pros:

Cons:

External message service

e.g. Apache Kafka, RabbitMQ, KubeMQ

Pros:

Cons:
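
As a point of reference for the first option: since *os.File already satisfies io.Writer, the write-to-file approach needs almost no new machinery. A minimal, hypothetical sketch (the helper name, path handling, and wiring are illustrative only, not an actual SDK API):

import (
    "io"
    "os"
)

// fileListener is a hypothetical helper: it opens an append-only file whose
// *os.File handle can be passed directly as a listening io.Writer from #7888.
// The path and any rotation/compaction policy would be application config.
func fileListener(path string) (io.Writer, error) {
    return os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
}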



alexanderbez commented 4 years ago

Thanks for writing this up @i-norden.

I'm a bit hesitant at the moment to introduce these mechanisms directly into the SDK. Rather, I would prefer to see an interface that potentially each streaming/writing mechanism can implement. The implementations of this interface (e.g. RabbitMQ, ZMQ, Redis, file, etc...) can exist outside of the SDK and we can provide out-of-the-box implementations for users. In addition, the SDK can include a common and simple "base" implementation (e.g. file).
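
For concreteness, a rough sketch of the kind of interface being suggested here; the method set and names are purely illustrative assumptions, not an actual SDK API:

import (
    "context"
    "io"
)

// StreamingService is a hypothetical sketch of such an interface; concrete
// implementations (file, RabbitMQ, ZMQ, Redis, ...) would live outside the SDK,
// with a simple file-based one shipped as the "base" implementation.
type StreamingService interface {
    // Listeners returns the io.Writers to wire into the BaseApp state
    // listening machinery from #7888, keyed by store name.
    Listeners() map[string]io.Writer

    // Stream begins routing captured state changes to the backing destination.
    Stream(ctx context.Context) error

    // Close flushes any buffered data and shuts the service down.
    Close() error
}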

i-norden commented 4 years ago

That makes sense! Should I open a new issue for that interface, or include it as part of the ADR from #7888?

alexanderbez commented 4 years ago

I would write this all up in a single ADR. I don't think any of these changes are contentious 👍

amaury1093 commented 3 years ago

Since the SDK chose to use gRPC as the de facto RPC layer, an alternative to what @alexanderbez proposes is to hardcode the io.Writer from #7888 as a gRPC server stream. If users want to use Redis/file/RabbitMQ, then they listen on the gRPC service that will be baked into the SDK.

Pros:

Cons:
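
As a rough illustration of this idea (not actual generated code), an io.Writer could wrap a gRPC server-side stream along these lines; StateChange and the Send signature stand in for whatever protobuf message and generated stream type the SDK would define:

// StateChange stands in for a protobuf message the SDK would define; its name
// and the Send signature below are illustrative, not generated code.
type StateChange struct {
    Data []byte
}

// grpcStreamWriter is a hypothetical io.Writer backed by a gRPC server-side
// stream (the anonymous interface mirrors the shape of a generated
// server-stream type, e.g. SomeService_ListenServer).
type grpcStreamWriter struct {
    stream interface {
        Send(*StateChange) error
    }
}

func (w *grpcStreamWriter) Write(p []byte) (int, error) {
    // Copy p so the caller's buffer is not retained after Write returns.
    buf := make([]byte, len(p))
    copy(buf, p)
    if err := w.stream.Send(&StateChange{Data: buf}); err != nil {
        return 0, err
    }
    return len(p), nil
}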

alexanderbez commented 3 years ago

I like the idea! We'd need to be careful and examine any and all performance implications of having gRPC sit as a proxy. @i-norden do you think the consumer (e.g. RabbitMQ, ZMQ, file, etc...) cares about throughput or performance in general?

i-norden commented 3 years ago

Coming from my experience with Ethereum, where it is a struggle to keep pace with the head of the chain while listening to (and indexing) state changes, I have some knee-jerk concern about performance. I don't think it is as relevant here, though, in large part because features such as these will make listening to state changes a breeze in comparison :)

My gut reaction is that those pros outweigh the overhead of the gRPC middleman, but I admittedly don't have a great feel for the performance demands and constraints of Cosmos applications yet, e.g. how many state updates tend to occur per block.

If we remain with the io.Writer interface and provide a concrete gRPC server stream implementation as the standard for backing it, people could still implement a more performant/direct io.Writer if need be.

aaronc commented 3 years ago

I think gRPC streaming is useful for general purpose streaming. I do not think it is suitable for caching to a database because it is not fault tolerant. I would prefer to support both paths.

i-norden commented 3 years ago

@aaronc apologies, I keep overlooking the persistence needs! I will outline both approaches (file and gRPC) in the proposal. We could also include a teeing io.Writer type that fans writes out to any number of destination io.Writers, allowing us to simultaneously write out to both a file and a gRPC stream.

i-norden commented 3 years ago

Sorry, we don't need to implement anything for that; we can just use io.MultiWriter().
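
For reference, a minimal sketch of that wiring; fileWriter and grpcWriter are placeholders for any two concrete io.Writer destinations (e.g. an *os.File and the gRPC stream writer sketched earlier in this thread):

import "io"

// teeListener is a hypothetical helper: the returned writer duplicates every
// listener write to both destinations via the standard library's io.MultiWriter.
func teeListener(fileWriter, grpcWriter io.Writer) io.Writer {
    return io.MultiWriter(fileWriter, grpcWriter)
}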

i-norden commented 3 years ago

I think this issue/proposal can be closed as it has been formalized as part of the ADR-038 specification: https://github.com/cosmos/cosmos-sdk/pull/8012