ava-labs / hypersdk

Opinionated Framework for Building Hyper-Scalable Blockchains on Avalanche
https://hypersdk.xyz/

Support External Indexer via gRPC AcceptedSubscriber + Optional Internal Indexing #1225

Open aaronbuchwald opened 1 month ago

aaronbuchwald commented 1 month ago

The HyperSDK should enable VMs to index relevant data either in the node (for a simple setup and quickly launching a network with everything they need) or in an external indexer that subscribes to the node in order to reliably process every block.

https://github.com/ava-labs/hypersdk/pull/1143 introduces a simple interface to subscribe to all blocks accepted by the HyperSDK, and https://github.com/ava-labs/hypersdk/issues/1145 fixes a previous bug where the HyperSDK would not guarantee at-least-once delivery of accepted blocks.

The HyperSDK should make it as easy as possible for a basic data ingestion pipeline to process accepted blocks and emit the relevant static data to index (and display in an explorer).

External Subscriber

We should implement a gRPC service that provides the AcceptedSubscriber logic. This would be a sidecar that runs a server listening on a given port; the user then configures the HyperSDK with a gRPC client that dials the sidecar and pushes all accepted blocks to it.

The sidecar would export a listening address, which could then be passed into the HyperSDK APIs via config:

{
    "exportedBlockSubscribers": "localhost:9001"
}
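On the node side, a minimal sketch of what that client could look like, assuming a generated gRPC client with a ProcessBlock RPC (the service definition, message types, and field names here are hypothetical, not an existing API):

var _ AcceptedSubscriber = (*grpcSubscriber)(nil)

// grpcSubscriber forwards each accepted block to the external sidecar and
// treats a successful RPC response as the sidecar's acknowledgement.
type grpcSubscriber struct {
    client pb.ExternalSubscriberClient // hypothetical generated gRPC client
}

func (g *grpcSubscriber) Accepted(ctx context.Context, blk *chain.StatelessBlock) error {
    // Block until the sidecar responds, so the HyperSDK only advances the
    // accepted queue once the block has been delivered and acknowledged.
    _, err := g.client.ProcessBlock(ctx, &pb.BlockRequest{
        BlockData: blk.Bytes(), // serialized block
    })
    return err
}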

The external subscriber then needs to guarantee that:

1) It sends an acknowledgement back to the HyperSDK after it has processed each block (allowing the HyperSDK to clean up the block and continue processing the accepted queue)
2) Its block processing is idempotent, so that repeated deliveries of the same block are handled correctly (see the sketch below)
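A minimal sketch of the idempotency requirement, assuming the subscriber writes to a key/value store (the store interface here is hypothetical): keying writes by block height means a redelivered block overwrites an identical record instead of producing a duplicate.

type idempotentIndexer struct {
    // store is a hypothetical key/value API; any store with put-by-key
    // semantics gives the same idempotency property.
    store interface {
        Put(ctx context.Context, key string, value []byte) error
    }
}

func (i *idempotentIndexer) processBlock(ctx context.Context, blk *chain.StatelessBlock) error {
    blockJSON, err := json.Marshal(blk)
    if err != nil {
        return err
    }
    // Height-keyed writes: processing the same block twice rewrites the
    // same record, so multiple deliveries are safe.
    return i.store.Put(ctx, fmt.Sprintf("block/%d", blk.Height()), blockJSON)
}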

Standard Indexer to Optionally Serve APIs within HyperSDK

This is a small departure from the original ethos of the HyperSDK of doing the absolute minimum inside the node: instead, it proposes supporting an optional set of APIs so developers can get started as quickly as possible.

This should include at least block and transaction indexing, which can then support the basic APIs: GetTransaction, GetBlockByHeight, and GetBlock.
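A minimal sketch of that read API in Go (the interface name and exact signatures are assumptions, not a committed design):

// Indexer is a hypothetical read-only view over indexed blocks and txs.
type Indexer interface {
    GetBlock(ctx context.Context, blkID ids.ID) (*chain.StatelessBlock, error)
    GetBlockByHeight(ctx context.Context, height uint64) (*chain.StatelessBlock, error)
    GetTransaction(ctx context.Context, txID ids.ID) (*chain.Transaction, error)
}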

Transform Block to Static Data

Export a function that converts a block to the relevant JSON, including all of its transactions.

This can be as simple as:

import (
    "context"
    "encoding/json"

    "github.com/ava-labs/hypersdk/chain"
)

var _ AcceptedSubscriber = (*pipeline)(nil)

type pipeline struct {
    blockProcessor BlockProcessor
    txProcessor    TxProcessor
}

func (p *pipeline) Accepted(ctx context.Context, blk *chain.StatelessBlock) error {
    // Marshal the block to JSON and pull out the raw txs so the block and
    // its transactions can be indexed independently.
    blockJSON, err := json.Marshal(blk)
    if err != nil {
        return err
    }
    var fields map[string]json.RawMessage
    if err := json.Unmarshal(blockJSON, &fields); err != nil {
        return err
    }
    txs := fields["txs"]

    if err := p.blockProcessor.ProcessBlock(ctx, blk); err != nil {
        return err
    }

    return p.txProcessor.ProcessTxs(ctx, txs)
}

For an external indexer, this would then be wrapped with the gRPC AcceptedSubscriber server, which sends an ACK back to the HyperSDK once it has successfully indexed the block and transactions.
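A sketch of that wrapping, reusing the hypothetical ProcessBlock RPC from above (decoding the wire bytes back into a block is elided behind a hypothetical parseBlock helper):

// server exposes the pipeline as the hypothetical gRPC service; returning
// a successful response is the ACK that lets the HyperSDK advance.
type server struct {
    pb.UnimplementedExternalSubscriberServer // hypothetical generated base type
    pipeline *pipeline
}

func (s *server) ProcessBlock(ctx context.Context, req *pb.BlockRequest) (*pb.BlockResponse, error) {
    blk, err := parseBlock(req.BlockData) // hypothetical decoder for the wire format
    if err != nil {
        return nil, err
    }
    // Only ACK after the block and its txs are durably indexed; an error
    // response causes the HyperSDK to retry delivery.
    if err := s.pipeline.Accepted(ctx, blk); err != nil {
        return nil, err
    }
    return &pb.BlockResponse{}, nil
}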

aaronbuchwald commented 1 month ago

Linking a few relevant issues here:

Export Block/Tx/State Diffs to External Store

https://github.com/ava-labs/hypersdk/issues/961

I think the best way to support block/tx indexing is with this accepted subscriber pattern. Exporting state diffs would be a change from the current interface that we could optionally support if needed. Depending on the VM and use case, this may push a lot of data, so I'd prefer to export state diffs to an external store if the need arises rather than prioritizing it and changing the code to support it now.

If this is completed without exporting state diffs to an external store, we should open a new GitHub issue for state diffs as a potential future improvement.

Support S3 Archiver

https://github.com/ava-labs/hypersdk/issues/531 https://github.com/ava-labs/hypersdk/pull/697

This would be great to support. To avoid feature bloat in the HyperSDK, I'd prefer the S3 archiver to be implemented as a service external to the HyperSDK, built on the gRPC AcceptedSubscriber.
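A sketch of what that external service might look like, assuming aws-sdk-go-v2 and a height-based key scheme chosen purely for illustration:

// s3Archiver is an AcceptedSubscriber that archives each accepted block to
// S3 as JSON; height-based keys make redeliveries overwrite the same object.
type s3Archiver struct {
    client *s3.Client // github.com/aws/aws-sdk-go-v2/service/s3
    bucket string
}

func (a *s3Archiver) Accepted(ctx context.Context, blk *chain.StatelessBlock) error {
    blockJSON, err := json.Marshal(blk)
    if err != nil {
        return err
    }
    _, err = a.client.PutObject(ctx, &s3.PutObjectInput{
        Bucket: aws.String(a.bucket),
        Key:    aws.String(fmt.Sprintf("blocks/%d.json", blk.Height())),
        Body:   bytes.NewReader(blockJSON),
    })
    return err
}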

gbartolome-avax commented 1 month ago

Per a Slack conversation with @aaronbuchwald: Data Platform would archive HyperSDK payloads using a pattern similar to the one we use today when ingesting our EVM-based subnets into our Data Lake.

Chain Ingestion - On-Chain Producer

It will depend on a subscriber-based producer/consumer push pattern:

  1. A parent consumer will subscribe to HyperSDK messages and push them to a messaging stream, in this case Kafka (see the sketch below).
  2. Child consumers will subscribe to the HyperSDK Kafka topic for all payloads.
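A sketch of the parent producer side, assuming github.com/segmentio/kafka-go and an illustrative topic name:

// kafkaProducer is an AcceptedSubscriber that publishes each accepted block
// to a Kafka topic for downstream (child) consumers.
type kafkaProducer struct {
    writer *kafka.Writer
}

func newKafkaProducer(brokerAddr, topic string) *kafkaProducer {
    return &kafkaProducer{
        writer: &kafka.Writer{
            Addr:  kafka.TCP(brokerAddr),
            Topic: topic, // e.g. "hypersdk.accepted-blocks" (illustrative)
        },
    }
}

func (k *kafkaProducer) Accepted(ctx context.Context, blk *chain.StatelessBlock) error {
    blockJSON, err := json.Marshal(blk)
    if err != nil {
        return err
    }
    // Keying by height routes all deliveries of a block to one partition,
    // preserving per-height ordering for consumers.
    return k.writer.WriteMessages(ctx, kafka.Message{
        Key:   []byte(fmt.Sprintf("%d", blk.Height())),
        Value: blockJSON,
    })
}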