checkpoint-labs / checkpoint

Checkpoint is a library for indexing data of Starknet contracts.
https://checkpoint.box
MIT License
55 stars 23 forks source link

feat: lookback query at specific block number #61

Closed bonustrack closed 5 months ago

bonustrack commented 2 years ago

Currently when you query Checkpoint API you get only data sync from the latest state of the chain. Some apps like SX-UI will need to query the API with state at a specific block number in the past for resolving things like delegation for vote. One way we could enable this lookback feature is to store all mutations with their corresponding block number on the db. If we want to query data at latest state of the chain we would just query with MySQL only the entries with the highest block number; And if we query with a state at a specific block number we would query with MySQL just the entries with a block number before or equal to the defined state block number.

One issue with this approach is that we can't enforce how data is stored on Checkpoint so it would be hard for dev to understand how their MySQL INSERT queries should look like, we would probably need to have some kind of in-house ORM to handle insert queries, it could be simple like:

const vote = { proposal: 1, choice: 2 };
await db.insert('Vote', vote);
perfectmak commented 2 years ago

First, I'm going to agree that Checkpoint needs some ORM, so we can abstract some logic and optimisations that will potentially be useful to its users (for example, it will be helpful to implement this feature as well).

Now, concerning tracking mutations and being able to query the state at a block. I think for this to work correctly and be helpful; we will need to track more than inserts; we will also need to track updates and deletions of entities. Thinking more about the database structure and the logic to fetch entities at a specific block height will help determine this.

Here is are my thoughts on how we can implement this:

Tracking Entity State Plan

We will need to keep track of each entity's state after every mutation. This plan is taking some inspiration from how The Graph currently does. We can achieve this by tracking the block range for which an entity's state is valid and only creating a new entity record with an updated block range when there is a change to the entity.

I'll be using the Vote entity in your example to continue my description and assume it has a schema like:

type Vote {
  id: String!
  proposal: Int!
  choice: Int!
}

Currently, we generate an SQL table of similar structure to the entities graphql field, BUT if we want to track entity states at specific block heights, we should create a table for each entity like this:

Column Type
id VARCHAR NOT NULL
proposal INT NOT NULL
choice INT NOT NULL
_block_min INT NOT NULL
_block_max INT NULLABLE

The _block_min represents the earliest block for which this entity record is valid (inclusive), and the _block_max represents the last block for which this record is no more valid (exclusive) i.e. the entity record is valid from block _block_min to _block_max - 1.

_block_max can be null to signify that this particular entity record is valid for all block from _block_min to the latest block.

Changes to the primary key

This change to the table structure also means we will potentially insert multiple entity records with the same id value. The easiest fix will be to have a compound primary key that uses the id, _block_min and _block_max values. This compound primary key will ensure that we can insert multiple records with the same id with varying block ranges.

The primary issue with this approach is that it will be possible for a data writer to insert different entities with the same id at other blocks. Still, if we introduce an ORM and encourage entity creations through the ORM, we can have the ORM's create function perform a check to ensure that no entity with a null _block_max value exists with the same id before creating the entity.

ORM Design and Logic

We will need an ORM that will expose CRUD functionality to its users. The ORM will be used in both graphql resolvers and data writers.

For simplicity, we will only be creating functions to create or modify a single record at a time. Future iterations will plan to support multiple entity updates.

1. Creating an Entity

Creating an entity will Insert an entity record with the current block number set to _block_min and _block_max set to null. For example:

INSERT INTO `votes`(id, proposal, choice, _block_min, _block_max)
VALUES ($id, $proposal, $choice, $current_block, null)

This logic should also confirm that there isn't an existing record with the same id with a null _block_max .

2. Updating an Entity

Updating an entity means some data has changed. First, an existing record matching the specified id needs to have its _block_max set to the current_block and then a new entity record will be created for the current_block. For example:

UPDATE `votes` SET _block_max = ($current_block)
WHERE id = $id AND _block_max IS NULL

INSERT INTO `votes`(id, proposal, choice, _block_min, _block_max)
VALUES ($id, $proposal, $choice, $current_block, null)

3. Deleting an Entity

Deleting an entity means the data doesn't exist anymore at the block range. To reflect this we only need to set the _block_max value to current_block and that is it:

UPDATE `votes` SET _block_max = ($current_block)
WHERE id = $id AND _block_max IS NULL

4. Reading Entities

Querying entities will involve including a where clause based on the range of _block_min and _block_max.

Assuming we have a $target_block we would like to fetch an entity. Then an example query will look like:

SELECT ... from `votes`
WHERE ... AND (
    (_block_min <= $target_block AND _block_max > $target_block) OR 
    (_block_min <= $target_block AND _block_max IS NULL)
)

Proposed interface for ORM

interface EntityStore {
    // Creates a new entity 
    createEntity: (name: String, data: EntityRecord) => Promise<void>,

    // Update an existing entity based on id
    updateEntity: (name: String, id: SupportedFieldTypes, updatedData: Record<string, SupportedFieldTypes>) => Promise<void>,

    // delete an entity by id
    deleteEntity: (name: string, id: SupportedFieldTypes) => Promise<void>,

    // Fetch an entity by its id
    readEntity: (name: string, id: SupportedFieldTypes) => Promise<EntityRecord | null>,

    // Fetch multiple entities based on the whereClause. 
    // The params are for substitution to avoid sql injection.
    readEntities: (name: string, whereClause: string, whereParams: Record<string, SupportedFieldTypes>) => Promise<EntityRecord[]>
}

type SupportedFieldTypes = string | number;
type EntityRecord = {
    id: SupportedFieldTypes;
    [fields: string]?: SupportedFieldTypes;
};

An implementation of this EntityStore will be injected as arguments to Checkpoint writers and also as part of context for its resolvers.

Updating Generated GraphQL Queries

Finally, we will need to update the generated Query for entities. It should include querying the block number like this:

type Query {
    vote(id: String!, block: Int): Vote;
    votes(
        ...,
        where: WhereVote,
        block: In
}

type WhereVote {
    ...
}