First, I agree that Checkpoint needs some kind of ORM, so we can abstract logic and optimisations that will be useful to its users (for example, it will also help in implementing this feature).
Now, concerning tracking mutations and being able to query the state at a block: for this to work correctly and be helpful, we will need to track more than inserts; we will also need to track updates and deletions of entities. Thinking through the database structure and the logic to fetch entities at a specific block height will help determine this.
Here are my thoughts on how we can implement this:
We will need to keep track of each entity's state after every mutation. This plan takes some inspiration from how The Graph currently does this. We can achieve it by tracking the block range for which an entity's state is valid and only creating a new entity record with an updated block range when there is a change to the entity.
I'll be using the `Vote` entity in your example to continue my description and assume it has a schema like:
```graphql
type Vote {
  id: String!
  proposal: Int!
  choice: Int!
}
```
Currently, we generate an SQL table with a structure similar to the entity's GraphQL fields, but if we want to track entity states at specific block heights, we should instead create a table for each entity like this:
| Column | Type |
| --- | --- |
| id | VARCHAR NOT NULL |
| proposal | INT NOT NULL |
| choice | INT NOT NULL |
| _block_min | INT NOT NULL |
| _block_max | INT NULL |
The `_block_min` represents the earliest block for which this entity record is valid (inclusive), and the `_block_max` represents the first block at which this record is no longer valid (exclusive), i.e. the entity record is valid from block `_block_min` to `_block_max - 1`. `_block_max` can be null to signify that this particular entity record is valid for all blocks from `_block_min` to the latest block.
This change to the table structure also means we will potentially insert multiple entity records with the same `id` value. The easiest fix will be to have a compound primary key that uses the `id`, `_block_min` and `_block_max` values. This compound primary key will ensure that we can insert multiple records with the same `id` with varying block ranges.
The primary issue with this approach is that it would be possible for a data writer to insert different entities with the same `id` at other blocks. Still, if we introduce an ORM and encourage entity creation through the ORM, we can have the ORM's create function perform a check to ensure that no entity with a null `_block_max` value exists with the same `id` before creating the entity.
We will need an ORM that exposes CRUD functionality to its users. The ORM will be used in both GraphQL resolvers and data writers.
For simplicity, we will only create functions that create or modify a single record at a time; future iterations can add support for updating multiple entities at once.
Creating an entity will insert an entity record with the current block number set as `_block_min` and `_block_max` set to null. For example:
```sql
INSERT INTO `votes` (id, proposal, choice, _block_min, _block_max)
VALUES ($id, $proposal, $choice, $current_block, null)
```
This logic should also confirm that there isn't an existing record with the same `id` and a null `_block_max`.
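A hedged sketch of how the ORM's create function could perform this check, assuming a `mysql2/promise` connection (the names here are illustrative, not Checkpoint's actual API):

```typescript
import mysql, { RowDataPacket } from 'mysql2/promise';

// Hypothetical create logic: refuse to create a Vote when a "live" record
// (one with a NULL _block_max) already exists for this id.
async function createVote(
  conn: mysql.Connection,
  vote: { id: string; proposal: number; choice: number },
  currentBlock: number
): Promise<void> {
  const [rows] = await conn.query<RowDataPacket[]>(
    'SELECT id FROM votes WHERE id = ? AND _block_max IS NULL LIMIT 1',
    [vote.id]
  );
  if (rows.length > 0) {
    throw new Error(`Vote with id ${vote.id} already exists`);
  }

  await conn.query(
    'INSERT INTO votes (id, proposal, choice, _block_min, _block_max) VALUES (?, ?, ?, ?, NULL)',
    [vote.id, vote.proposal, vote.choice, currentBlock]
  );
}
```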
Updating an entity means some data has changed. First, the existing record matching the specified `id` needs to have its `_block_max` set to the current block, and then a new entity record will be created for the current block. For example:
```sql
UPDATE `votes` SET _block_max = $current_block
WHERE id = $id AND _block_max IS NULL;

INSERT INTO `votes` (id, proposal, choice, _block_min, _block_max)
VALUES ($id, $proposal, $choice, $current_block, null);
```
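Note that the new record has to carry the entity's full data, not just the changed fields, so in practice the ORM's update would first read the currently valid record, merge in the changes, close the old block range, and insert the merged record. A hedged sketch, with the same assumptions as above:

```typescript
import mysql, { RowDataPacket } from 'mysql2/promise';

// Hypothetical update logic (illustrative names only, not Checkpoint's actual API).
async function updateVote(
  conn: mysql.Connection,
  id: string,
  updatedData: Partial<{ proposal: number; choice: number }>,
  currentBlock: number
): Promise<void> {
  // 1. Read the currently valid record so unchanged fields carry over.
  const [rows] = await conn.query<RowDataPacket[]>(
    'SELECT proposal, choice FROM votes WHERE id = ? AND _block_max IS NULL LIMIT 1',
    [id]
  );
  if (rows.length === 0) throw new Error(`Vote with id ${id} not found`);

  // 2. Close the old record's block range at the current block.
  await conn.query(
    'UPDATE votes SET _block_max = ? WHERE id = ? AND _block_max IS NULL',
    [currentBlock, id]
  );

  // 3. Insert the merged record, valid from the current block onwards.
  const next = { ...rows[0], ...updatedData };
  await conn.query(
    'INSERT INTO votes (id, proposal, choice, _block_min, _block_max) VALUES (?, ?, ?, ?, NULL)',
    [id, next.proposal, next.choice, currentBlock]
  );
}
```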
Deleting an entity means the data no longer exists from the current block onwards. To reflect this, we only need to set the `_block_max` value to the current block, and that is it:
```sql
UPDATE `votes` SET _block_max = $current_block
WHERE id = $id AND _block_max IS NULL
```
Querying entities will involve including a where clause based on the range of `_block_min` and `_block_max`. Assuming we have a `$target_block` at which we would like to fetch an entity, an example query will look like:
```sql
SELECT ... FROM `votes`
WHERE ... AND (
  (_block_min <= $target_block AND _block_max > $target_block) OR
  (_block_min <= $target_block AND _block_max IS NULL)
)
```
The ORM could expose an interface like this:

```typescript
interface EntityStore {
  // Creates a new entity
  createEntity: (name: string, data: EntityRecord) => Promise<void>;
  // Updates an existing entity based on its id
  updateEntity: (name: string, id: SupportedFieldTypes, updatedData: Record<string, SupportedFieldTypes>) => Promise<void>;
  // Deletes an entity by id
  deleteEntity: (name: string, id: SupportedFieldTypes) => Promise<void>;
  // Fetches an entity by its id
  readEntity: (name: string, id: SupportedFieldTypes) => Promise<EntityRecord | null>;
  // Fetches multiple entities based on the whereClause.
  // The params are for substitution, to avoid SQL injection.
  readEntities: (name: string, whereClause: string, whereParams: Record<string, SupportedFieldTypes>) => Promise<EntityRecord[]>;
}

type SupportedFieldTypes = string | number;

type EntityRecord = {
  id: SupportedFieldTypes;
  [field: string]: SupportedFieldTypes | undefined;
};
```
An implementation of this EntityStore will be injected as an argument into Checkpoint writers and also provided as part of the context for its resolvers.
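For illustration, a data writer and a resolver might then use the injected store roughly like this (the surrounding signatures are assumptions, not Checkpoint's exact API):

```typescript
// Hypothetical data writer: persists a Vote whenever one is observed on-chain.
async function handleVoteCreated(
  store: EntityStore,
  event: { voter: string; proposal: number; choice: number }
): Promise<void> {
  await store.createEntity('Vote', {
    id: event.voter,
    proposal: event.proposal,
    choice: event.choice
  });
}

// Hypothetical resolver: reads a Vote back through the same store.
async function resolveVote(store: EntityStore, id: string): Promise<EntityRecord | null> {
  return store.readEntity('Vote', id);
}
```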
Finally, we will need to update the generated Query type for entities. It should support querying at a specific block number, like this:
```graphql
type Query {
  vote(id: String!, block: Int): Vote
  votes(
    ...,
    where: WhereVote,
    block: Int
  ): [Vote]
}

input WhereVote {
  ...
}
```
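A client could then fetch state as of a past block, for example (an illustrative query; the id and block values are made up):

```typescript
// Hypothetical client-side query: read a Vote as it existed at block 500000.
const voteAtBlockQuery = `
  query {
    vote(id: "0x123", block: 500000) {
      proposal
      choice
    }
  }
`;
```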
Currently, when you query the Checkpoint API you only get data synced from the latest state of the chain. Some apps like SX-UI will need to query the API for state at a specific block number in the past, for resolving things like delegation for a vote. One way we could enable this lookback feature is to store all mutations with their corresponding block number in the db. If we want to query data at the latest state of the chain, we would query MySQL for only the entries with the highest block number; and if we query for state at a specific block number, we would query MySQL for just the entries with a block number before or equal to the defined state block number.
One issue with this approach is that we can't enforce how data is stored in Checkpoint, so it would be hard for devs to understand what their MySQL INSERT queries should look like. We would probably need some kind of in-house ORM to handle insert queries; it could be as simple as:
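A minimal sketch of what such a helper could look like (the `insertEntity` name and the `mysql2` connection are assumptions for illustration):

```typescript
import mysql from 'mysql2/promise';

// Hypothetical in-house insert helper: every write also stores the block
// number it happened at, so past state can be queried later.
async function insertEntity(
  conn: mysql.Connection,
  table: string,
  data: Record<string, string | number>,
  block: number
): Promise<void> {
  const row = { ...data, block };
  const columns = Object.keys(row);
  const placeholders = columns.map(() => '?').join(', ');

  await conn.query(
    `INSERT INTO \`${table}\` (${columns.join(', ')}) VALUES (${placeholders})`,
    Object.values(row)
  );
}
```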