jchris commented 1 year ago

Technical Plan: Bulk Indexing Refactor

The plan involves two significant changes: moving to a clock-first architecture, and enabling vector and full-text indexing.

Clock-first Architecture

We will switch to a clock-first architecture. The clock will be modeled as a special case of a prolly (probabilistic) view, one with a unique ID constraint on the clock. This becomes the foundation for the updated system.
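To make the unique ID constraint concrete, here is a minimal in-memory sketch. It is an illustration only: `ClockEntry`, `ClockView`, `put`, and `heads` are hypothetical names, not part of the existing codebase, and a plain `Map` stands in for the prolly tree that would actually back the view.

```typescript
// Hypothetical types for illustration; not taken from the Fireproof codebase.
type ClockEntry = { id: string; parents: string[]; payload: unknown };

class ClockView {
  // A plain Map stands in for the prolly tree that would back a real view.
  private entries = new Map<string, ClockEntry>();

  // The unique-ID constraint: a second entry with the same id is rejected.
  put(entry: ClockEntry): void {
    if (this.entries.has(entry.id)) {
      throw new Error(`duplicate clock id: ${entry.id}`);
    }
    this.entries.set(entry.id, entry);
  }

  get(id: string): ClockEntry | undefined {
    return this.entries.get(id);
  }

  // Heads are the entries that no other entry names as a parent.
  heads(): string[] {
    const referenced = new Set<string>();
    this.entries.forEach((e) => e.parents.forEach((p) => referenced.add(p)));
    return Array.from(this.entries.keys())
      .filter((id) => !referenced.has(id))
      .sort();
  }
}
```

Two concurrent writes on top of the same parent show up as two heads, which is the situation the merge logic has to resolve.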

Indexing Capabilities

We will extend the indexing layer so that vector and full-text indexes can operate on the same field as a map index. A single field can then back more than one kind of index, which makes search over that field more flexible.
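One way to picture several index types sharing a field is a common indexer interface. The sketch below is hypothetical throughout: `FieldIndexer`, `MapIndex`, `TextIndex`, `VectorIndex`, and the caller-supplied `embed` function are assumptions for illustration, and the in-memory structures stand in for whatever the real indexes would persist.

```typescript
// Hypothetical names throughout; this is not the project's indexing API.
type Doc = { _id: string; [field: string]: unknown };

interface FieldIndexer {
  update(doc: Doc): void;
}

// Map index: emits [fieldValue, docId] rows.
class MapIndex implements FieldIndexer {
  rows: Array<[string, string]> = [];
  constructor(private field: string) {}
  update(doc: Doc): void {
    const v = doc[this.field];
    if (typeof v === "string") this.rows.push([v, doc._id]);
  }
}

// Full-text index: lowercased token -> set of doc ids.
class TextIndex implements FieldIndexer {
  postings = new Map<string, Set<string>>();
  constructor(private field: string) {}
  update(doc: Doc): void {
    const v = doc[this.field];
    if (typeof v !== "string") return;
    for (const token of v.toLowerCase().split(/\W+/).filter(Boolean)) {
      if (!this.postings.has(token)) this.postings.set(token, new Set());
      this.postings.get(token)!.add(doc._id);
    }
  }
  search(term: string): string[] {
    const ids = this.postings.get(term.toLowerCase());
    return ids ? Array.from(ids).sort() : [];
  }
}

// Vector index: stores whatever the caller-supplied embed function returns.
class VectorIndex implements FieldIndexer {
  vectors = new Map<string, number[]>();
  constructor(private field: string, private embed: (text: string) => number[]) {}
  update(doc: Doc): void {
    const v = doc[this.field];
    if (typeof v === "string") this.vectors.set(doc._id, this.embed(v));
  }
}

// All indexers registered for a field see every document update.
function indexDoc(doc: Doc, indexers: FieldIndexer[]): void {
  indexers.forEach((ix) => ix.update(doc));
}
```

The point of the shared interface is that a bulk write can fan out to every index on a field in one pass, rather than each index type needing its own write path.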

Workflow in Trees

The system will work with trees in the following manner:

Event Storage: At present, we store raw JSON events in the clock and replay them in the order the clock determines. When a merge happens, the replay runs over the state referenced by the lowest common ancestor. Otherwise, the system serves key lookups from the materialized prolly tree.
Prolly Diffs: In the new system, we'll store prolly diffs in the clock, including both read and write blocks. This follows from the shift to a clock-first architecture. Storing prolly diffs instead of raw JSON events will improve efficiency and enable the broader indexing capabilities described above.
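The replay step can be sketched as folding clock-ordered diffs over the ancestor state. This is a simplification under stated assumptions: `Diff`, `DiffEvent`, `applyDiff`, and `replay` are hypothetical names, the diff is modeled at the key level (`puts` and `dels`) rather than as the read and write blocks of a real prolly tree, and the events are assumed to arrive already sorted by the clock.

```typescript
// Hypothetical key-level diff; a real prolly diff would carry tree blocks.
type Diff = { puts: Record<string, unknown>; dels: string[] };
type DiffEvent = { id: string; diff: Diff };
type State = Map<string, unknown>;

// Apply one diff without mutating the input state.
function applyDiff(state: State, diff: Diff): State {
  const next: State = new Map(state);
  for (const key of Object.keys(diff.puts)) next.set(key, diff.puts[key]);
  for (const key of diff.dels) next.delete(key);
  return next;
}

// Replay events, already in clock order, over the common-ancestor state.
function replay(ancestor: State, ordered: DiffEvent[]): State {
  return ordered.reduce((state, event) => applyDiff(state, event.diff), ancestor);
}
```

Because each event already carries the change to the tree, replay does not need to re-run the write logic against raw JSON, which is where the efficiency gain would come from.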

This is a significant refactor of the current system, but it should yield a more flexible and efficient architecture that supports the indexing capabilities above.

jchris commented 1 year ago

We also need to decide on the replication semantics for the validation function. Does it run as a special user, get skipped, or something else?

jchris commented 1 year ago

Each clock head can correspond to a CAR transaction.

fireproof-storage / fireproof-alpha

Bulk operations #17
