Open purplefox opened 2 years ago
Update: better method.
If we introduce versioning on the data key, we can take a true snapshot of the data. When creating an MV, we then insert a new executor type, "FillExecutor", in front of the MV. The FillExecutor maintains an iterator on the feeding source. At any one time the iterator has a version associated with it and, for each key, returns only the entry with the highest version less than or equal to that version. Once it has iterated over all keys for a particular version, it moves to the next version and repeats, until there are no more rows. At that point the fill is complete and the FillExecutor switches over to live records. The FillExecutor stores its offset persistently with each batch it processes, so that after a failure it carries on where it left off.
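To make the idea concrete, here is a rough sketch in Go of the snapshot read and the resumable offset. Everything in it is hypothetical and only for illustration: the `entry`, `store`, `offset` and `fillExecutor` types, the in-memory slice standing in for the feeding source, and the choice of (version, last key) as the persisted offset are assumptions, not existing code. It also assumes keys are non-empty, and it does not show moving on to the next version or switching over to live records.

```go
package main

import "sort"

// entry is one versioned write of a key (hypothetical layout).
type entry struct {
	key     string
	version uint64
	value   string
}

// store is an in-memory stand-in for the feeding source.
type store struct{ entries []entry }

func (s *store) put(key string, version uint64, value string) {
	s.entries = append(s.entries, entry{key, version, value})
}

// snapshot returns, for each key, the entry with the highest
// version <= v, sorted by key.
func (s *store) snapshot(v uint64) []entry {
	latest := map[string]entry{}
	for _, e := range s.entries {
		if e.version > v {
			continue // written after the snapshot version, not visible
		}
		if cur, ok := latest[e.key]; !ok || e.version > cur.version {
			latest[e.key] = e
		}
	}
	out := make([]entry, 0, len(latest))
	for _, e := range latest {
		out = append(out, e)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].key < out[j].key })
	return out
}

// offset is what this sketch would persist with every batch.
// An empty lastKey means nothing has been forwarded yet.
type offset struct {
	version uint64
	lastKey string
}

// fillExecutor sits in front of the MV and feeds it snapshot rows.
type fillExecutor struct {
	src       *store
	off       offset
	batchSize int
}

// nextBatch returns up to batchSize rows after the stored offset and
// advances the offset. An empty result means the snapshot at
// off.version is exhausted. Re-reading the snapshot on every call is
// wasteful, but keeps the sketch short.
func (f *fillExecutor) nextBatch() []entry {
	var batch []entry
	for _, e := range f.src.snapshot(f.off.version) {
		if e.key <= f.off.lastKey {
			continue // already forwarded before the last persisted offset
		}
		batch = append(batch, e)
		if len(batch) == f.batchSize {
			break
		}
	}
	if len(batch) > 0 {
		f.off.lastKey = batch[len(batch)-1].key
	}
	return batch
}

// demoFill fills at version 2, simulates a failure after the first
// batch, and resumes from the saved offset.
func demoFill() []string {
	s := &store{}
	s.put("a", 1, "a1")
	s.put("b", 1, "b1")
	s.put("a", 2, "a2")
	s.put("c", 3, "c3") // newer than the snapshot version

	f := &fillExecutor{src: s, off: offset{version: 2}, batchSize: 1}
	got := []string{f.nextBatch()[0].value}

	saved := f.off // pretend this was stored with the batch
	f = &fillExecutor{src: s, off: saved, batchSize: 1}
	for b := f.nextBatch(); len(b) > 0; b = f.nextBatch() {
		got = append(got, b[0].value)
	}
	return got
}
```

In this sketch the restarted executor skips key `a`, because it is at or before the saved offset, and the write to `c` at version 3 is never seen by a fill at version 2.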
The current MV/index fill occurs as part of DDL execution. The fill can take considerable time, and if Raft leadership changes while it is running, it is cancelled. The fill logic is also very complex.
Here is how we can improve it: