atom-archive / xray

An experimental next-generation Electron-based text editor
MIT License

Introduce epochs to allow work trees to switch to different base commits #157

Closed nathansobo closed 5 years ago

nathansobo commented 5 years ago

Previously, once a work tree was constructed with an initial commit, the only way to evolve its state was by applying operations. This was problematic because after a long period of collaboration, an unbounded number of operations could accumulate, which made it expensive to fetch and apply operations when joining the work tree. This PR addresses the situation by dividing the operation history into epochs, each of which builds forward from a commit in the underlying Git repository.

Conceptually, epochs are similar to commits, but are open to real time evolution as additional operations are applied. The diagram below demonstrates how they fit into the existing Git mental model:

[Diagram: epochs_and_commits]

In the diagram, three separate epochs all start at commit A, each associated with a different branch. In branches foo and bar, epochs 0 and 3 are inactive, which means that new commits have been created on those branches. Going forward, we can associate commits with the epochs that preceded them to enable history replay. Epoch 5, depicted in green, is being actively edited, as are epochs 2 and 4, which are based on different commits.

In this first PR related to epochs, we've laid down the basic infrastructure needed for a work tree to switch between epochs. Although epochs are new, the Epoch struct is mostly the previous WorkTree implementation under a new name. The WorkTree is now a wrapper that forwards requests to the current epoch and handles switching between epochs.
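The wrapper relationship described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the fields on `Epoch` and the method names `text`, `append`, and `switch_epoch` are hypothetical, and the real `Epoch` tracks operations rather than plain strings.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for per-epoch state; the real `Epoch` holds far more.
struct Epoch {
    head: String,                   // commit oid this epoch builds forward from
    files: HashMap<String, String>, // path -> current text
}

impl Epoch {
    fn new(head: &str, base: &[(&str, &str)]) -> Self {
        Epoch {
            head: head.to_string(),
            files: base.iter().map(|(p, t)| (p.to_string(), t.to_string())).collect(),
        }
    }

    fn text(&self, path: &str) -> Option<&str> {
        self.files.get(path).map(|s| s.as_str())
    }

    fn append(&mut self, path: &str, suffix: &str) {
        self.files.entry(path.to_string()).or_default().push_str(suffix);
    }
}

/// Thin wrapper: forwards requests to the current epoch and swaps it on a switch.
struct WorkTree {
    epoch: Epoch,
}

impl WorkTree {
    fn new(epoch: Epoch) -> Self {
        WorkTree { epoch }
    }

    fn head(&self) -> &str {
        &self.epoch.head
    }

    fn text(&self, path: &str) -> Option<&str> {
        self.epoch.text(path)
    }

    fn append(&mut self, path: &str, suffix: &str) {
        self.epoch.append(path, suffix)
    }

    /// Make `next` the current epoch and hand back the one it replaced.
    fn switch_epoch(&mut self, next: Epoch) -> Epoch {
        std::mem::replace(&mut self.epoch, next)
    }
}
```

Callers only ever talk to `WorkTree`, so swapping the inner epoch does not change the surface they see. Returning the old epoch from `switch_epoch` is a choice made for this sketch so that it could be retained for history replay.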

Previously, Memo only exposed a synchronous API and relied on the client to perform all I/O. This synchronous API continues to exist internally within the Epoch implementation, but the public API is now asynchronous in multiple places. The WorkTree now accepts a GitProvider trait object from which it can request the base entries for a given commit oid or the text of a file given a commit oid and a path. Because most providers will need to perform I/O to service these requests, these methods are expected to return streams and futures from the futures crate. Similarly, several methods on WorkTree now return futures or streams when they may require interaction with the Git provider. In Memo JS, we can accept and return asynchronous iterators and promises and translate them to the relevant objects from the futures crate via adaptors. This allows the JS event loop to drive execution of our streams and futures, avoiding the need for an explicit executor in Memo JS. For the daemon, we can drive these futures with the Tokio event loop.
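To make the provider idea concrete, here is a rough sketch of what such a trait object could look like. It is built on assumptions: it uses `std::future::Future` rather than the futures crate types this PR uses, it simplifies the stream of base entries to a single future of a `Vec`, and the names `BaseEntry`, `base_entries`, `base_text`, `InMemoryProvider`, and `block_on` are illustrative rather than the PR's actual signatures.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture<T> = Pin<Box<dyn Future<Output = T>>>;

/// Hypothetical base entry; the real type carries more than a path.
#[derive(Clone, Debug, PartialEq)]
struct BaseEntry {
    path: String,
}

/// Assumed shape of the provider: both requests are asynchronous.
trait GitProvider {
    /// Base entries for a commit (a stream in the PR, one future in this sketch).
    fn base_entries(&self, oid: &str) -> BoxFuture<Result<Vec<BaseEntry>, String>>;
    /// Text of one file at a commit.
    fn base_text(&self, oid: &str, path: &str) -> BoxFuture<Result<String, String>>;
}

/// In-memory provider for illustration: oid -> [(path, text)].
struct InMemoryProvider {
    commits: HashMap<String, Vec<(String, String)>>,
}

impl GitProvider for InMemoryProvider {
    fn base_entries(&self, oid: &str) -> BoxFuture<Result<Vec<BaseEntry>, String>> {
        let result: Result<Vec<BaseEntry>, String> = match self.commits.get(oid) {
            Some(files) => Ok(files
                .iter()
                .map(|(path, _)| BaseEntry { path: path.clone() })
                .collect()),
            None => Err(format!("unknown commit {}", oid)),
        };
        // A real provider would perform I/O; here the answer is ready at once.
        Box::pin(std::future::ready(result))
    }

    fn base_text(&self, oid: &str, path: &str) -> BoxFuture<Result<String, String>> {
        let found = self
            .commits
            .get(oid)
            .and_then(|files| files.iter().find(|(p, _)| p.as_str() == path))
            .map(|(_, text)| text.clone());
        let result: Result<String, String> =
            found.ok_or_else(|| format!("no {} at commit {}", path, oid));
        Box::pin(std::future::ready(result))
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor for this sketch: polls in a loop until the future completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Holding the provider as `Box<dyn GitProvider>` keeps the work tree independent of where the data comes from. The busy-polling `block_on` only stands in for the JS event loop or Tokio, which are what actually drive the futures in the setups described above.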

The most interesting async code introduced in this PR is the SwitchEpochs future, which is returned whenever we want to reset the work tree to a different HEAD. We can perform a reset following a commit or simply to express a hard or soft reset in the repository. When async/await support stabilizes, it will be easier to implement this kind of async routine in higher-level code, but for now we implement the Future trait directly with a custom poll method. When polling the future, we first need to stream in base entries for the new epoch. Once this is done, for each open buffer in the current epoch, we need to open a corresponding buffer in the epoch to which we are switching. We rely on path correspondence to match up files between the two epochs, which is imperfect but no worse than what a typical text editor does in the face of Git operations.
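The two-step polling described above can be sketched as a hand-written state machine. This is an illustration under stated assumptions, not the PR's SwitchEpochs: it targets `std::future::Future` instead of the futures crate, base entries arrive as one future of paths rather than a stream, `load_text` is a hypothetical stand-in for the Git provider, and `Delayed`, `NoopWaker`, and `block_on` exist only to exercise the pending path.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture<T> = Pin<Box<dyn Future<Output = T>>>;

enum State {
    /// Step 1: waiting for the new epoch's base entries (simplified to paths).
    LoadingBaseEntries(BoxFuture<Vec<String>>),
    /// Step 2: re-opening buffers; `loading` is the text request in flight.
    OpeningBuffers { base_paths: Vec<String>, loading: Option<(String, BoxFuture<String>)> },
    Done,
}

/// Resolves to the text of each buffer that could be re-opened in the new epoch.
struct SwitchEpochs {
    open_paths: Vec<String>, // buffers open in the epoch being left
    load_text: Box<dyn Fn(&str) -> BoxFuture<String>>, // stand-in for the Git provider
    buffers: HashMap<String, String>,
    state: State,
}

impl Future for SwitchEpochs {
    type Output = HashMap<String, String>;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut(); // allowed because every field is Unpin
        loop {
            // Take the state by value; each arm stores the next state back.
            match std::mem::replace(&mut this.state, State::Done) {
                State::LoadingBaseEntries(mut fut) => match fut.as_mut().poll(cx) {
                    Poll::Ready(base_paths) => {
                        this.state = State::OpeningBuffers { base_paths, loading: None };
                    }
                    Poll::Pending => {
                        this.state = State::LoadingBaseEntries(fut);
                        return Poll::Pending;
                    }
                },
                State::OpeningBuffers { base_paths, loading: Some((path, mut fut)) } => {
                    match fut.as_mut().poll(cx) {
                        Poll::Ready(text) => {
                            this.buffers.insert(path, text);
                            this.state = State::OpeningBuffers { base_paths, loading: None };
                        }
                        Poll::Pending => {
                            let loading = Some((path, fut));
                            this.state = State::OpeningBuffers { base_paths, loading };
                            return Poll::Pending;
                        }
                    }
                }
                State::OpeningBuffers { base_paths, loading: None } => {
                    match this.open_paths.pop() {
                        Some(path) => {
                            // Path correspondence: skip files absent from the new epoch.
                            let loading = if base_paths.contains(&path) {
                                let fut = (this.load_text)(path.as_str());
                                Some((path, fut))
                            } else {
                                None
                            };
                            this.state = State::OpeningBuffers { base_paths, loading };
                        }
                        None => return Poll::Ready(std::mem::take(&mut this.buffers)),
                    }
                }
                State::Done => panic!("SwitchEpochs polled after completion"),
            }
        }
    }
}

/// Toy future: pending on the first poll, ready with its value on the second.
struct Delayed<T>(Option<T>, bool);

impl<T: Unpin> Future for Delayed<T> {
    type Output = T;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let this = self.get_mut();
        if std::mem::replace(&mut this.1, true) {
            return Poll::Ready(this.0.take().expect("polled after completion"));
        }
        cx.waker().wake_by_ref(); // request another poll
        Poll::Pending
    }
}

struct NoopWaker;
impl Wake for NoopWaker { fn wake(self: Arc<Self>) {} }

/// Toy executor for this sketch: polls in a loop until the future completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop { if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) { return out; } }
}
```

Each `Pending` branch puts the in-flight future back into `state` before returning, so the next call to `poll` resumes exactly where the previous one stopped. An open buffer whose path is absent from the new epoch's base entries is simply dropped in this sketch; what the real implementation does with such buffers is not stated here.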

It's also worth noting that this PR changes the JS interface to the library quite a bit. We now expose a higher-level API. As mentioned above, you instantiate the WorkTree with a GitProvider. When you call openTextFile, you receive Buffer objects, which have getText and edit methods as well as an onChange method for subscribing to changes. Finally, all returned operations are now wrapped in an OperationEnvelope that associates each base64-encoded operation with the replicaId and replicaTimestamp of its associated epoch. These two values need to be used as a key to group and sort stored operations in the database.
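As a rough illustration of using the envelope's epoch fields as a storage key, the sketch below groups operations by epoch in a sorted map. It rests on assumptions: the struct is a hypothetical Rust mirror of the JS envelope, the field types are guesses, and ordering epochs by timestamp first and replica id second is a choice made for this example rather than something the PR specifies.

```rust
use std::collections::BTreeMap;

/// Hypothetical Rust mirror of the JS envelope; field types are assumptions.
#[derive(Clone, Debug, PartialEq)]
struct OperationEnvelope {
    replica_id: String,     // replicaId of the operation's epoch
    replica_timestamp: u64, // replicaTimestamp of the operation's epoch
    operation: String,      // base64-encoded operation payload
}

/// Key identifying an epoch. Sorting by timestamp, then replica id, is assumed.
type EpochKey = (u64, String);

/// Group stored operations by epoch, keeping arrival order within each epoch.
fn group_by_epoch(envelopes: Vec<OperationEnvelope>) -> BTreeMap<EpochKey, Vec<String>> {
    let mut groups: BTreeMap<EpochKey, Vec<String>> = BTreeMap::new();
    for envelope in envelopes {
        let key = (envelope.replica_timestamp, envelope.replica_id);
        groups.entry(key).or_insert_with(Vec::new).push(envelope.operation);
    }
    groups
}
```

A database-backed store would more likely index on the same pair of columns. The `BTreeMap` just shows that one composite key is enough both to group operations and to iterate over epochs in a stable order.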

Going forward, there are still a few improvements we'd like to land.

Before merging:

/cc @as-cii @probablycorey @joshaber @jeffrafter @asheren @miskander