aai-institute / lakefs-spec

An fsspec implementation for the lakeFS project.
http://lakefs-spec.org/
Apache License 2.0
37 stars 4 forks source link

Introduce new transaction model based on ephemeral branches #258

Closed nicholasjng closed 5 months ago

nicholasjng commented 5 months ago

This is a breaking change, as it requires users to give arguments to the return object of LakeFSFileSystem.transaction.

Changes the LakeFSTransaction class to become a repository+branch-scoped ephemeral branch.

File uploads, removals, and commits happen on that ephemeral branch. After transaction exit, the created branch is optionally merged back into the base branch.

The base branch (from which we branch off in the transaction) must exist before start of the transaction.

Some transaction APIs (merge, revert, tag) can also work on branches other than the transaction's ephemeral branch - this behavior might not be well-defined, and is up for debate on removal.

Similarly, on puts and deletes, one currently has to specify the transaction branch directly via the LakeFSTransaction.branch property. This can be made nicer by allowing resource paths relative to the transaction's repo and branch in special functions like e.g. LakeFSTransaction.put() (deferring to fs.put()), LakeFSTransaction.rm() (deferring to fs.rm()), and so on.

This pull request turns the LakeFSTransaction into a callable, which needs to be called with a repository name at minimum to ensure transaction execution.

codecov[bot] commented 5 months ago

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (cc7d289) 94.08% compared to head (f6b0cca) 94.24%.

Files Patch % Lines
src/lakefs_spec/transaction.py 98.63% 0 Missing and 1 partial :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #258 +/- ## ========================================== + Coverage 94.08% 94.24% +0.15% ========================================== Files 5 5 Lines 406 382 -24 Branches 72 70 -2 ========================================== - Hits 382 360 -22 + Misses 15 13 -2 Partials 9 9 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

nicholasjng commented 5 months ago

Stateful transaction support requested in https://github.com/fsspec/filesystem_spec/issues/1517.

AdrianoKF commented 5 months ago

Another idea is to give default arguments to the file system constructor, but these are stickier / more opaque than initializer arguments to the transaction itself. Passing arguments to the transaction has to be solved on the fsspec level, however, if there is interest at all by the maintainers.

In case there is no upstream solution, there still is the third option, which I explored in my prototype: turning LakeFSTransaction into a callable object, which allows passing (optional) configuration when using it as a context manager, but still being compatible with the way that fsspec instantiates the transaction class.