Open Byron opened 3 years ago
What's the status on `gixp clone`? I'm very much interested in helping out on that front.
`gixp clone` as it's seen here would only clone bare repositories. The biggest requirement for achieving work tree checkouts is to implement `git-index`. Doing so requires a serious investment in time and great attention to detail. There may be smaller tasks on the way but ultimately, `git-index` is what's needed to clone a repository with a work tree.
If this outlook isn't too frightening for you, I'd be happy to get you involved in some capacity.
I have never contributed to gitoxide so I'm not too familiar with it yet, but I learn things quickly - nothing frightens me :) so yes, I'm more than happy to try things out if you give me some pointers in the right direction.
Have you had a chance to check out the backlog here? https://github.com/Byron/gitoxide/projects/1
A good way to get acquainted with `gitoxide` would probably be to use it by further oxidizing some crates that are using `git2` ATM but could already use `gitoxide`. This would inevitably lead to some features being implemented or improved along the way.
Speaking of features, I think one that is desperately needed is commit ancestor traversal sorted by commit time.
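To make the requested feature concrete, here is a minimal sketch of ancestor traversal ordered by commit time, using a priority queue over a toy in-memory graph. The `Commit` struct, the `u32` ids and the function name are hypothetical stand-ins for illustration and are not part of any `gitoxide` API; a real implementation would look up commits in the object database.

```rust
use std::collections::{BinaryHeap, HashMap, HashSet};

/// Hypothetical, simplified commit; a real one would be decoded from the object database.
struct Commit {
    /// Committer timestamp in seconds.
    time: u64,
    /// Ids of the parent commits.
    parents: Vec<u32>,
}

/// Returns the ids of all commits reachable from `tips`, newest commit time first.
/// Commits missing from `graph` (think of a shallow clone) are silently skipped.
fn ancestors_by_commit_time(graph: &HashMap<u32, Commit>, tips: &[u32]) -> Vec<u32> {
    let mut seen = HashSet::new();
    // `BinaryHeap` is a max-heap, so the entry with the largest (time, id) pops first.
    let mut queue = BinaryHeap::new();
    for &tip in tips {
        if seen.insert(tip) {
            if let Some(commit) = graph.get(&tip) {
                queue.push((commit.time, tip));
            }
        }
    }

    let mut out = Vec::new();
    while let Some((_, id)) = queue.pop() {
        out.push(id);
        // Only ids present in `graph` are ever queued, so indexing cannot panic.
        for &parent in &graph[&id].parents {
            if seen.insert(parent) {
                if let Some(commit) = graph.get(&parent) {
                    queue.push((commit.time, parent));
                }
            }
        }
    }
    out
}
```

The `seen` set ensures each commit is emitted once even when it is reachable through several merge parents. Note that this ordering is only by timestamp, so with skewed clocks a parent may appear before one of its children.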
A way forward would be for you to find something you are comfortable getting started with, then we could even kick it off in a 1:1.
Just let me know.
PS: I connected to you on keybase, a way to reach out to me in a more realtime and private fashion, as needed.
@Nytelife26 @Byron Have you had a chance to make progress on this one? :)
All building blocks for a bare clone exist, they haven't been put into a cohesive package though.
A non-bare clone is in the works which will include the bare one by its very nature.
Do what's needed to fetch as well as git does (on a bare repository, one without a working tree). This particularly includes proper ref handling as well as safety in the light of concurrent repository access.
Tasks
* `gixp pack-receive` into `gixp clone`, creating an empty repository (for lack of index handling/checkout) and cloning the first pack.
* `git-protocol`.
* `git-repository` to greatly simplify doing ref-listings and fetches?

Archive
### Research

#### Reflog Handling

* entirely disabled in bare repos
* forward iterators could be bstr::lines()
* reverse-iterators could be bstr::SplitReverse with a VecDeque for refilling a read buffer from the end of a file with seeks.
* line parsing is [here](https://github.com/git/git/blob/master/refs/files-backend.c#L1892:L1919)
* expiry is done by rewriting the entire file based on a filter, writing is literally [here](https://github.com/git/git/blob/master/refs/files-backend.c#L3028:L3028)

#### Refs Writing

* You can turn a symbolic ref into a peeled one (i.e. detach a HEAD) with transactions but you cannot turn it back into a symbolic one with that. All that happens directly and outside of transactions.
* Writing symbolic references like HEAD [splits the ref update](https://github.com/git/git/blob/seen/refs/files-backend.c#L2263:L2263) transparently and across any amount of refs.
* You cannot [delete ref logs](https://github.com/git/git/blob/seen/refs/files-backend.c#L2817:L2817) using `REF_LOG_ONLY` but they are deleted with the owning reference.
* [ref transactions](https://github.com/git/git/blob/master/refs/refs-internal.h#L197:L197)
  * there is a transaction hook which gets all transaction data without flags, that is old and new oid and refname, along with the 'action' indicating what happened to the transaction.
  * probably it should be possible to introspect transactions as they are executing, but theoretically this can also happen outside of the method itself.
* [git file lock](https://github.com/git/git/blob/master/lockfile.c#L73:L86)
  * it looks like they are creating a tempfile with a specified name for locks (exclusive and all using atomic FS ops) which can then potentially be written in the same moment. Definitely good for loose refs that don't exist.
* loose refs writing intricately knows [packed refs](https://github.com/git/git/blob/master/refs/files-backend.c#L711:L711), which makes sense in order to keep them consistent.
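As an illustration of the reflog notes above, the following is a rough sketch of parsing a single reflog line and iterating over a file's content in forward direction. It assumes the usual on-disk layout of `<old-oid> <new-oid> <signature>\t<message>` with 40-character SHA-1 hex ids, and it uses `str` from the standard library instead of `bstr` to stay dependency-free. `ReflogLine`, `parse_reflog_line` and `iter_reflog` are hypothetical names, not the `git-ref` API.

```rust
/// One parsed reflog entry, borrowing from the input line (hypothetical type).
#[derive(Debug, PartialEq)]
struct ReflogLine<'a> {
    previous_oid: &'a str,
    new_oid: &'a str,
    /// "Name <email> <seconds> <timezone>", deliberately left unparsed in this sketch.
    signature: &'a str,
    message: &'a str,
}

/// Parses a single line, returning `None` if it does not look like a reflog entry.
fn parse_reflog_line(line: &str) -> Option<ReflogLine<'_>> {
    let line = line.trim_end_matches('\n');
    // The message is separated from the header by a tab and may be missing entirely.
    let (header, message) = match line.split_once('\t') {
        Some((header, message)) => (header, message),
        None => (line, ""),
    };
    let mut parts = header.splitn(3, ' ');
    let previous_oid = parts.next()?;
    let new_oid = parts.next()?;
    let signature = parts.next()?;

    // Assumption: SHA-1 object ids, i.e. exactly 40 hex characters.
    let is_hex_oid = |s: &str| s.len() == 40 && s.bytes().all(|b| b.is_ascii_hexdigit());
    if !is_hex_oid(previous_oid) || !is_hex_oid(new_oid) {
        return None;
    }
    Some(ReflogLine { previous_oid, new_oid, signature, message })
}

/// Forward iteration over reflog content, handling entries on the fly.
/// Lines that fail to parse are skipped here; a real implementation would report them.
fn iter_reflog<'a>(content: &'a str) -> impl Iterator<Item = ReflogLine<'a>> + 'a {
    content.lines().filter_map(parse_reflog_line)
}
```

For content that is already fully in memory, `content.lines().rev()` yields entries newest first. The seek-based reverse reader mentioned above only matters when the file should not be read completely.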
#### File Locking

* investigate [tempfile](https://docs.rs/tempfile/3.2.0) to conclude that it's certainly great as reference but won't be exactly what git does. Let's see if it's needed after all to do it exactly like that. Git definitely sets up signal handlers to delete tempfiles so probably these will have to be threadsafe or interned objects.
* If directories are involved, use [raceproof file creation](https://github.com/git/git/blob/master/object-file.c#L417:L417)
* [lockfile.c](https://github.com/git/git/blob/master/lockfile.c#L1:L1) holds the entire blocking implementation, including backoff. Looks like that's `git-lock`.

#### Reflogs

* The file is read line by line and entries are handled on the fly using iterators, easiest to use bstr::lines() there.
* reverse iterators use a buffer of 1024 bytes to seek lines backwards
* parsing is [here](https://github.com/git/git/blob/master/refs/files-backend.c#L1892:L1919)
* for expiry the file is rewritten based on iteration
* for new reflogs, these are appended (only)

#### Refs Writing

* [git file lock](https://github.com/git/git/blob/master/lockfile.c#L73:L86)
* `cargo` uses [flock](https://github.com/rust-lang/cargo/blob/master/src/cargo/util/flock.rs#L384:L392) for comparison with different semantics.
* [fslock](https://docs.rs/fslock/0.1.6/fslock/) seems a bit newer and has a few tests
* [fs2](https://github.com/danburkert/fs2-rs) does not compile anymore and seems unmaintained for years now. Can do more than we need, too.
* [file-lock](https://crates.io/crates/file-lock) is posix only but uses fcntl under the hood.

#### Signal-Hook

* The use of mutexes is unsafe as the current thread might be interrupted while holding the mutex. When trying to obtain a lock in the handler the thread will inevitably deadlock.
* Memory allocation and deallocation is not allowed! So inside a handler we have to do what we do and call `std::mem::forget` to implement it correctly.
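The git-style lock file idea from the notes above can be sketched roughly as follows: exclusively create `<target>.lock`, write the new content into it, then rename it over the target. `LockFile` and its methods are hypothetical names for illustration and not the `git-lock` API. The sketch also leaves out backoff and retries as well as the signal-handler cleanup discussed above.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper holding an exclusive `<target>.lock` file.
struct LockFile {
    lock_path: PathBuf,
    target: PathBuf,
    file: Option<File>,
}

impl LockFile {
    /// Fails with `ErrorKind::AlreadyExists` if somebody else holds the lock.
    fn acquire(target: &Path) -> io::Result<Self> {
        let mut name = target.as_os_str().to_owned();
        name.push(".lock");
        let lock_path = PathBuf::from(name);
        // `create_new` creates the file only if it does not exist yet, atomically.
        let file = OpenOptions::new()
            .write(true)
            .create_new(true)
            .open(&lock_path)?;
        Ok(LockFile { lock_path, target: target.to_owned(), file: Some(file) })
    }

    /// Writes `content` into the lock file and renames it over the target.
    fn commit(mut self, content: &[u8]) -> io::Result<()> {
        let mut file = self.file.take().expect("file is present until commit");
        file.write_all(content)?;
        file.sync_all()?;
        drop(file); // close the handle before renaming
        fs::rename(&self.lock_path, &self.target)
    }
}

impl Drop for LockFile {
    fn drop(&mut self) {
        // After a successful commit the lock file no longer exists, so the error
        // is ignored. Otherwise this releases an abandoned or failed lock.
        let _ = fs::remove_file(&self.lock_path);
    }
}
```

Readers of the target never observe a partially written file because the content becomes visible only through the rename. Cleanup in `Drop` does not run when the process is killed by a signal, which is the gap a tempfile registry with signal handlers is meant to close.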
### Done Tasks

* **prodash**
  * replace usage of ctrlc that starts yet another thread with the signal-hook iterator to process pending events from time to time as part of the ticker thread. Saves a thread and enables proper handler chaining.
* **git-features**
  * Replace `ctrlc` usage with signal-hook (i.e. current atexit handler for interrupts)
  * don't use stdout in interrupt handler as it does use a mutex under the hood. Instead allow aborting after the second interrupt in case the application is not responding. It would be great to have a lock-free version of stderr though.
  * Integrate 'git-tempfile' behind feature toggle to allow interrupt handlers to be tempfile handler aware and not interfere.
  * replace existing usage of git_features::interrupt::is_interrupted() with versions of it that are local to the method or function.
  * move `git-features::interrupt` into `git-repository` as this kind of utility is for application usage only. There the `git-tempfile` integration makes sense, too.
* **git-tempfile**
  * registered [tempfile] support to allow deletion on exit (and other signals). Use dashmap as storage.
  * Make sure pid is recorded to assure [forking works as expected][tempfile-fork].
  * docs
  * fix windows build
  * a test validating default handlers are installed
  * release
  * race-proof creation of directories leading to the tempfile
    * a way to use the above for actual tempfiles
  * race-proof deletion of empty directories that conflict with the filename
    * a way to use the above for actual tempfiles
  * differentiate between closed and writable tempfiles in the typesystem to make choice permanent
  * a way to not install any handlers so that git-repository interrupt can run the tempfile removal itself right before aborting.
  * Make `with_mut` less cumbersome to use by assuming the interrupt handler will indeed abort.
* **git-lock** - a crate providing [git-style] lock files.
  * lock file for update
  * marker for holding a lock
  * exponential backoff
  * the above with randomization
  * actual retries with blocking sleep
  * test for the above
* **git-refs**
  * sketch transaction type
  * figure out whether or not to 'extend' the API to include changes from Symbolic refs to peeled ones in transactions
  * git signature parsing code is shared and moved to git-actor
  * git-object uses git-actor
  * git-object: unify nom error handling everywhere (to reuse the nom error handling machinery instead of re-inventing it)
  * git-object can use verbose errors and `()` - unit errors per feature toggle.
  * parse ref log line
  * reflog forward iteration
  * reflog backward iteration
  * file reflog writing
  * git-tempfile close (Handler