Byron / gitoxide

An idiomatic, lean, fast & safe pure Rust implementation of Git
Apache License 2.0
8.84k stars 301 forks source link

octopus-merge (part 2: blob-merge) #1585

Open Byron opened 3 weeks ago

Byron commented 3 weeks ago

Implement an octopus merge based on trees, and (mostly) equivalent to merge-ORT in Git. The foundation, tree-editing, was implemented in #1566.

Related to https://github.com/gitbutlerapp/gitbutler/pull/4793.

Tasks

This PR was re-focussed on blob-based conflict detection and the generation of blob merge-results with conflict markers.

Questions

Next PR / Outscoped

Research

Everything is about MergeORT.

Crates

Mostly for performance optimization, also interesting for handling the index, and the tree::editor types.

Building Blocks for String Interning

String Interining

Handle Special Cases

Questions

Is git2::merge_trees() a trivial merge? Does it handle all the cases of MergeORT?

Maybe not, but it definitely handles rename tracking. See these git2 flags for more information.

How does rename-tracking affect a tree-merge?

The first round is done without renames, then there is a second round to find renames, and perform the merge of renamed items.

How is an octopus merge implemented, particularly with Merge ORT?

References

### Removed Utility Prints a patch with context 3, using bytes only, based on imara-diff. ```Rust /// An adapted version of the unified diff writer to get a first idea. // TODO: remove this #[allow(dead_code)] mod unified_test { use bstr::ByteVec; use imara_diff::intern::{InternedInput, Interner, Token}; use imara_diff::Sink; use std::ops::Range; /// A [`Sink`] that creates a textual diff /// in the format typically output by git or gnu-diff if the `-u` option is used pub struct UnifiedDiffBuilder<'a, W> where W: std::io::Write, { before: &'a [Token], after: &'a [Token], interner: &'a Interner<&'a [u8]>, pos: u32, before_hunk_start: u32, after_hunk_start: u32, before_hunk_len: u32, after_hunk_len: u32, buffer: Vec, dst: W, } impl<'a, W> UnifiedDiffBuilder<'a, W> where W: std::io::Write, { /// Create a new `UnifiedDiffBuilder` for the given `input`, /// that will writes it output to the provided implementation of [`Write`]. pub fn with_writer(input: &'a InternedInput<&'a [u8]>, writer: W) -> Self { Self { before_hunk_start: 0, after_hunk_start: 0, before_hunk_len: 0, after_hunk_len: 0, buffer: Vec::with_capacity(8), dst: writer, interner: &input.interner, before: &input.before, after: &input.after, pos: 0, } } fn print_tokens(&mut self, tokens: &[Token], prefix: char) { for &token in tokens { self.buffer.push_char(prefix); self.buffer.extend_from_slice(self.interner[token]); } } fn flush(&mut self) { if self.before_hunk_len == 0 && self.after_hunk_len == 0 { return; } let end = (self.pos + 3).min(self.before.len() as u32); self.update_pos(end, end); writeln!( &mut self.dst, "@@ -{},{} +{},{} @@", self.before_hunk_start + 1, self.before_hunk_len, self.after_hunk_start + 1, self.after_hunk_len, ) .unwrap(); self.dst.write_all(&self.buffer).unwrap(); self.buffer.clear(); self.before_hunk_len = 0; self.after_hunk_len = 0 } fn update_pos(&mut self, print_to: u32, move_to: u32) { self.print_tokens(&self.before[self.pos as usize..print_to as usize], ' '); let len = print_to - self.pos; self.pos = move_to; self.before_hunk_len += len; self.after_hunk_len += len; } } impl Sink for UnifiedDiffBuilder<'_, W> where W: std::io::Write, { type Out = W; fn process_change(&mut self, before: Range, after: Range) { if before.start - self.pos > 6 { self.flush(); self.pos = before.start - 3; self.before_hunk_start = self.pos; self.after_hunk_start = after.start - 3; } self.update_pos(before.start, before.end); self.before_hunk_len += before.end - before.start; self.after_hunk_len += after.end - after.start; self.print_tokens(&self.before[before.start as usize..before.end as usize], '-'); self.print_tokens(&self.after[after.start as usize..after.end as usize], '+'); } fn finish(mut self) -> Self::Out { self.flush(); self.dst } } } } ```
Byron commented 2 weeks ago

@EliahKagan Just as a note in case you'd like to submit a first patch to Git that will definitely be able to land, maybe to warm-up with the mailing-list workflow, then I have something for you.

The attribute-documentation about three-way merges contains a reference to merge.default which is indeed used in code as well.

However, the git-config documentation doesn't mention it at all.