Open scratchee opened 2 years ago
If you want a "business case" for this fix, then concider that currently if someone's repo is too big, it makes fork painful to use as a gui, and there's no work-around other than hacking up the repository or switching to a different gui.
But with this fix, a much simpler solution becomes viable, which could become the recommended solution and allow for more customers. Think of all those big companies with monolithic repos!
Obviously, I am biased in favour of that logic, so take it with a pinch of salt :).
Thank you for reporting this problem and for the deep investigation.
I'm very interested in making Fork work well with sparse index.
Sadly, it doesn't play very well with fork at the moment, which appears to be primarily because of usage of git ls-files within fork.
Well, Fork calls git ls-files --unmerged, but I doubt it can be slow with the --unmerged flag.
We don't really need that request to be honest 🙂. I've been planning to remove it, but never had the time. IIRC, I only left it because it barely takes 30ms (even in the 30GB Chromium repo).
Or do you mean it is slow in your case because it needs to rebuild the full index (ensure_full_index)?
Can you show the result of time git ls-files --unmerged > /dev/null?
And just for my curiosity: time git ls-files --sparse --unmerged > /dev/null
Hey, thank you, great to hear you're interested!
So with sparse index enabled:
$ time git ls-files --unmerged > /dev/null
git ls-files --unmerged > /dev/null 3.06s user 0.10s system 98% cpu 3.201 total
$ time git ls-files --unmerged --sparse > /dev/null
git ls-files --unmerged --sparse > /dev/null 0.02s user 0.01s system 86% cpu 0.029 total
with sparse index disabled (no effect at all, as I'd expect):
$ time git ls-files --unmerged > /dev/null
git ls-files --unmerged > /dev/null 0.14s user 0.07s system 94% cpu 0.224 total
$ time git ls-files --unmerged --sparse > /dev/null
git ls-files --unmerged --sparse > /dev/null 0.14s user 0.07s system 95% cpu 0.216 total
So it looks like ls-files only suffers because of ensure_full_index, and is otherwise fast even on my large git repo.
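(Toggling between the two setups on a cone-mode sparse checkout can be done roughly like this; just a sketch, and flag support varies a bit by git version:)
# enable the sparse index
$ git sparse-checkout init --cone --sparse-index
# disable it again; the index expands back to a full index
$ git sparse-checkout init --cone --no-sparse-index
# the underlying knob is the index.sparse config setting
$ git config index.sparse true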
Sorry, looks like I unintentionally lied earlier,
ls-files was always slow for me, but it wasn't quite so bad before sparse-index (used to take about 2s iirc), because the index was already expanded, so there was no need to spend time expanding it in memory in every call.
I should have said:
the "Refresh" option in fork was always slow for me, but it wasn't quite so bad before sparse-index (used to take about 2s iirc)
I conflated things without checking, sorry for confusing the issue.
I did just check to make sure, comparing with sparse index turned on and off (still using a sparse checkout; a full 200GB checkout would definitely not be fun!).
With a sparse index: Refresh takes about 7s, with at least 3.5s spent in ls-files and at least 190ms in status.
With a full index: Refresh takes about 2s, with at least 170ms spent in ls-files and at least 800ms in status.
(Time spent in ls-files/status is "at least" since I'm just judging from an Instruments time profile, so it only counts CPU time.)
So all in all, I'd expect using ls-files --sparse to boost my refresh times to sub-1s 🤞
For reference, I'm on an 80GB repo: 200GB (2 million files) for full working copies, 3GB (75,000 files) for sparse working copies, with 160MB index files (5MB for sparse index files).
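(If anyone wants to compare, one rough way to pull the equivalent numbers for your own repo:)
$ git count-objects -vH          # approximate on-disk repo size
$ git ls-files | wc -l           # all tracked files (expands a sparse index)
$ git ls-files --sparse | wc -l  # sparse index entries, including sparse directory entries
$ ls -lh .git/index              # index file size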
I'll make a build so you can try that.
Here's the build: https://cdn.fork.dev/prerelease/Fork-2.15.11.dmg
I just added --sparse to ls-files and haven't tried it.
That works: refresh takes ~1s, the ls-files call barely shows up at all in Instruments during the refresh, and after I caused a conflict, Fork seems to behave the same, showing unmerged files.
So this works great as far as I can tell, thank you!
You can keep using that build for now. Hopefully I will remove ls-files completely in the next update.
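(One possible way to do that, just as a sketch and not necessarily how Fork will actually do it, is to take unmerged entries from the status call that is already made; in porcelain v2 output they show up as "u" lines:)
$ git status --porcelain=v2 --untracked-files=all | grep '^u '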
Could you tell me later whether using the sparse index improves the overall experience for you? If it really does, we can make its use in Fork more friendly.
In particular, I'm interested in seeing real numbers for time git status --porcelain --untracked-files=all > /dev/null with the sparse index both enabled and disabled.
Thanks, will do.
Yes, I'm planning to stick with sparse index for a while and see how it goes.
sparse index:
$ time git status --porcelain --untracked-files=all > /dev/null
git status --porcelain --untracked-files=all > /dev/null 0.08s user 0.50s system 258% cpu 0.228 total
$ time git status --porcelain --untracked-files=all > /dev/null
git status --porcelain --untracked-files=all > /dev/null 0.62s user 0.37s system 121% cpu 0.818 total
To be honest, I'd say it's still a little too bleeding edge at the moment to warrant too much focus. There have been big improvements, but I've found one or two commands that still go through the ensure_full_index path unnecessarily. Faster status/commit/add etc. makes it a win for me, but it's not yet as clear-cut as sparse checkouts and cone mode have been.
Sorry for reviving an old issue, but I thought it worth mentioning that with the latest version of git the sparse index feature feels a lot more complete, so my warning in the last comment that it might be too bleeding edge no longer applies.
The latest big improvement was to stash performance, which went from around 4s without sparse index (and 8s with it) to 0.5s with sparse index on the latest git version, which is a massive improvement.
So if you did want to experiment with more integration with sparse checkouts and sparse index, I'd say it's about ready for that.
Git has a new feature, sparse index, which allows significant performance improvements for sparse checkouts of large repositories. This has slowly taken effect over the last 3 versions, and now covers most common commands in git 2.35.
Sadly, it doesn't play very well with fork at the moment, which appears to be primarily because of the usage of git ls-files within fork. git ls-files, unlike most git commands on a worktree, needs to list every file in the repo to match past behaviour, even if those files are excluded from the sparse checkout, so calling it means expanding the sparse index into a full index. The git devs have now provided a new --sparse argument to git ls-files to list only files within the index.
As can be seen, for my sparse worktree this knocks the cost of git ls-files from 5 seconds to basically nothing (yes, my repo is very large ☹️).
This is an Instruments time-profile stack trace of the ls-files command run by fork, showing it's the ensure_full_index function (which expands the sparse index) that causes the stall.
Since fork appears to call ls-files a lot (every time it refreshes the file list, so basically whenever you look at it funny), this causes quite a lot of waiting for spinners. ls-files was always slow for me, but it wasn't quite so bad before sparse-index (used to take about 2s iirc), because the index was already expanded, so there was no need to spend time expanding it in memory in every call.
Note that ls-files --sparse does now list directories that mark the sparse boundaries (see the --sparse option in git help ls-files), which would need to be filtered from the output (not sure why the git devs only provided this mode and not a mode that just lists sparse-included files), although given you already filter out non-sparse files somewhere between the internal command and the UI, perhaps you'd already filter out the directories anyway.
So effectively the command would be something like: git ls-files --sparse | grep -E -v -e '/$' (i.e. to exclude any listed "files" that end in a /, since they are just folders from the sparse-index format)
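(To illustrate with made-up paths: the sparse directory entries are the ones ending in a slash, and the grep drops just those:)
$ git ls-files --sparse
src/included/file.txt
vendor/excluded/
$ git ls-files --sparse | grep -E -v -e '/$'
src/included/file.txt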
This is new to 2.35, and sparse-index is not the standard setup (sparse checkouts are never going to be the standard scenario), so I understand if you don't want to prioritise this, but it would make fork perform much better for me, so I'd appreciate it if you got around to it.
Whilst there are probably some other commands that also trigger sparse index expansion (at least git apply when staging changes from a diff), I think ls-files is the only one used so heavily, and so the only one that would provide a significant speed-up (plus it seems pretty easy to fix, hopefully!).
Thanks