Open ibigbug opened 2 years ago
Hm, git-lfs has supported SSH protocol I think. ref: https://github.com/git-lfs/git-lfs/pull/4446
good to know, but how to use it?
Looks like Gitea doesn't support it yet:
-> % ssh git@gitea.at.somewhere -p PORT git-lfs-transfer some/repo.git download
Gitea: Unknown git command
Gitea: Unknown git command
@ibigbug Could you change the title to something like "support a pure SSH LFS protocol"?
Yes. Since that PR was merged only recently, Gitea itself doesn't support it at the moment. But it should fall back to the http/https protocol.
thanks @lunny
Keen to see Gitea make LFS over SSH work.
@lunny any updates?
Nobody is working on this issue.
@lunny thanks for letting me know.
How to get someone to take a look at this?
In fact, it is not that difficult to implement git-lfs-transfer. We can write such a command based on rust/golang to support the git lfs pure ssh protocol.
bk2204 implements a great example: https://github.com/bk2204/scutiger
Inside our company, I have used golang to simulate the git-lfs-transfer command in the ssh server from this project, and it is currently running stably.
This feature would be great! :+1:
@fcharlie did you end up using this project internally and did it work as expected or were there some complications?
Since the pure SSH protocol does not support OSS signature download, and AI needs to download a large number of large files, we disabled it for performance reasons.
Here is also an implementation reference for the SSH protocol: https://github.com/charmbracelet/git-lfs-transfer
Attempt | Started (GMT+0) | Solution |
---|---|---|
🔴 @jemiluv8 | Mar 11, 2024, 9:52:04 AM | WIP |
🟢 @Sambit003 | Apr 10, 2024, 4:13:13 PM | WIP |
🟢 @ConcurrentCrab | Jun 21, 2024, 11:27:29 AM | #31516 |
^ that was me that created this bounty (trialing new bounty platform)
/attempt #17554
@techknowlogick, I've started looking into this.
/attempt #17554
Hi, is this still being actively worked on or is it up for grabs?
Also just to be clear, integrating charm.sh's (pretty good looking) https://github.com/charmbracelet/git-lfs-transfer implementation into Gitea successfully such that all LFS functionality starts working over SSH would be an acceptable fix, correct? That is, we don't need a new, in-house implementation of the protocol?
@ConcurrentCrab, I've not been able to get far on this task so it is indeed up for grabs.
Thank you for the confirmation, @jemiluv8
I'm interested in trying my hand at this, so it'd be great if @techknowlogick or another project member could confirm the scope of the task, in regards to my above question.
/attempt #17554
Huh, this was more straightforward than I expected it to be. https://github.com/go-gitea/gitea/pull/31448
Very rough, needs cleanups:
For the question I posed above, I assumed a dependency on an external implementation was fine (please confirm?), so all it requires is to have an implementation of git-lfs-transfer (e.g. tested with Charm.sh's) in path such that Gitea or git can find it (this is similar to how Gitea already expects git-upload-pack, git-receive-pack and co. from the git package to be installed, only this unfortunately isn't part of upstream git or git-lfs). There should probably be documentation to indicate this for server admins (which impl should be recommended? it seems the "blessed" impl according to the git-lfs team is https://github.com/bk2204/scutiger).
To the best of my knowledge, this preserves the security model, since the ServCommand API logic cares mainly about the AccessMode, which is derived correctly. Still would be nice to confirm from someone with more knowledge of the internal permissions architecture.
How do I know it works?
If you ran GIT_TRACE=1 git lfs push origin --all (or any other lfs-calling command) earlier:
12:06:04.918475 trace git-lfs: attempting pure SSH protocol connection
12:06:04.918488 trace git-lfs: spawning pure SSH connection
12:06:04.918526 trace git-lfs: run_command: ssh -oControlMaster=yes -oControlPath=[...] [...] git-lfs-transfer [...] upload
12:06:04.918675 trace git-lfs: exec: ssh '-oControlMaster=yes' '-oControlPath=[...]' '[...]' 'git-lfs-transfer [...] upload'
12:06:05.285320 trace git-lfs: pure SSH connection successful
12:06:05.285335 trace git-lfs: pure SSH protocol connection failed: Unable to negotiate version with remote side (unable to read capabilities): unexpected EOF
12:06:05.285436 trace git-lfs: run_command: ssh [...] git-lfs-authenticate [...] upload
where a git-lfs-authenticate call indicates a fallback to the HTTP protocol.
Now:
12:31:07.837098 trace git-lfs: attempting pure SSH protocol connection
12:31:07.837108 trace git-lfs: spawning pure SSH connection
12:31:07.837155 trace git-lfs: run_command: ssh -oControlMaster=yes -oControlPath=[...] [...] git-lfs-transfer [...] upload
12:31:07.837277 trace git-lfs: exec: ssh '-oControlMaster=yes' '-oControlPath=[...]' '[...]' 'git-lfs-transfer [...] upload'
12:31:08.217219 trace git-lfs: pure SSH connection successful
12:31:08.217443 trace git-lfs: Upload refs [] to remote origin
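For anyone wanting to run the same check on their own output, here is a minimal sketch that classifies a GIT_TRACE log the way the two excerpts above are read. The function name classifyTrace and the result labels are made up for illustration; the matched substrings are copied from the trace lines quoted in this comment and may differ in other git-lfs versions.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// classifyTrace scans GIT_TRACE output and reports which LFS transport
// appears to have been used. The labels are hypothetical; the substrings
// are taken from the trace excerpts quoted in this thread.
func classifyTrace(trace string) string {
	attempted := false
	sc := bufio.NewScanner(strings.NewReader(trace))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.Contains(line, "attempting pure SSH protocol connection"):
			attempted = true
		case strings.Contains(line, "pure SSH protocol connection failed"),
			strings.Contains(line, "git-lfs-authenticate"):
			// Either line means git-lfs gave up on pure SSH.
			return "http-fallback"
		}
	}
	if attempted {
		return "pure-ssh"
	}
	return "unknown"
}

func main() {
	trace := "trace git-lfs: attempting pure SSH protocol connection\n" +
		"trace git-lfs: pure SSH connection successful\n" +
		"trace git-lfs: Upload refs [] to remote origin\n"
	fmt.Println(classifyTrace(trace))
}
```

Note that "pure SSH connection successful" alone is not treated as proof, since it also shows up in the failing trace right before the negotiation error.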
Hi @ConcurrentCrab this is indeed available, and we'd love for you to work on it. Thanks for submitting a WIP PR. In terms of scope, it's "users can interact with LFS using SSH (using either the built-in ssh server, or the integration with opensshd)". Adding the external dep on charm.sh is a-ok. Documentation would be appreciated but not necessary for completing the bounty (we are in a transition phase with documentation, so I don't want to add extra work to your plate by sorting out where to contribute it), so if you are inclined, even a comment in your PR would be helpful. If I've missed any of your questions, or you have more, please don't hesitate to ping :)
Hi @techknowlogick,
Adding the external dep on charm.sh is a-ok.
Thanks for clarifying that. To be crystal clear, this is not a build-time dependency, as in pulling in their libraries. This is a run-time dependency on the binary being present in the environment (any compliant implementation would do), just like we're already depending on the git package being installed for the git-upload-pack/git-receive-pack binaries. Ofc, it's still an "optional" dependency, and if it isn't present we simply fall back onto the HTTP protocol, so we're no worse off than where we started.
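A rough sketch of what "run-time dependency on the binary being present" means for a server admin: checking whether some git-lfs-transfer implementation is discoverable on PATH. The helper name pureSSHAvailable is hypothetical and is not taken from the PR; only exec.LookPath from the Go standard library is assumed.

```go
package main

import (
	"fmt"
	"os/exec"
)

// transferBinary is the command name git-lfs asks the SSH server to run.
const transferBinary = "git-lfs-transfer"

// pureSSHAvailable (hypothetical helper) reports whether a binary with the
// given name can be found on PATH, and where.
func pureSSHAvailable(name string) (string, bool) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", false
	}
	return path, true
}

func main() {
	if path, ok := pureSSHAvailable(transferBinary); ok {
		fmt.Println("pure SSH LFS possible, found", path)
	} else {
		fmt.Println(transferBinary, "not on PATH; clients will fall back to HTTP")
	}
}
```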
As I mentioned, the patch in the PR already makes pure SSH LFS sessions work, according to my experiments I outlined above. But this is very much the minimum-changes-required version of this patch, conceivably can be termed a 'hack'. I think it'd be worth making some refactors to the logic in that handler. There's already quite a bit of convoluted logic in there, so I'd feel bad leaving it there with even more surprise ifs-and-elses :)
And I'll add all the relevant useful information into the commits, as you suggested.
Regarding the security model I think we're mostly fine, except at one point it does seem to care about the "verb": https://github.com/go-gitea/gitea/blob/621e1ff9c9ec04ea8e6d68cd8e38bb5734f29bdc/routers/private/serv.go#L140 That line seems to be from this commit fixing a bug related to the upload command: https://github.com/go-gitea/gitea/commit/95013fde60748c425eb910dcab5d1fdd1c89ae18
That seems important. I'll be looking more into what exactly that might mean, as the "upload" mode of our new command should probably be on that list of exceptions too.
Whoops, called it too early ;). The network transfer indeed is working, but the binary isn't placing the objects where Gitea expects them to be. Huh. Going to look into that. Meanwhile, experimenting with the refactor on another branch: https://github.com/ConcurrentCrab/gitea/commits/lfs-ssh-2/
Ah. It seems both Charm and Scutiger store the LFS assets in the <repo_dir>/lfs, while Gitea expects them in a separate data directory... which seems to be common across all the repos? That seems like a strange choice, since it'll lead to both higher chances of collision, and slower performance as the directories fill up. Anyway, it would seem the paths need to be changed in git-lfs-transfer then.
Gitea stores all LFS blobs in the same directory so that the same files in multiple repositories can reuse them (especially important for forking). Also keep in mind that Gitea stores in the database all references to the repositories in which a specific blob is used. LFS storage can be not only the filesystem but also S3.
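To make the storage model described here concrete, a toy in-memory sketch: blobs are keyed by their LFS OID (a SHA-256 of the content) in one shared store, and a separate table records which repositories reference each OID. All type and function names are hypothetical and this is not Gitea's actual code; in Gitea the blob side may be a filesystem or S3 and the reference side lives in the database.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// store is a toy model: one shared blob area plus per-repo references.
type store struct {
	blobs map[string][]byte          // OID -> content, shared by all repos
	refs  map[string]map[string]bool // OID -> set of repos referencing it
}

func newStore() *store {
	return &store{
		blobs: map[string][]byte{},
		refs:  map[string]map[string]bool{},
	}
}

// put stores content once per OID and records that repo references it.
func (s *store) put(repo string, content []byte) string {
	sum := sha256.Sum256(content)
	oid := hex.EncodeToString(sum[:])
	if _, ok := s.blobs[oid]; !ok {
		s.blobs[oid] = content
	}
	if s.refs[oid] == nil {
		s.refs[oid] = map[string]bool{}
	}
	s.refs[oid][repo] = true
	return oid
}

// get returns the blob only if repo holds a reference to it.
func (s *store) get(repo, oid string) ([]byte, bool) {
	if !s.refs[oid][repo] {
		return nil, false
	}
	b, ok := s.blobs[oid]
	return b, ok
}

func main() {
	s := newStore()
	a := s.put("alice/repo", []byte("big file"))
	b := s.put("bob/fork", []byte("big file"))
	fmt.Println(a == b, len(s.blobs)) // same OID, stored once
}
```

The reference table is why an out-of-process binary writing into <repo_dir>/lfs cannot simply be pointed at a different path: storing a blob also means recording who uses it.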
Hi @lafriks,
Thanks for sharing this information.
Gitea stores all LFS blobs in the same directory so that the same files in multiple repositories can reuse them (especially important for forking)
Ah, that sounds reasonable.
Also keep in mind that Gitea stores in the database all references to the repositories in which a specific blob is used. LFS storage can be not only the filesystem but also S3.
I see. That does significantly complicate things. This certainly doesn't sound like something that could be done by an out-of-process client anymore, not the db stuff and definitely not the S3 stuff. The git protocols take special care to avoid stuff like that (mostly by having a single source of truth), so that multiple protocols like SSH, "dumb" HTTP, "smart" HTTP, etc. can work together without a hitch. And that repos on the server are just normal bare repos, so that representations are symmetric on both server and client. This sort of makes that moot.
So how would you suggest approaching this? I would think the simplest way would be throwing away the out-of-process client, and implementing an in-process reader/writer that calls the same functions the HTTP LFS API does. But not being aware of the architecture of the program, I'm open to suggestions.
I think going in-process, with an internal API that reuses the HTTP logic, would be the way to go imho
Alright, mostly done with the rote work. What I ended up doing was vendoring the transport package from the charm.sh library (earlier I tried just importing it but certain differences between the file-based and Gitea API made that unfeasible), modifying it a bit, and adding a "backend" that proxies to the Gitea internal store.
Just need to figure out why transfers are throwing 500s :)
Aaaand that should be it. Pushes and clones all seem to work properly, and metadata objects are registered like they should be. I think support for Pure SSH LFS is complete :)
Marking draft PR as final now.
Continued in https://github.com/go-gitea/gitea/pull/31516
I guess I had to resubmit with a new PR for the algora bot to see.
Finally get everything working, and a git-lfs bug pops up :\
https://github.com/git-lfs/git-lfs/pull/5816
I think the PR is largely complete, but I would... suggest holding off on merging it until a fixed version of git-lfs is released and is in common use, due to the nature of the bug (in the presence of pure LFS SSH support, it degrades a normal push into a 2-minute wait).
You can test it but obvs throw a fixed build of git-lfs on your PATH, unless you want to wait 2 mins after every push :P
either that or set lfs.ssh.automultiplex to false in git config.
Hi @techknowlogick,
It'd be nice to have some kind of a response to the PR, even if that response is "hi, no, haven't gotten to it yet but will soon".
It doesn't really feel nice to be ghosted for a month after having done all this work (up to and including fixing bugs in the LFS client) :)
Hi, @ConcurrentCrab. I'm so sorry for missing the notifications and leaving you hanging. I just tested a build of your PR, and ran into the long hang from git-lfs that you described. I don't think that's a blocker to get this merged, but perhaps LFS over SSH support could be disabled by default in the config, and then in documentation I could add something around disabling automultiplex prior to enabling it?
Feature Description
The current LFS-related operations only use settings.AppURL as the endpoint: https://github.com/go-gitea/gitea/blob/bc6df18fb35837b510dfa4daeec53fec32a55af7/services/lfs/server.go#L48
One scenario: I want server.ROOT_URL (settings.AppURL) to be the externally facing URL for browsing the site, and I want server.SSH_DOMAIN to be my internal domain for cloning and pushing code.
But the current implementation only looks at AppURL (ROOT_URL). Should SSH_DOMAIN be considered for LFS operations?
Screenshots
No response