go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License

Support LFS purely over SSH protocol #17554

Open ibigbug opened 2 years ago

ibigbug commented 2 years ago

Feature Description

The current LFS-related operations use only settings.AppURL as the endpoint:

https://github.com/go-gitea/gitea/blob/bc6df18fb35837b510dfa4daeec53fec32a55af7/services/lfs/server.go#L48

One scenario: I want server.ROOT_URL (settings.AppURL) to be the external-facing URL used to browse the site.

And I want server.SSH_DOMAIN to be my internal domain, used to clone and push code.

But the current implementation only looks at AppURL (ROOT_URL). Should SSH_DOMAIN be considered for LFS operations?

lunny commented 2 years ago

Hm, I think git-lfs now supports the SSH protocol. Ref: https://github.com/git-lfs/git-lfs/pull/4446

ibigbug commented 2 years ago

Good to know, but how do I use it?

It looks like Gitea doesn't support it yet:

-> % ssh git@gitea.at.somewhere -p PORT git-lfs-transfer some/repo.git download

Gitea: Unknown git command
Gitea: Unknown git command

lunny commented 2 years ago

@ibigbug Could you change the title to something like "Support a pure SSH LFS protocol"?

Yes. Since that PR was merged only recently, Gitea itself doesn't support it at the moment. But git-lfs should fall back to the http/https protocol.

ibigbug commented 2 years ago

thanks @lunny

Keen to see Gitea make LFS over SSH work.

ibigbug commented 2 years ago

@lunny any updates?

lunny commented 2 years ago

@lunny any updates?

Nobody is working on this issue.

ibigbug commented 2 years ago

@lunny thanks for letting me know.

How can I get someone to take a look at this?

fcharlie commented 2 years ago

In fact, it is not that difficult to implement git-lfs-transfer. We can write such a command in Rust or Go to support the git-lfs pure SSH protocol.

bk2204 has implemented a great example: https://github.com/bk2204/scutiger

Inside our company, I have used Go to emulate the git-lfs-transfer command from this project in our SSH server, and it is currently running stably.

davama commented 1 year ago

This feature would be great! :+1:

tionis commented 6 months ago

@fcharlie did you end up using this project internally and did it work as expected or were there some complications?

fcharlie commented 6 months ago

@fcharlie did you end up using this project internally and did it work as expected or were there some complications?

Since the pure SSH protocol does not support OSS signature downloads, and our AI workloads need to download a large number of large files, we disabled it for performance reasons.

Here is also an implementation reference for the SSH protocol: https://github.com/charmbracelet/git-lfs-transfer

algora-pbc[bot] commented 6 months ago

💎 $300 bounty • CommitGo, Inc.

Steps to solve:

  1. Start working: Comment /attempt #17554 with your implementation plan
  2. Submit work: Create a pull request including /claim #17554 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to go-gitea/gitea!

Attempt | Started (GMT+0) | Solution
🔴 @jemiluv8 | Mar 11, 2024, 9:52:04 AM | WIP
🟢 @Sambit003 | Apr 10, 2024, 4:13:13 PM | WIP
🟢 @ConcurrentCrab | Jun 21, 2024, 11:27:29 AM | #31516

techknowlogick commented 6 months ago

^ that was me that created this bounty (trialing new bounty platform)

jemiluv8 commented 6 months ago

/attempt #17554

@techknowlogick, I've started looking into this.

Sambit003 commented 5 months ago

/attempt #17554

ConcurrentCrab commented 3 months ago

Hi, is this still being actively worked on or is it up for grabs?

Also just to be clear, integrating charm.sh's (pretty good looking) https://github.com/charmbracelet/git-lfs-transfer implementation into Gitea successfully such that all LFS functionality starts working over SSH would be an acceptable fix, correct? That is, we don't need a new, in-house implementation of the protocol?

jemiluv8 commented 3 months ago

@ConcurrentCrab, I've not been able to get far on this task so it is indeed up for grabs.

ConcurrentCrab commented 3 months ago

Thank you for the confirmation, @jemiluv8

I'm interested in trying my hand at this, so it'd be great if @techknowlogick or another project member could confirm the scope of the task, in regards to my above question.

ConcurrentCrab commented 3 months ago

/attempt #17554

ConcurrentCrab commented 3 months ago

Huh, this was more straightforward than I expected it to be. https://github.com/go-gitea/gitea/pull/31448

Very rough, needs cleanups:

  • the logic is a mess of ifs and elses, probably can be refactored to not be confusing anymore
  • needs documentation probably?

For the question I posed above, I assumed a dependency on an external implementation was fine (please confirm?), so all it requires is to have an implementation of git-lfs-transfer (e.g. tested with Charm.sh's) in path such that Gitea or git can find it (this is similar to how Gitea already expects git-upload-pack, git-receive-pack and co. from the git package to be installed, only this unfortunately isn't part of upstream git or git-lfs). There should probably be documentation to indicate this for server admins (which impl should be recommended? it seems the "blessed" impl according to the git-lfs team is https://github.com/bk2204/scutiger).

To the best of my knowledge, this preserves the security model, since the ServCommand API logic cares mainly about the AccessMode, which is derived correctly. Still would be nice to confirm from someone with more knowledge of the internal permissions architecture.

How do I know it works?

If you ran GIT_TRACE=1 git lfs push origin --all (or any other LFS-calling command) before this change, you would see:

12:06:04.918475 trace git-lfs: attempting pure SSH protocol connection
12:06:04.918488 trace git-lfs: spawning pure SSH connection
12:06:04.918526 trace git-lfs: run_command: ssh -oControlMaster=yes -oControlPath=[...] [...] git-lfs-transfer [...] upload
12:06:04.918675 trace git-lfs: exec: ssh '-oControlMaster=yes' '-oControlPath=[...]' '[...]' 'git-lfs-transfer [...] upload'
12:06:05.285320 trace git-lfs: pure SSH connection successful
12:06:05.285335 trace git-lfs: pure SSH protocol connection failed: Unable to negotiate version with remote side (unable to read capabilities): unexpected EOF
12:06:05.285436 trace git-lfs: run_command: ssh [...] git-lfs-authenticate [...] upload

where a git-lfs-authenticate call indicates a fallback to the HTTP protocol.

Now:

12:31:07.837098 trace git-lfs: attempting pure SSH protocol connection
12:31:07.837108 trace git-lfs: spawning pure SSH connection
12:31:07.837155 trace git-lfs: run_command: ssh -oControlMaster=yes -oControlPath=[...] [...] git-lfs-transfer [...] upload
12:31:07.837277 trace git-lfs: exec: ssh '-oControlMaster=yes' '-oControlPath=[...]' '[...]' 'git-lfs-transfer [...] upload'
12:31:08.217219 trace git-lfs: pure SSH connection successful
12:31:08.217443 trace git-lfs: Upload refs [] to remote origin
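
The difference between the two traces can also be checked mechanically. Below is a minimal sketch in Go, assuming the trace lines look like the ones quoted above; the `Transport` type and `classifyTrace` function are hypothetical names made up for this illustration and are not part of git-lfs or Gitea.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Transport is a hypothetical label for what a GIT_TRACE=1 log suggests.
type Transport string

const (
	TransportPureSSH Transport = "pure-ssh"
	TransportHTTP    Transport = "http-fallback"
	TransportUnknown Transport = "unknown"
)

// classifyTrace scans trace output line by line.
// Assumption (taken from the traces above): a run_command line that calls
// git-lfs-authenticate means git-lfs fell back to the HTTP protocol, even if
// an earlier line reported "pure SSH connection successful".
func classifyTrace(trace string) Transport {
	sawSuccess := false
	sc := bufio.NewScanner(strings.NewReader(trace))
	for sc.Scan() {
		line := sc.Text()
		if strings.Contains(line, "run_command") && strings.Contains(line, "git-lfs-authenticate") {
			return TransportHTTP
		}
		if strings.Contains(line, "pure SSH connection successful") {
			sawSuccess = true
		}
	}
	if sawSuccess {
		return TransportPureSSH
	}
	return TransportUnknown
}

func main() {
	before := "trace git-lfs: pure SSH connection successful\n" +
		"trace git-lfs: pure SSH protocol connection failed: unexpected EOF\n" +
		"trace git-lfs: run_command: ssh [...] git-lfs-authenticate [...] upload\n"
	after := "trace git-lfs: pure SSH connection successful\n" +
		"trace git-lfs: Upload refs [] to remote origin\n"

	fmt.Println("before:", classifyTrace(before))
	fmt.Println("after: ", classifyTrace(after))
}
```

Note that "pure SSH connection successful" alone is not enough: in the first trace it is followed by a version-negotiation failure, so the sketch treats the later git-lfs-authenticate call as the deciding signal.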
techknowlogick commented 3 months ago

Hi @ConcurrentCrab this is indeed available, and we'd love for you to work on it. Thanks for submitting a WIP PR. In terms of scope, it's "users can interact with LFS using SSH (using either the built-in ssh server, or the integration with opensshd)". Adding the external dep on charm.sh is a-ok. Documentation would be appreciated but not necessary for completing the bounty (we are in a transition phase with documentation, so I don't want to add extra work to your plate by sorting out where to contribute it), so if you are inclined, even a comment in your PR would be helpful. If I've missed any of your questions, or you have more, please don't hesitate to ping :)

ConcurrentCrab commented 3 months ago

Hi @techknowlogick,

Adding the external dep on charm.sh is a-ok.

Thanks for clarifying that. To be crystal clear, this is not a build-time dependency, as in pulling in their libraries. This is a run-time dependency on the binary being present in the environment (any compliant implementation would do), just like we're already depending on the git package being installed for the git-upload-pack/git-receive-pack binaries. Ofc, it's still an "optional" dependency, and if it isn't present we simply fall back onto the HTTP protocol, so we're no worse off than where we started.
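
To illustrate what an "optional run-time dependency" could look like on the server side, here is a rough sketch in Go. It only shows the general idea and is not the code in the PR; `findTransferBinary` and `chooseTransport` are hypothetical helper names, and the only real API used is `exec.LookPath` from the standard library.

```go
package main

import (
	"fmt"
	"os/exec"
)

// findTransferBinary looks for an executable with the given name on PATH.
// It returns the resolved path and true when one is found.
func findTransferBinary(name string) (string, bool) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", false
	}
	return path, true
}

// chooseTransport is a hypothetical decision helper: offer pure SSH only
// when the binary is present, otherwise leave clients on the HTTP protocol.
func chooseTransport(name string) string {
	if _, ok := findTransferBinary(name); ok {
		return "pure-ssh"
	}
	return "http"
}

func main() {
	// "git-lfs-transfer" is the command name git-lfs asks for over SSH.
	fmt.Println(chooseTransport("git-lfs-transfer"))
}
```

If the binary is missing, the server would simply reject the command, and git-lfs falls back to git-lfs-authenticate as it does today.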

As I mentioned, the patch in the PR already makes pure SSH LFS sessions work, according to my experiments I outlined above. But this is very much the minimum-changes-required version of this patch, conceivably can be termed a 'hack'. I think it'd be worth making some refactors to the logic in that handler. There's already quite a bit of convoluted logic in there, so I'd feel bad leaving it there with even more surprise ifs-and-elses :)

And I'll add all the relevant useful information into the commits, as you suggested.

Regarding the security model I think we're mostly fine, except at one point it does seem to care about the "verb": https://github.com/go-gitea/gitea/blob/621e1ff9c9ec04ea8e6d68cd8e38bb5734f29bdc/routers/private/serv.go#L140 That line seems to be from this commit fixing a bug related to the upload command: https://github.com/go-gitea/gitea/commit/95013fde60748c425eb910dcab5d1fdd1c89ae18

That seems important. I'll be looking more into what exactly that might mean, as the "upload" mode of our new command should probably be on that list of exceptions too.
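
For reference, the kind of check being discussed can be sketched in Go as follows. This is an assumption-laden illustration, not Gitea's actual serv.go logic: the `AccessMode` type, its constants, and both functions are defined locally for the example, and the only thing taken from the thread is that the command carries an "upload" or "download" operation.

```go
package main

import "fmt"

// AccessMode is a local stand-in for a permission level (hypothetical).
type AccessMode int

const (
	AccessModeNone AccessMode = iota
	AccessModeRead
	AccessModeWrite
)

// requiredMode maps the operation argument of git-lfs-transfer to the
// minimum access a user would need. Unknown operations are rejected.
func requiredMode(operation string) (AccessMode, error) {
	switch operation {
	case "download":
		return AccessModeRead, nil
	case "upload":
		return AccessModeWrite, nil
	default:
		return AccessModeNone, fmt.Errorf("unknown LFS operation %q", operation)
	}
}

// allowed reports whether a user with userMode may perform the operation.
func allowed(userMode AccessMode, operation string) bool {
	need, err := requiredMode(operation)
	if err != nil {
		return false
	}
	return userMode >= need
}

func main() {
	fmt.Println(allowed(AccessModeRead, "download")) // read access is enough to download
	fmt.Println(allowed(AccessModeRead, "upload"))   // upload needs write access
}
```

The point of the sketch is only that "upload" has to be treated as a write, regardless of which command name carries it.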

ConcurrentCrab commented 3 months ago

Whoops, called it too early ;). The network transfer indeed is working, but the binary isn't placing the objects where Gitea expects them to be. Huh. Going to look into that. Meanwhile, experimenting with the refactor on another branch: https://github.com/ConcurrentCrab/gitea/commits/lfs-ssh-2/

ConcurrentCrab commented 3 months ago

Ah. It seems both Charm and Scutiger store the LFS assets in <repo_dir>/lfs, while Gitea expects them in a separate data directory... which seems to be shared across all the repos? That seems like a strange choice, since it'll lead to both higher chances of collision and slower performance as the directory fills up. Anyway, it would seem the paths need to be changed in git-lfs-transfer then.

lafriks commented 3 months ago

Gitea stores all LFS blobs in the same directory so that identical files in multiple repositories can be reused (especially important for forking). Also keep in mind that Gitea records in the database which repositories reference each LFS blob. LFS storage can be not only the filesystem but also S3.
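
A toy model of that layout, written in Go, may make the trade-off clearer. It is only a sketch under stated assumptions: the `Store` type and its methods are invented for this example, the blobs live in memory rather than on disk or S3, and the per-repository references live in a map rather than a database. The one grounded detail is that LFS object IDs are SHA-256 hashes of the content.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Store keeps one copy of each blob, keyed by its SHA-256 OID, and tracks
// which repositories reference it (hypothetical in-memory model).
type Store struct {
	blobs map[string][]byte
	refs  map[string]map[string]bool // oid -> repo -> referenced
}

func NewStore() *Store {
	return &Store{
		blobs: make(map[string][]byte),
		refs:  make(map[string]map[string]bool),
	}
}

// Put stores content once and records that repo references it.
func (s *Store) Put(repo string, content []byte) string {
	sum := sha256.Sum256(content)
	oid := hex.EncodeToString(sum[:])
	if _, ok := s.blobs[oid]; !ok {
		s.blobs[oid] = append([]byte(nil), content...)
	}
	if s.refs[oid] == nil {
		s.refs[oid] = make(map[string]bool)
	}
	s.refs[oid][repo] = true
	return oid
}

// Get returns the blob only if repo holds a reference to it.
func (s *Store) Get(repo, oid string) ([]byte, bool) {
	if !s.refs[oid][repo] {
		return nil, false
	}
	data, ok := s.blobs[oid]
	return data, ok
}

// BlobCount reports how many distinct blobs are stored.
func (s *Store) BlobCount() int { return len(s.blobs) }

func main() {
	s := NewStore()
	a := s.Put("alice/repo", []byte("big asset"))
	b := s.Put("bob/fork", []byte("big asset"))
	fmt.Println(a == b, s.BlobCount())
}
```

In this model a fork adds a reference but no second copy of the blob, and a repository without a reference cannot read the blob even though it sits in the shared store.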

ConcurrentCrab commented 3 months ago

Hi @lafriks,

Thanks for sharing this information.

Gitea stores all LFS blobs in the same directory so that identical files in multiple repositories can be reused (especially important for forking).

Ah, that sounds reasonable.

Also keep in mind that Gitea records in the database which repositories reference each LFS blob. LFS storage can be not only the filesystem but also S3.

I see. That does significantly complicate things. This certainly doesn't sound like something that could be done by an out-of-process client anymore, not the db stuff and definitely not the S3 stuff. The git protocols take special care to avoid stuff like that (mostly by having a single source of truth), so that multiple protocols like SSH, "dumb" HTTP, "smart" HTTP, etc. can work together without a hitch, and so that repos on the server are just normal bare repos, with representations symmetric on both server and client. This sort of makes that moot.

So how would you suggest approaching this? I would think the simplest way would be throwing away the out-of-process client, and implementing an in-process reader/writer that calls the same functions the HTTP LFS API does. But not being aware of the architecture of the program, I'm open to suggestions.

lafriks commented 3 months ago

I think going in-process, with an internal API that reuses the HTTP logic, would be the way to go imho

ConcurrentCrab commented 3 months ago

Alright, mostly done with the rote work. What I ended up doing was vendoring the transport package from the charm.sh library (earlier I tried just importing it, but certain differences between the file-based API and Gitea's made that infeasible), modifying it a bit, and adding a "backend" that proxies to the Gitea internal store.

Just need to figure out why transfers are throwing 500s :)

ConcurrentCrab commented 2 months ago

Aaaand that should be it. Pushes and clones all seem to work properly, and metadata objects are registered like they should be. I think support for Pure SSH LFS is complete :)

Marking draft PR as final now.

ConcurrentCrab commented 2 months ago

Continued in https://github.com/go-gitea/gitea/pull/31516

I guess I had to resubmit with a new PR for the algora bot to see.

ConcurrentCrab commented 2 months ago

Finally got everything working, and a git-lfs bug pops up :\ https://github.com/git-lfs/git-lfs/pull/5816

I think the PR is largely complete, but I would... suggest holding off on merging it until a fixed version of git-lfs is released and is in common use, due to the nature of the bug (in the presence of pure LFS SSH support, it degrades a normal push into a 2-minute wait).

ConcurrentCrab commented 2 months ago

You can test it but obvs throw a fixed build of git-lfs on your PATH, unless you want to wait 2 mins after every push :P

either that or set lfs.ssh.automultiplex to false in git config.

ConcurrentCrab commented 1 month ago

Hi @techknowlogick,

It'd be nice to have some kind of a response to the PR, even if that response is "hi, no, haven't gotten to it yet but will soon".

It doesn't really feel nice to be ghosted for a month after having done all this work (up to and including fixing bugs in the LFS client) :)

techknowlogick commented 1 month ago

Hi, @ConcurrentCrab. I'm so sorry for missing the notifications and leaving you hanging. I just tested a build of your PR, and ran into the long hang from git-lfs that you described. I don't think that's a blocker to get this merged, but perhaps LFS over SSH support could be disabled by default in the config, and then in documentation I could add something around disabling automultiplex prior to enabling it?