Server-side of fetch/pull

Byron commented 2 years ago

What would be needed to allow a server to send a pack?

Tasks

Server fetch/pull (server to client)

git-odb

The below is a very early draft - it would be better to study existing implementations first to get a better overview on what (not) to do. This one starts with the fun part to allow writing tests early and experiment with different diff algorithms and potentially their performance.

[x] generate a pack from objects received by an iterator producing (see issue)
- [x] base objects only
- [x] re-use existing delta objects
- [x] A mechanism to declare some bases to be 'out of pack' for thin pack support
[x] Iterator to feed pack generation efficiently
[x] pack creation
git-transport

Certainly needs more research, but roughly…

[ ] Server side accept()
- [ ] http(s)
- [ ] ssh
- [ ] ~~daemon~~ probably only used in testing, and we might implement it if it's useful for us as well
git-protocol
- [ ] Server side chatter to negotiate a pack for
- [ ] protocol V2
- [ ] protocol V1 (probably not worth it, let's see)
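To make the V2 "chatter" a little more concrete: a V2 exchange opens with the server advertising its capabilities as pkt-lines. Here is a minimal sketch of that first message using only the standard library. The function names `write_pkt_line` and `advertise_capabilities` are made up for illustration and are not part of git-protocol or git-transport.

```rust
use std::io::{self, Write};

/// Write one data pkt-line: a 4-digit hex length (which counts the 4 length
/// bytes themselves) followed by the payload.
/// (Hypothetical helper for this sketch, not a gitoxide API.)
fn write_pkt_line(out: &mut impl Write, payload: &[u8]) -> io::Result<()> {
    // A pkt-line may carry at most 65516 payload bytes (65520 including the length).
    if payload.len() > 65516 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "payload too long for one pkt-line",
        ));
    }
    write!(out, "{:04x}", payload.len() + 4)?;
    out.write_all(payload)
}

/// Write a protocol V2 capability advertisement: the version line, one line
/// per capability, then a flush-pkt ("0000").
fn advertise_capabilities(out: &mut impl Write, capabilities: &[&str]) -> io::Result<()> {
    write_pkt_line(out, b"version 2\n")?;
    for cap in capabilities {
        write_pkt_line(out, format!("{cap}\n").as_bytes())?;
    }
    out.write_all(b"0000")
}
```

Calling `advertise_capabilities(&mut buf, &["ls-refs", "fetch"])` on a `Vec<u8>` fills it with the version line, the two capability lines and the terminating flush-pkt. Anything that implements `Write` works as the destination, so the same function could sit behind HTTP, SSH or a local pipe.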
gix-serve

Probably more like a toy at first merely for testing operation against various git clients.

[ ] A server able to answer via
- [ ] http(s)
- [ ] file protocol (or remote invocation via SSH)

Notes

Could something like gittorrent be built using the plumbing of the server? Is it even desirable? Can there be some differentiation to allow custom transport layers easily?

masklinn commented 2 years ago

Server side accept()

http(s) ssh

Just to be clear, this doesn't mean reimplementing things from accept() upwards, but supporting these as inputs using existing libraries, similar to what git-transport currently does on the client side?

Byron commented 2 years ago

If I understand the question correctly, the answer is yes. The client of git-transport could be handled with what's provided by accept(), while the actual interaction patterns would be abstracted in git-protocol.

ghost commented 1 year ago

This would be very desirable for quickly implementing your own self-hosted git server (existing ones in other languages require too much memory and are therefore a poor fit for cheap VMs).

vlad-ivanov-name commented 1 year ago

I think it might be enough to accept AsyncWrite and AsyncRead for transport without worrying too much where those come from. Or, at the very least, the whole HTTP and SSH plumbing, providing alternatives to tools like git-http-backend and git-upload/receive-pack, should be separate.

willstott101 commented 7 months ago

I've been experimenting with writing a server (in a private repo so far). My interest here is in using parts of gix for the protocol, but leaving the storage up to pluggable backends. I'm very curious about git-on-db, and git-on-object-storage, and git-on-kv-store, and mixtures of the three. Step 1 of this is to have a clean working HTTP git server written with gitoxide, using its filesystem access as the only storage backend.

I think I have ls-refs working and I'm starting to investigate fetch. The protocol is very command-based, and in HTTP AsyncRead & AsyncWrite can't really exist at the same time. Regardless, we'd want to leave it up to server implementations to authenticate during the connection, find the relevant repo, authorize the parsed command, then hand back to gitoxide to respond.

So for me a sketch might look like this:

1. Server parses HTTP headers & path to verify protocol v2, authenticate
2. async fn read_command(source: impl AsyncRead) -> Result<Command>
3. Server authorizes the command and writes response headers
4. async fn execute_command(cmd: Command, repo: ..., dest: impl AsyncWrite) -> Result<()>
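For illustration, the read_command step could look roughly like the sketch below. It is deliberately synchronous (std::io::Read instead of AsyncRead) so that it needs no async runtime, and the `Packet` and `Command` types as well as the helper names are assumptions made up for this example, not existing gix APIs. It reads a protocol V2 command request: a command=<name> line, capability lines, an optional delim-pkt followed by arguments, and a closing flush-pkt.

```rust
use std::io::{self, Read};

/// One frame of the pkt-line format.
#[derive(Debug, PartialEq)]
enum Packet {
    Flush,         // "0000": end of a message
    Delim,         // "0001": separates sections of a message
    Data(Vec<u8>), // length-prefixed payload
}

/// A parsed protocol V2 command request (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct Command {
    name: String,
    capabilities: Vec<String>,
    args: Vec<String>,
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

/// Read a single pkt-line: 4 hex digits of length, then `length - 4` payload bytes.
fn read_packet(source: &mut impl Read) -> io::Result<Packet> {
    let mut len_hex = [0u8; 4];
    source.read_exact(&mut len_hex)?;
    if !len_hex.iter().all(u8::is_ascii_hexdigit) {
        return Err(invalid("length is not 4 hex digits"));
    }
    let len_str = std::str::from_utf8(&len_hex).map_err(|_| invalid("length is not ASCII"))?;
    let len = usize::from_str_radix(len_str, 16).map_err(|_| invalid("length is not hex"))?;
    match len {
        0 => Ok(Packet::Flush),
        1 => Ok(Packet::Delim),
        2 | 3 => Err(invalid("unexpected special packet in a request")),
        _ if len > 65520 => Err(invalid("pkt-line too long")),
        _ => {
            let mut payload = vec![0u8; len - 4];
            source.read_exact(&mut payload)?;
            Ok(Packet::Data(payload))
        }
    }
}

/// Turn a payload into text, dropping the optional trailing LF.
fn text(payload: Vec<u8>) -> io::Result<String> {
    let mut s = String::from_utf8(payload).map_err(|_| invalid("payload is not UTF-8"))?;
    if s.ends_with('\n') {
        s.pop();
    }
    Ok(s)
}

fn read_command(source: &mut impl Read) -> io::Result<Command> {
    let name = match read_packet(source)? {
        Packet::Data(p) => text(p)?
            .strip_prefix("command=")
            .ok_or_else(|| invalid("expected command=<name>"))?
            .to_string(),
        _ => return Err(invalid("expected a data packet")),
    };
    let mut capabilities = Vec::new();
    let mut args = Vec::new();
    let mut in_args = false;
    loop {
        match read_packet(source)? {
            Packet::Flush => break,
            Packet::Delim if !in_args => in_args = true,
            Packet::Delim => return Err(invalid("unexpected second delim-pkt")),
            Packet::Data(p) if in_args => args.push(text(p)?),
            Packet::Data(p) => capabilities.push(text(p)?),
        }
    }
    Ok(Command { name, capabilities, args })
}
```

Feeding it the bytes of an ls-refs request through a `&[u8]` reader yields a `Command` with the name, the capability lines and the arguments separated, which is what the authorization step would then inspect.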

It's possible that a sufficiently advanced AsyncWrite could have a buffer and be appended to the HTTP response after writing the headers (in case the command parsing wanted to reply with errors in packet-line format?). I have also spent no time so far learning about the SSH transport. But those are my thoughts so far.

I also have some questions about packfile construction. I haven't spent a great deal of time investigating yet but I currently haven't found much in the existing gitoxide codebase. Especially relating to resolving packfiles between two peers, but there must be some logic in gix for this somewhere. I'll keep looking, and if anyone is particularly interested in collaborating let me know and I can un-private the repo, I'm just quite enjoying the messy private sandpit atm.

Byron commented 7 months ago

That's great news! Please be sure to let us know here once the repo goes public!

Regarding the sketch, the server would probably also reject V1 requests. But then, read_command() would read the command itself and arguments to it, but I wonder if that's not a liability as it might read more than it has to given that the server might reject the command itself. That probably also depends on what information the server wants to use to reject the command, but I can imagine that a step-wise process would be better. Read the command-name, then read its arguments, but maybe that is implied.
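One way to picture the step-wise variant is a two-stage API where the first call consumes only the command name, and the rest of the request is read only once the server has decided to accept it. The sketch below assumes the pkt-lines were already framed into a hypothetical `Packet` enum (with trailing newlines stripped); `PendingCommand`, `read_command_name` and `read_arguments` are likewise invented names for illustration.

```rust
/// Already-framed pkt-lines; the framing itself is out of scope here.
#[derive(Debug, Clone, PartialEq)]
enum Packet {
    Flush,
    Delim,
    Data(String),
}

/// Stage 1: only the command name has been consumed from the stream.
struct PendingCommand<I: Iterator<Item = Packet>> {
    name: String,
    rest: I,
}

/// Stage 2: the command was accepted, so its body was read as well.
#[derive(Debug, PartialEq)]
struct Command {
    name: String,
    capabilities: Vec<String>,
    args: Vec<String>,
}

/// Consume exactly one packet and extract the command name from it.
fn read_command_name<I: Iterator<Item = Packet>>(
    mut packets: I,
) -> Result<PendingCommand<I>, String> {
    match packets.next() {
        Some(Packet::Data(line)) => match line.strip_prefix("command=") {
            Some(name) => Ok(PendingCommand { name: name.to_string(), rest: packets }),
            None => Err(format!("expected command=<name>, got {line:?}")),
        },
        other => Err(format!("expected a data packet, got {other:?}")),
    }
}

impl<I: Iterator<Item = Packet>> PendingCommand<I> {
    /// The name the server can use to decide whether to continue.
    fn name(&self) -> &str {
        &self.name
    }

    /// Read capabilities and arguments up to the flush-pkt.
    /// Dropping `self` instead of calling this leaves them unread.
    fn read_arguments(mut self) -> Result<Command, String> {
        let mut capabilities = Vec::new();
        let mut args = Vec::new();
        let mut in_args = false;
        loop {
            match self.rest.next() {
                Some(Packet::Flush) => break,
                Some(Packet::Delim) if !in_args => in_args = true,
                Some(Packet::Delim) => return Err("unexpected second delim-pkt".into()),
                Some(Packet::Data(line)) if in_args => args.push(line),
                Some(Packet::Data(line)) => capabilities.push(line),
                None => return Err("stream ended before flush-pkt".into()),
            }
        }
        Ok(Command { name: self.name, capabilities, args })
    }
}
```

A server would call `read_command_name`, check `name()` against whatever policy it has, and then either call `read_arguments()` or drop the pending command and reply with an error. Because `read_arguments` takes `self` by value, the arguments cannot be read twice, and a rejected command never has its body pulled off the stream.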

In general, it's probably OK to just cobble it together and then refactor.

pack creation

Regarding packs, you can try gix free pack create and see from there how the API works. In general, packs can start streaming quickly, but they won't be the most efficient as they don't delta-compress on the fly. But that might even be a beneficial trade-off at first.
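As background on why a pack without on-the-fly delta compression is cheap to start streaming: the pack begins with a fixed 12-byte header, and each non-delta entry only needs a small type-and-size header in front of its compressed data. The sketch below encodes just those two headers with the standard library. It is not how gix structures its pack writer, and `pack_header` and `entry_header` are names invented for this example. Compressing the object data and writing the trailing checksum are left out because they need crates outside std.

```rust
/// Object type codes used in pack entry headers.
const OBJ_COMMIT: u8 = 1;
const OBJ_BLOB: u8 = 3;

/// The 12-byte pack header: "PACK" signature, version 2, number of entries.
/// Note that the entry count has to be known before the first byte is sent.
fn pack_header(num_objects: u32) -> [u8; 12] {
    let mut header = [0u8; 12];
    header[..4].copy_from_slice(b"PACK");
    header[4..8].copy_from_slice(&2u32.to_be_bytes());
    header[8..].copy_from_slice(&num_objects.to_be_bytes());
    header
}

/// Header of a non-delta entry: the first byte holds 3 bits of type and the
/// low 4 bits of the uncompressed size, each following byte holds 7 more size
/// bits. The high bit of a byte signals that another byte follows.
fn entry_header(kind: u8, size: u64) -> Vec<u8> {
    let mut out = Vec::new();
    let mut byte = ((kind & 0x07) << 4) | (size & 0x0f) as u8;
    let mut rest = size >> 4;
    while rest != 0 {
        out.push(byte | 0x80);
        byte = (rest & 0x7f) as u8;
        rest >>= 7;
    }
    out.push(byte);
    out
}
```

For example, a 10-byte blob fits its header into a single byte, while a 100-byte blob needs two. After each entry header, the zlib-compressed object data would follow.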

Transports

You can check the client-side (use gix --trace clone ssh://… to see how ssh is typically invoked). From there you will see that it definitely requires its own binary, but that should then be the easiest implementation as it's the same as gix --trace clone file://local/path. However, getting the server-side SSH server going is probably its own set of problems to solve.

Byron / gitoxide