GitoxideLabs / gitoxide

An idiomatic, lean, fast & safe pure Rust implementation of Git
Apache License 2.0

Server-side of fetch/pull #307

Open Byron opened 2 years ago

Byron commented 2 years ago

What would be needed to allow a server to send a pack?

Tasks

Server fetch/pull (server to client)

The below is a very early draft. It would be better to study existing implementations first to get a better overview of what (not) to do. This one starts with the fun part to allow writing tests early and to experiment with different diff algorithms and potentially their performance.

Certainly needs more research, but roughly…

Probably more like a toy at first merely for testing operation against various git clients.

Notes

masklinn commented 2 years ago

Server side accept()

http(s) ssh

Just to be clear, this doesn't mean reimplementing things from accept() upwards, but supporting these as inputs using existing libraries, similar to what git-transport currently does on the client side?

Byron commented 2 years ago

If I understand the question correctly, the answer is yes. The client of git-transport could be handled with what's provided by accept(), while the actual interaction patterns would be abstracted in git-protocol.

ghost commented 2 years ago

This would be very desirable for quickly implementing your own self-hosted git server (existing ones in other languages require too much memory and are therefore a poor fit for cheap VMs).

vlad-ivanov-name commented 1 year ago

I think it might be enough to accept AsyncWrite and AsyncRead for transport without worrying too much where those come from. Or, at the very least, the whole HTTP and SSH plumbing, providing alternatives to tools like git-http-backend and git-upload/receive-pack, should be separate.
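To illustrate the idea of a handler that does not care where its stream comes from, here is a minimal sketch. It uses the blocking `std::io::Write` trait as a stand-in for `AsyncWrite` so that it needs no external crates, and the names `write_pkt_line` and `advertise_v2` are hypothetical, not part of gitoxide. It writes a protocol v2 capability advertisement in pkt-line framing to any writer, which could be an HTTP response body, a pipe, or a buffer.

```rust
use std::io::{self, Write};

/// Write one pkt-line: four hex digits giving the total length
/// (header included), followed by the payload.
/// Hypothetical helper, not a gitoxide API.
fn write_pkt_line<W: Write>(out: &mut W, data: &[u8]) -> io::Result<()> {
    // A pkt-line may carry at most 65516 bytes of data.
    if data.len() > 65516 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "payload too large for a single pkt-line",
        ));
    }
    write!(out, "{:04x}", data.len() + 4)?;
    out.write_all(data)
}

/// Send a protocol v2 advertisement listing the given commands,
/// terminated by a flush packet. The writer can be anything.
fn advertise_v2<W: Write>(out: &mut W, commands: &[&str]) -> io::Result<()> {
    write_pkt_line(out, b"version 2\n")?;
    for command in commands {
        write_pkt_line(out, format!("{command}\n").as_bytes())?;
    }
    out.write_all(b"0000")?; // flush-pkt
    out.flush()
}

fn main() -> io::Result<()> {
    // An in-memory buffer stands in for the real transport here.
    let mut response = Vec::new();
    advertise_v2(&mut response, &["ls-refs", "fetch"])?;
    print!("{}", String::from_utf8_lossy(&response));
    Ok(())
}
```

Because the function is generic over the writer, the HTTP or SSH plumbing can live in a separate crate and simply pass in whatever stream it has.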

willstott101 commented 11 months ago

I've been experimenting with writing a server (in a private repo so far). My interest here is in using parts of gix for the protocol, but leaving the storage up to pluggable backends. I'm very curious about git-on-db, and git-on-object-storage, and git-on-kv-store, and mixtures of the three. Step 1 of this is to have a clean working HTTP git server written with gitoxide, using its filesystem access as the only storage backend.

I think I have ls-refs working and I'm starting to investigate fetch. The protocol is very command-based, and in HTTP AsyncRead & AsyncWrite can't really exist at the same time. Regardless we'd want to leave it up to server implementations to authenticate during the connection, find the relevant repo, authorize the parsed command, then hand back to gitoxide to respond.

So for me a sketch might look like this:

It's possible that a sufficiently advanced AsyncWrite could have a buffer and be appended to the HTTP response after writing the headers (in case the command parsing wanted to reply with errors in packet-line format?). I have also spent no time so far learning about the SSH transport. But those are my thoughts so far.

I also have some questions about packfile construction. I haven't spent a great deal of time investigating yet but I currently haven't found much in the existing gitoxide codebase. Especially relating to resolving packfiles between two peers, but there must be some logic in gix for this somewhere. I'll keep looking, and if anyone is particularly interested in collaborating let me know and I can un-private the repo, I'm just quite enjoying the messy private sandpit atm.

Byron commented 11 months ago

That's great news! Please be sure to let us know here once the repo goes public!

Regarding the sketch, the server would probably also reject V1 requests. But then, read_command() would read the command itself and arguments to it, but I wonder if that's not a liability as it might read more than it has to given that the server might reject the command itself. That probably also depends on what information the server wants to use to reject the command, but I can imagine that a step-wise process would be better. Read the command-name, then read its arguments, but maybe that is implied.
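A step-wise reader along those lines might look roughly like the following sketch. It is not gitoxide code: `read_pkt_line`, `read_command_name`, `read_arguments` and the allow-list in `main` are hypothetical names, the blocking `std::io::Read` trait stands in for an async reader, and capability lines are skipped for brevity. The point is only that the command name is consumed first, so the server can reject the request before touching the arguments.

```rust
use std::io::{self, Read};

/// One decoded pkt-line (hypothetical type for this sketch).
#[derive(Debug)]
enum PktLine {
    Flush,         // "0000"
    Delimiter,     // "0001"
    Data(Vec<u8>), // length-prefixed payload
}

fn invalid<E: Into<Box<dyn std::error::Error + Send + Sync>>>(err: E) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, err)
}

/// Encode a payload as a pkt-line; only used to build demo input.
fn pkt(data: &str) -> String {
    format!("{:04x}{}", data.len() + 4, data)
}

/// Read exactly one pkt-line and nothing more.
fn read_pkt_line<R: Read>(input: &mut R) -> io::Result<PktLine> {
    let mut header = [0u8; 4];
    input.read_exact(&mut header)?;
    let hex = std::str::from_utf8(&header).map_err(invalid)?;
    match usize::from_str_radix(hex, 16).map_err(invalid)? {
        0 => Ok(PktLine::Flush),
        1 => Ok(PktLine::Delimiter),
        2 | 3 => Err(invalid("unsupported pkt-line length")),
        n => {
            let mut data = vec![0u8; n - 4];
            input.read_exact(&mut data)?;
            Ok(PktLine::Data(data))
        }
    }
}

/// Step 1: consume only the `command=<name>` line.
fn read_command_name<R: Read>(input: &mut R) -> io::Result<String> {
    match read_pkt_line(input)? {
        PktLine::Data(line) => {
            let text = String::from_utf8(line).map_err(invalid)?;
            text.trim_end_matches('\n')
                .strip_prefix("command=")
                .map(str::to_owned)
                .ok_or_else(|| invalid("expected a `command=` line"))
        }
        other => Err(invalid(format!("expected a command line, got {other:?}"))),
    }
}

/// Step 2: skip capability lines, then collect arguments up to the flush.
fn read_arguments<R: Read>(input: &mut R) -> io::Result<Vec<String>> {
    let (mut in_arguments, mut args) = (false, Vec::new());
    loop {
        match read_pkt_line(input)? {
            PktLine::Flush => return Ok(args),
            PktLine::Delimiter => in_arguments = true,
            PktLine::Data(line) if in_arguments => {
                let text = String::from_utf8(line).map_err(invalid)?;
                args.push(text.trim_end_matches('\n').to_owned());
            }
            PktLine::Data(_) => {} // capability line, ignored in this sketch
        }
    }
}

fn main() -> io::Result<()> {
    // Placeholder request; "0123" is not a real object id.
    let request = format!("{}0001{}0000", pkt("command=fetch\n"), pkt("want 0123\n"));
    let mut input = request.as_bytes();
    let command = read_command_name(&mut input)?;
    // Hypothetical server policy, applied before any argument is read.
    if command != "ls-refs" && command != "fetch" {
        return Err(invalid(format!("command `{command}` rejected")));
    }
    let args = read_arguments(&mut input)?;
    println!("{command}: {args:?}");
    Ok(())
}
```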

In general, it's probably OK to just cobble it together and then refactor.

pack creation

Regarding packs, you can try gix free pack create and see from there how the API works. In general, packs can start streaming quickly, but they won't be the most efficient as they don't delta-compress on the fly. But that might even be a beneficial trade-off at first.

Transports

You can check the client-side (use gix --trace clone ssh://… to see how ssh is typically invoked). From there you will see that it definitely requires its own binary, but that should then be the easiest implementation as it's the same as gix --trace clone file://local/path. However, getting the server-side SSH server going is probably its own set of problems to solve.