cryptix / git-remote-ipfs

git transport helper for ipfs
MIT License

get basic cloning working #1

Closed cryptix closed 9 years ago

cryptix commented 9 years ago

tl;dr: the initial git stdio interface is there but i need some feedback on git internals

09:55 < cryptix> jbenet: i fell into the trap of starting to implement a protocol helper for git yesterday. but could use some insight from somebody more familiar with the interface.. the one man document i found was only helpful so far
09:55 < cryptix> https://github.com/cryptix/git-remote-ipfs/blob/master/main.go
09:55 < cryptix> ie git clone ipfs://$somepath
09:56 < cryptix> maybe chris or whyrusleeping can shed some light on my questions
09:56 <@jbenet> cryptix: very cool. im not super familiar but maybe i can take a look?
09:57 < cryptix> sure. so far i only dealt with the 'fetch' capability (thats all gittorrent implements), i can fetch a bare repo under ipfs://$path already and answer to the 'list' command, which wants a list of hashes and refs and then requests 'fetch $sha1 HEAD' (basically all refs afterwards)
09:58 < cryptix> but i'm not sure on the format it expects then
09:58 < cryptix> like, how to answer that 'fetch $hash $ref' command is beyond me from the docs
10:00 <@jbenet> cryptix: links pls?
10:00 < cryptix> i basically wanted a 'post-hook' that publishes commits to ipfs since dtnconf. whyrusleeping git-ipfs-rehost already does a lot of that
10:01 < cryptix> jbenet: sorry, its in the main.go too
10:01 < cryptix> https://git-scm.com/docs/gitremote-helpers
10:01 <@jbenet> also maybe we want ipfs://ipfs/<hash> and ipfs://ipns/<hash> -- otherwise we'd need two protocol handlers
10:02 <@jbenet> though not sure, i'm as annoyed by "ipfs://ipfs" as anyone :)
10:02 < cryptix> yea.. once the basics are done, having two helpers for ipfs and ipns is pretty trivial
10:03 < cryptix> its just that it wants $proto://$path or $proto::$path which really annoys me but meh..
10:05 < cryptix> btw git clone http://$gateway/$path from git-ipfs-rehost already works - i just wanted nicer integration. someday you could have a 'git push' capable ipns remote for instance
10:06 <@jbenet> yeah exactly, i think this is definitely wanted
10:06 <@jbenet> im looking at the git source to find remote impls
10:06 <@jbenet> so far i've found: https://github.com/git/git/tree/77bd3ea9f54f1584147b594abc04c26ca516d987/contrib/persistent-https
10:07 < cryptix> there is git-remote-testgit but it only explains the stdin/out interface how git and the helper are talking
10:07 <@jbenet> the git source assumes so much is installed-- there are python scripts
10:07 < cryptix> https://github.com/git/git/blob/master/git-remote-testgit.sh
10:08 <@jbenet> https://github.com/felipec/git-remote-hg 
10:08 < cryptix> then there is https://github.com/cjb/GitTorrent/blob/master/git-remote-gittorrent ofc
10:09 <@jbenet> yep -- may want to ask cjb when he's online. (nyc, so should be around in a few hours)
10:09 < cryptix> afaict it directly fetches the $sha1 hashes from the 'fetch $sha1 $refName' command requests
10:10 < cryptix> maybe we could have another git-ipfs-rehost that unpacks the commits in a way that we can directly request those sha1 hashes from ipfs but yea.. thats where my git understanding gets muddy :)
10:10 <@jbenet> finding "fetch" in https://github.com/felipec/git-remote-hg/blob/master/git-remote-hg isnt very promising
10:11 <@jbenet> oh yeah that's tricky
10:11 <@jbenet> because we re-wrap all our objects with our merkledag format
10:11 <@jbenet> so the graph changes a bit
10:11 <@jbenet> what we could do is fetch the objects and look into them
10:12 <@jbenet> or have an "import git graph" thing that creates objects where link _names_ are the git sha1 hashes
10:12 <@jbenet> so we'd have mappings like $sha1 : $ipfsmultihash
10:12 < cryptix> yup that sounds promising
10:13 <@jbenet> the git-ipfs-rehost is a great hack because it leverages the fact that git repos + the http transport use unix files :) -- but this protocol is lower level
10:14 <@jbenet> ok so-- can you help me trace a full request here? like what does git ask from our handler?
10:14 < cryptix> yup - i guess you could also get away with just dumping the bare repo in the .git dir but i think its better to follow its rules :)
10:15 <@jbenet> (may be useful to write it out as a set of pseudocode function calls in a gist)
10:15 < cryptix> jbenet: sure - lets be ipfs agnostic for a second
10:15 <@jbenet> yep
10:17 < cryptix> jbenet: https://etherpad.mozilla.org/nFW7hausSF
10:39 <@jbenet> hey chriscool if you're around may want to take a look at this etherpad \o

cc @jbenet @whyrusleeping @chriscool @cjb

jbenet commented 9 years ago

I think we can probably read the idx files in go. surely someone's written a format reader.

jbenet commented 9 years ago

answering "list"

List the refs available at .git/HEAD and .git/refs/*

(anything else?)

answering "fetch $sha1 $ref"

- to fetch $sha1, we try either:
  - look for it in ".git/objects/substr($sha1, 0, 2)/substr($sha1, 2)"
     - if found, download it and put it in place. (there may be a command for this)
     - done \o/
  - look for it in packfiles by fetching ".git/objects/pack/*.idx" and looking at each idx with
    cat <idx> | git show-index  (alternatively can learn to read the format in go)
     - if found in an <idx>, download the relevant .pack file, and feed it into 
       `git index-pack --stdin --fix-thin` which will put it into place.
     - done \o/
  - else not found <o>

Also:

<cjb> instead of calling index-pack, you could call git unpack-objects, both work

cryptix commented 9 years ago

4f3e24cb11bdcd90dc20b8f362d08ffdd2ec860b implements both of @jbenet's proposed techniques to answer fetch commands, and you get a working git repo in the end.

cryptix commented 9 years ago

Sorry, I got carried away. I only implemented the 2nd (unpack from packed objects).

I need a test repo for the first one.

chriscool commented 9 years ago

@cryptix you can get a repo with unpacked objects from a repo with packed objects by calling `git unpack-objects <PACKFILE`, where PACKFILE is a pack file that has been moved away from its original location (see `git help unpack-objects`).

cryptix commented 9 years ago

I'm having some problems with the first method. Basically, I don't want to implement parsing the index to see which other objects are needed to reconstruct a commit.

Maybe I'm just missing something but I'll leave it as an exercise for another fellow hacker for now and look into pushing.

Some more 'notes':

15:07 <@jbenet> cryptix: have you worked on that git protocol hack?
15:07 <@jbenet> what was missing on it?
15:08 < cryptix> jbenet: i got derailed on the '2nd method to fetch objects'
15:08 <@jbenet> (as a community, we should endeavor to finish the many hacks we have in-flight instead of context switching so much)
15:08 <@jbenet> what was the 2nd method to fetch objects?
15:08 < cryptix> i need to get back into it but you basically need to handle the dependency graph of unpacked objects.. which is quite a mouthful...
15:09 < cryptix> jbenet: packed vs unpacked objects
15:09 < cryptix> i think i'll scratch that stuff until somebody needs it
15:09 < cryptix> and focus on pushing
15:09 < cryptix> otherwise i need a libgit to handle parsing the objects trees
15:10 < cryptix> cloning from packed is easy and works
15:10 < cryptix> (re meta: totally.)
15:11 <@whyrusleeping> jbenet: i updated rabin
15:11 <@whyrusleeping> btw
15:14 <@jbenet> cryptix https://github.com/libgit2/git2go ?
15:14 <@jbenet> whyrusleeping cool i'll CR in a bit
15:14 <@jbenet> cryptix: i need it :D
15:14 < cryptix> jbenet: yea.. i hope not. those are cgo libgit2 bindings
15:15 < cryptix> jbenet: i think i can pull off push without full fledged git bindings
15:15 <@jbenet> remind me why cgo matters?
15:16 < cryptix> well.. i fear it does all the os.Open stuff in c land
15:16 < cryptix> thus making it hard to port to vfs/ipfs
15:16 < cryptix> if needed we can go through >clone to /tmp and work there> back but id like to make it sexy :))
15:17 < cryptix> and again. i dont think this 2nd method for clone is actually needed if we do the ipfs-git-rehost dance with packed objects for now
15:17 <@jbenet> yeah though non-existent is worse
15:17 < cryptix> if it makes more sense (dedup wise) to store them unpacked id like to discuss that with somebody more familiar with the process
15:18 <@whyrusleeping> implementing the git object format stuff is pretty simple
15:18 < cryptix> or hand it over all together
15:18 <@whyrusleeping> linus wrote some pretty basic formats
15:18 < cryptix> yea the pack format sure
15:19 < cryptix> but then it goes into 'client asks for hash x' you can just give it the object for that hash but to reconstruct the commit you need all the objects that it points to, too
15:19 < cryptix> thus you need an index of how they all fit together
15:19 < cryptix> basically you end up doing all the work that a packed repo already did for you
15:19 < cryptix> and i dont want to duplicate that right now
15:20 < cryptix> and again: not necessary for push at all

cryptix commented 9 years ago

My last assertion was wrong. For the 'unpacked' repo, the root commit object unfolds like a tree (DAGs, fuck yea!). Just need to implement basic git object parsing.

chriscool commented 9 years ago

@cryptix did you have a look at: https://github.com/ChimeraCoder/gitgo/

cryptix commented 9 years ago

Thanks @chriscool :) I implemented the loose object decoding myself in exp/git. ChimeraCoder's version assumes a local repo; I might take a look at it once I need to parse packs myself.

(closed as of 35d7142373ce1ff6cf9e87ef1bf7356d33c761aa)