Possible git usage - Githubissues

clemos / try-haxe

A small webapp that allows to test Haxe online

https://try.haxe.org

MIT License

126 stars, 41 forks

Possible git usage #24

Open Dr-Emann opened 12 years ago

Dr-Emann commented 12 years ago

I was wondering how large the try-haxe folder is getting, with auto-branching. If space becomes a problem, I thought of a solution using git as a repository to store all of the saved examples, and checking them out to a set number of folders.

Pros

- Provides a SHA-1 hash which could be used to identify the example (the first 6 digits should be plenty to remain unique)
- Compression based on the difference to the parent example
- Reduces redundancy

Cons

- Requires the server to have git
- More complex than the current strategy
- Might be premature optimization; pointless if space is not an issue
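
As a rough illustration of the hash-as-identifier point, here is a minimal sketch. The throwaway repository, the file name Test.hx and the demo identity are assumptions made up for the example, not anything try-haxe does today.

```shell
#!/bin/sh
set -e
# Throwaway repository standing in for the store of saved examples (assumption).
repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Save one example as a commit.
printf 'class Test {}\n' > Test.hx
git add Test.hx
git commit -q -m "saved example"

# Ask git for a 6-character abbreviation of the commit hash. git lengthens
# the abbreviation on its own if 6 characters would be ambiguous.
id=$(git rev-parse --short=6 HEAD)
echo "example id: $id"
```

Because git extends the abbreviation when needed, the identifier should probably be stored as returned rather than cut to exactly 6 characters by hand.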

If you think this would be a worthwhile idea, I can start work on forking and working on an implementation, but I'd rather not start work on something that's not worthwhile/won't ever be used.

clemos commented 12 years ago

Hi Dr Emann,

I've actually been thinking about it too. It could also be fun to allow people to clone their project for further experiments, like gists for instance. This could really be an awesome project. My server has git installed so it should be fine.

Now personally, while it's probably a fun and interesting idea to work on, it's just not something I felt was really urgent, so I just "forgot" it somehow. Actually, space is not a real problem. The current archive is 1.5G on try.haxe.org, which is not really big (I still have 43G left).

Just feel free to do it, I'll totally merge what you'll come up with.

Dr-Emann commented 12 years ago

Heh, yea, no big rush for it, then. Still would be interesting. You could even write a script to make tags for all the existing ones, and push them into it as well, keeping the same hash.

I thought of two possible implementations.

Separate repositories, pushing to a central, master repo

This will work because git hard-links objects when cloning locally, and you can run git relink periodically to keep new objects linked as well.
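
A small sketch of the hard-linking behaviour, assuming everything lives on one filesystem. The directory names seed, central.git and slot1 are made up for illustration, and the periodic relinking step is not covered here.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Seed repository holding one commit, then a bare central copy of it.
git init -q seed
printf 'class Test {}\n' > seed/Test.hx
git -C seed add Test.hx
git -C seed commit -q -m "default example"
git clone -q --bare seed central.git

# A local clone by plain path hard-links the object files where it can
# instead of copying them.
git clone -q central.git slot1

# Count object files in the clone that have more than one hard link.
linked=$(find slot1/.git/objects -type f -links +1 | wc -l)
echo "hard-linked object files in slot1: $linked"
```

Objects created after the clone are ordinary new files, which is why some periodic relinking or repacking would still be needed to keep the savings.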

Linked git dir

Using git init --separate-git-dir '../master.git' would allow all of the folders to share the same git directory. This will ensure maximum space savings. However, this means all folders would share the same HEAD, and we would therefore have to use the git plumbing commands, combined with a separate git index file (setting the GIT_INDEX_FILE environment variable to a file local to each folder, e.g. .local_index) to stage and commit changes manually, i.e.:

# $parent holds the commit hash of the example being edited
export GIT_INDEX_FILE=.local_index
git read-tree "$parent"
git checkout-index -af

# Do work
git add .
tree=$(git write-tree)
git commit-tree "$tree" -p "$parent" -m "nothing"

clemos commented 12 years ago

Well this is quite beyond my git knowledge actually...

The second solution seems more natural, though, at least the part that says "all the folders share the same git directory". It seems good, but then the operations required to update it are obscure to me. Wouldn't it be possible to achieve similar behaviour the other way round, with one bare repository and checkouts to external dirs? Each directory name could correspond to a commit hash, which would make it easier to manage commits. I don't know. Because I still don't see how you can manage several people committing at the same time to several branches...

OTOH, since space is not really an issue, maybe we should focus on the actual features git could provide besides saving space.
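
For what it's worth, the "one bare repository, one directory per commit" idea could look roughly like the sketch below. It is only a sketch: the names central.git, checkouts and Test.hx and the try-haxe_ branch prefix are assumptions, and git archive is just one possible way to export a commit from a bare repository.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q --bare central.git
export GIT_DIR="$work/central.git"

# Store one example directly in the bare repository with plumbing commands,
# so no shared working tree or shared index is involved.
blob=$(printf 'class Test {}\n' | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\tTest.hx\n' "$blob" | git mktree)
commit=$(git commit-tree "$tree" -m "saved example")
short=$(git rev-parse --short=6 "$commit")

# One ref per saved example keeps the commit reachable.
git update-ref "refs/heads/try-haxe_$short" "$commit"

# Export that commit into a directory named after its hash.
mkdir -p "checkouts/$short"
git archive "$commit" | tar -xf - -C "checkouts/$short"
ls "checkouts/$short"
```

Since every save writes its own commit and its own ref, two people saving at the same time would not be updating the same branch.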

As already said, I've been thinking about allowing one to clone his "project" like a gist. The best in this case would be for the user to only get the branch / history that directly leads to his version. I don't know if this implies generating a separate "clean" repo, or if branches are enough to achieve this.
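
A sketch suggesting that branches alone may be enough for this: a single-branch clone over the regular transport only receives the history leading to that branch. The repository layout, the try-haxe_ branch prefix and the file name are assumptions for the example.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
central="$work/central.git"
git init -q --bare "$central"

# Helper: store content $1 as Test.hx in a commit on top of parent $2.
save_example() {
    blob=$(printf '%s\n' "$1" | git --git-dir="$central" hash-object -w --stdin)
    tree=$(printf '100644 blob %s\tTest.hx\n' "$blob" | git --git-dir="$central" mktree)
    git --git-dir="$central" commit-tree "$tree" -p "$2" -m "saved example"
}

# Root commit with the default code, then two unrelated examples on top of it.
blob=$(printf 'class Test {}\n' | git --git-dir="$central" hash-object -w --stdin)
tree=$(printf '100644 blob %s\tTest.hx\n' "$blob" | git --git-dir="$central" mktree)
root=$(git --git-dir="$central" commit-tree "$tree" -m "default code")
commit_a=$(save_example 'class Test { /* A */ }' "$root")
commit_b=$(save_example 'class Test { /* B */ }' "$root")
a=$(git --git-dir="$central" rev-parse --short=6 "$commit_a")
b=$(git --git-dir="$central" rev-parse --short=6 "$commit_b")
git --git-dir="$central" update-ref "refs/heads/try-haxe_$a" "$commit_a"
git --git-dir="$central" update-ref "refs/heads/try-haxe_$b" "$commit_b"

# Clone only the branch of example A. The file:// URL forces the normal
# transport, so only objects reachable from that branch are transferred.
git clone -q --single-branch --branch "try-haxe_$a" "file://$central" mine
git -C mine log --oneline
```

The clone ends up with the root commit and example A, without example B, so a separate "clean" repository may not be needed.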

In the same spirit, being able to navigate through versions directly on the website would be fun.

I'm quite lost, actually :p

Dr-Emann commented 12 years ago

Yea, I was up late last night reading up on the internals of git.

I think the first option would be easier, and safer. It sounds almost exactly like what you suggested, actually, to have one bare repository, and then each directory would pull from the central repository, then push new commits in. Each new commit would require a new branch (so that they don't get garbage collected). The way I see it, it would work like this:

User navigates to try.haxe.org
User hacks on code, saves an example
- git makes a new commit, based off a pre-made initial commit that contains the default try-haxe code (this will be the root of all commits)
- git makes a branch called try-haxe_123abc (the first couple digits of the SHA1 hash)
User is presented with a link to try.haxe.org/#123abc

Then later:

User2 goes to try.haxe.org/#123abc
- We go into the oldest directory (one with oldest last-modified)
- We fetch from the central repository all new commits
- We checkout branch try-haxe_123abc
User2 hacks new code on top of the example
User2 chooses to save an example
- We commit (get new SHA1 of 234bcd)
- We make a branch (try-haxe_234bcd)
- We push to the main repo
User2 gets a link to try.haxe.org/#234bcd
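
The two flows above could be sketched with ordinary git commands roughly as follows. This is a sketch under assumptions: slotA and slotB stand in for two of the working folders, central.git for the main repo, and Test.hx for the example source; none of these names come from try-haxe itself.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Pre-made initial commit with the default code, the root of all commits.
git init -q seed
printf 'class Test {}\n' > seed/Test.hx
git -C seed add Test.hx
git -C seed commit -q -m "default try-haxe code"
git clone -q --bare seed central.git
git clone -q central.git slotA
git clone -q central.git slotB

# First user saves an example from slotA.
printf 'class Test { static function main() {} }\n' > slotA/Test.hx
git -C slotA commit -q -am "saved example"
first=$(git -C slotA rev-parse --short=6 HEAD)
git -C slotA push -q origin "HEAD:refs/heads/try-haxe_$first"

# User2 opens that example in another folder: fetch, then check it out.
git -C slotB fetch -q origin
git -C slotB checkout -q --detach "origin/try-haxe_$first"

# User2 edits and saves: commit, name a branch after the new hash, push.
printf 'class Test { static function main() { trace("hi"); } }\n' > slotB/Test.hx
git -C slotB commit -q -am "saved example"
second=$(git -C slotB rev-parse --short=6 HEAD)
git -C slotB push -q origin "HEAD:refs/heads/try-haxe_$second"
echo "links: #$first and #$second"
```

Pushing HEAD straight to a new branch name in the central repo avoids having to create and later clean up a local branch in each folder.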

Because each commit is based off of the example that the user started hacking from, the history would automatically include the chain of examples that led to it. A "clone" would be implied by starting at an existing example, and saving something new.

clemos commented 12 years ago

Sounds good, except maybe the part with "oldest last-modified", which I don't really get. This said, I have no real idea as to the amount of work to implement this.
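
On the "oldest last-modified" part: one reading is that the working folders form a small pool, and the folder that has gone unused the longest is the one to recycle for the next request. A minimal sketch of that selection follows; the pool layout and slot names are assumptions, and the timestamps are set by hand just to make the example deterministic.

```shell
#!/bin/sh
set -e
pool=$(mktemp -d)
mkdir "$pool/slot1" "$pool/slot2" "$pool/slot3"

# Fake different last-modified times for the three folders.
touch -t 202101010000 "$pool/slot1"
touch -t 202001010000 "$pool/slot2"
touch -t 202201010000 "$pool/slot3"

# ls -t sorts newest first, so the last entry is the least recently modified.
oldest=$(ls -1td "$pool"/*/ | tail -n 1)
echo "reuse: $oldest"

# Mark the folder as just used, so another one is picked next time.
touch "$oldest"
next=$(ls -1td "$pool"/*/ | tail -n 1)
echo "next candidate: $next"
```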
