"git-fat filter-smudge" should support pulling missing fat files.

OptiverTimAll commented 9 years ago

I've been investigating git large-file storage solutions, and while git-fat is more generally useful, GitHub's git-lfs is a bit nicer to use. One reason is that you don't have to manually "git lfs pull" when you "git pull" or "git checkout", because git-lfs' smudge filter will automatically try to download files it doesn't have cached locally. I'm not sure whether this is something that git-fat should do by default; on the one hand, there's no guarantees that the user will have the required network connectivity when "git checkout" runs; on the other hand, maybe trying and failing and printing an error message would be better than git-fat's current behaviour (silently checking out un-smudged stubs). I would like to have one of the following behaviours:

"git-fat filter-smudge" automatically tries to download missing fat files, and prints a warning for each file it cannot obtain (because it's missing, or because it can't connect to the network or whatever)
"git-fat filter-smudge" defaults to its current behaviour, but supports the above behaviour with an extra command-line option, say --auto-fetch. "git-fat init" should also have an extra command-line option that installs the smudge filter with the --auto-fetch option.

abraithwaite commented 9 years ago

This one I'm unsure about. I'll leave this open for discussion, but I think I'd prefer it not be the case that it lazy loads the files during a smudge. I know that's what git-lfs does, but I believe that it leads to very messy edge case handling.

For example, if a filter fails and returns non-zero than I believe the checkout is aborted. Otherwise, if some files successfully download and some do not, then what do you do? You've got some which have been replaced and some which haven't. Sure you can print an error, but you can't return an error without creating a (nearly) unrecoverable mess.

Basically, I'd prefer if it were all or nothing and putting that behavior in filters prevents that behavior. (It should be noted that the current implementation is not transactional but does not prevent it from being so in a hook).

OptiverTimAll commented 9 years ago

I think git-fat already has to deal with some fat files being available and some missing. For example, if I have a directory full of fat files, and I "git checkout" a commit that adds a new such file, but I haven't run "git fat pull" yet, I'll wind up in that inconsistent state.

Just off the top of my head, would it be practical to do something like:

If the smudge-filter can't find a local copy of the file, instead of writing out the raw stub, write out a deliberately corrupt stub like "WARNING: FAT FILE NOT AVAILABLE", so "git status" and "git diff" will give a clue that something's gone wrong
If the clean-filter is fed a corrupt stub, it should bail out so that people can't just run "git add -A ." to ignore the smudge warnings.

I notice there's a filter.<driver>.required git config option that tells git that when smudge fails, the result is a broken working copy. That seems to describe 'git fat' pretty well, but I notice "git fat init" doesn't set that option - would it change the edge-cases you describe?

abraithwaite commented 9 years ago

git-fat already has to deal with some fat files being available and some missing

This is the case yes. When doing a clone for the first time we don't have any fat files on the local filesystem yet so we write out the placeholders. Placeholders in the working copy without a corresponding file in the fat store are what "Orphans" are.

I have a directory full of fat files, and I "git checkout" a commit that adds a new such file, but I haven't run "git fat pull" yet, I'll wind up in that inconsistent state.

This is exactly correct. We handled this by having a post-checkout hook call git-fat pull (this is the same case as cloning as it turns out).

I notice there's a filter..required git config option ...

I didn't know about this. It seems like it could be useful, but I'm not totally sure.

To give a bit of context for why the current behavior is the way it is: We wanted to be able to selectively pull files in the repository without pulling all the large files for a particular commit. This was very important for various reasons I won't get into at the moment, but is what leads to the duality of having orphan placeholders and real files in the working tree.

While moving push and pull to filters might be a good feature to have added, I don't think it should be the default for git-fat. In any case it's going to be challenging technically. However, I wouldn't outright reject it if it came along.

Edit: sorry I didn't get into all this in the first response, but your suggestions exercised my memory for the reasons why it is the way it is. :-)

OptiverTimAll commented 9 years ago

Aha, I checked .git/hooks/ and didn't see an example post-checkout hook, so I didn't realise there was one. I see it's actually listed on githooks(5). Depending how git handles a hook that fails (because it can't pull required fat files) this ticket may effectively be a dupe of #7.

cynix commented 9 years ago

post-checkout runs after the checkout is complete and cannot affect the result of the checkout. You'll end up with orphan subs if this hook fails, but that's no worse than the smudge filter failing.

abraithwaite commented 9 years ago

post-checkout runs after the checkout is complete and cannot affect the result of the checkout.

What the post checkout hook did was run a git-fat pull followed by a git-fat checkout. The git-fat checkout runs git checkout-index of the affected files. This forces a re-smudge of the files that it pulled.

If the hook failed it wouldn't normally affect the result of the checkout, but since we called git-fat checkout which in turn called git checkout-index, it did have a chance to leave the repository in an inconsistent state (e.g. a file partially written or w/e). The point was however that we separated the downloading of the files from the smudging of the files.

ciena-blueplanet / git-fat

"git-fat filter-smudge" should support pulling missing fat files. #56