Open mih opened 2 years ago
> But this could/would (always @bpoldrack ?) corrupt the file availability records in the annex branch, because git-annex would treat the RIA git-repo location as a part of the git-annex clone network.
Yes, that's because we have the annex object tree in the same location where git-annex would expect it in a regular, bare git-annex repository (`annex/objects`). However, this object tree is unknown to the bare, plain-git repository and is accessed via a special remote instead. Obviously this is not an issue when there's no annexed content in the store (`create-sibling-ria` can create a sibling without any special remote or object tree). So, no, not always, but almost always.
The trouble kicks in when running a git-annex command makes annex discover a trace of a seemingly broken annex repo. It will then run `git annex init` on what is supposed to be a plain git repo, indexing the object tree and hence recording availability under that repo's own UUID, when in fact the location is identical to the one the special remote points to (but annex can't possibly realize this).
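One way to notice after the fact that this has happened is to look for annex initialization state in the store repo's git config, since `git annex init` records an `annex.uuid` there. A minimal sketch, assuming the dataset directory in the store is a bare repo with its `config` file at the top level; the function name and the path in the usage comment are made up for illustration:

```shell
#!/bin/sh
# Report whether a bare repository carries git-annex initialization state.
# Succeeds (exit 0) if annex.uuid is set in the repository's config file.
is_annex_initialized() {
    # $1: path to the bare repository directory (hypothetical in this sketch)
    git config --file "$1/config" --get annex.uuid >/dev/null 2>&1
}

# Usage (hypothetical path):
# if is_annex_initialized /path/to/store/repo; then
#     echo "store repo was initialized by git-annex"
# fi
```

This only detects the initialization itself. It does not say whether availability info for that UUID has already been recorded in the git-annex branch.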
It is doable to let annex deal with this correctly via `sameas`, but this isn't universal either. One of the reasons the object tree is decoupled from the bare repo is that we allow either dirhashmixed or dirhashlower for its layout. Mixed is necessary for ephemeral clones symlinking into the store (a much desired feature that led to the idea of RIA stores in the first place), and it's datalad's default. But annex would expect a bare repository to always use dirhashlower instead. One can have that (layout version 1 for the datasets), but it breaks ephemeral clones (we should probably have a safeguard in the respective `clone` routine to not symlink into dirhashlower).
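To make the layout difference a bit more concrete: as documented for git-annex, dirhashlower takes the MD5 hex digest of the key and uses its first three and next three characters as two directory levels. A minimal sketch of that, assuming `md5sum` is available; the function name and the key are made up for illustration, and dirhashmixed (which uses a different encoding) is not reproduced here:

```shell
#!/bin/sh
# Compute the dirhashlower directory prefix for an annex key:
# first three and next three characters of the key's MD5 hex digest.
dirhashlower() {
    # $1: annex key
    digest=$(printf '%s' "$1" | md5sum | cut -d' ' -f1)
    first=$(printf '%s' "$digest" | cut -c1-3)
    second=$(printf '%s' "$digest" | cut -c4-6)
    printf '%s/%s\n' "$first" "$second"
}

# Made-up key, for illustration only
key='MD5E-s4--ba1f2511fc30423bdbb183fe33f3dd0f.txt'
# Where a bare annex repo would look for this key's content
printf 'annex/objects/%s/%s/%s\n' "$(dirhashlower "$key")" "$key" "$key"
```

An object tree laid out with dirhashmixed puts the same key under a different directory, so annex operating on the bare repo directly would not find it there.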
So, to avoid a `git annex fsck` accidentally messing up the availability info by running it in-store, I'd suggest to either:

- clone with `--reckless ephemeral` and run `fsck` locally. This would only be useful for discovering an integrity issue, not for pushing back the git-annex branch. But it's obviously faster, as it avoids copying the content first.
- run `git annex fsck -f <special remote>` and push back.

> It is unclear to me why the git-repo config is not set to `annex.ignore` in the RIA store (or is it?).
I forgot. Will double-check.
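For reference, the two fsck routes suggested above could look roughly like the following. This is a sketch under assumptions, not a tested recipe: the `run` helper, the function names, and every URL, path, and remote name are placeholders, and "push back" is taken to mean pushing the `git-annex` branch with plain `git push` (which may need a merge first if the branch has diverged). With `DRY_RUN=1` the commands are only printed.

```shell
#!/bin/sh
# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Option 1: ephemeral clone, fsck locally, nothing is pushed back.
fsck_ephemeral() {
    # $1: RIA URL of the dataset (placeholder), $2: local clone path
    run datalad clone --reckless ephemeral "$1" "$2" &&
    run git -C "$2" annex fsck
}

# Option 2: in an existing regular clone, fsck the storage special remote
# and push the updated git-annex branch.
fsck_special_remote() {
    # $1: clone path, $2: special remote name, $3: git remote name
    run git -C "$1" annex fsck --from "$2" &&
    run git -C "$1" push "$3" git-annex
}

# Usage (all names are placeholders):
# DRY_RUN=1; export DRY_RUN
# fsck_ephemeral 'ria+file:///path/to/store#<dataset-id>' /tmp/ds
# fsck_special_remote /path/to/clone <special remote> origin
```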
It seems intuitive to run `git annex fsck` on the bare repo inside a RIA store. But this could/would (always @bpoldrack ?) corrupt the file availability records in the annex branch, because git-annex would treat the RIA git-repo location as a part of the git-annex clone network. However, the associated storage remote already points to this location.

It is unclear to me why the git-repo config is not set to `annex.ignore` in the RIA store (or is it?).