WRT refactoring RIA/ORA-related code, I've been thinking about the layout versions as specified in the ria-layout-version files in-store. There are a couple of aspects to that:
It doesn't need to be a version exactly. We can have arbitrary labels that don't need to convey the notion of "newer"/"older". Different layouts can make sense for a lot of reasons.
There should be a central place explicitly defining those versions (and maybe additional labels that would translate to the existing versions, while future additions would be only a label, not a number). And by defining I mean a translation of those labels into their meaning (how to resolve the actual locations) rather than just a list of valid versions or something. The thought behind this is to eventually maybe move those definitions or make them importable w/o requiring datalad. There's a repo intended to provide docs and tools for store maintainers (https://github.com/datalad/ria-tools). I think maintenance scripts etc. should not draw in datalad as a dependency, and particularly not just to get the definitions required to resolve layout labels. Duplicating those definitions doesn't seem to be a good idea either. It's not clear to me how best to go about it, but one option would be to have them in that repo instead and make datalad depend on it rather than the other way around.
Ultimately, layouts only determine a few locations: a base path for a dataset within the store (top-level layout version; currently there's only 1, translating to ds.id[:3] / ds.id[3:]). Then: "Where's the annex object tree for such a dataset and what's its layout?" (dirhashmixed vs. dirhashlower). And finally: "Where's the location of a possible archive?".
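To make the "central place" idea a bit more concrete, here is a minimal sketch of what such a definition module could look like. Everything in it is an assumption for illustration: the class and function names, the labels "mixed" and "lower", and the relative paths for the object tree and the archive are hypothetical and not what datalad currently implements. Only the ds.id[:3] / ds.id[3:] split and the two dirhash flavors come from the description above. It uses nothing but the standard library, so it would be importable w/o datalad.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class StoreLayout:
    """Top-level layout: where a dataset lives within the store."""
    id_split: int  # number of leading dataset-ID characters forming the first directory level

    def dataset_path(self, store: PurePosixPath, dsid: str) -> PurePosixPath:
        return store / dsid[:self.id_split] / dsid[self.id_split:]


@dataclass(frozen=True)
class DatasetLayout:
    """Per-dataset layout: object tree, its dirhash flavor, archive location."""
    objects: str   # object tree, relative to the dataset path (assumed value)
    dirhash: str   # "dirhashmixed" or "dirhashlower"
    archive: str   # archive file, relative to the dataset path (assumed value)


# Central registries: label -> meaning. Labels need not be numbers,
# and carry no notion of "newer"/"older".
STORE_LAYOUTS = {
    "1": StoreLayout(id_split=3),
}
DATASET_LAYOUTS = {
    # hypothetical labels; which existing version number maps to which is left open
    "mixed": DatasetLayout("annex/objects", "dirhashmixed", "archives/archive.7z"),
    "lower": DatasetLayout("annex/objects", "dirhashlower", "archives/archive.7z"),
}


def resolve(store: PurePosixPath, dsid: str, store_label: str, ds_label: str) -> dict:
    """Translate a pair of layout labels into the actual locations."""
    base = STORE_LAYOUTS[store_label].dataset_path(store, dsid)
    layout = DATASET_LAYOUTS[ds_label]
    return {
        "dataset": base,
        "objects": base / layout.objects,
        "dirhash": layout.dirhash,
        "archive": base / layout.archive,
    }
```

A maintenance script would then only need something like resolve(PurePosixPath("/store"), dsid, "1", "lower") to find everything it has to touch, and datalad could import the very same registries.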
This could be very flexible, if we allow the store to define that itself rather than relying on what is implemented in datalad and referred to by a version/label. Any store maintainer could tailor it to their needs, if ria-layout-version was allowed to alternatively have a JSON entry or something similar defining those locations.
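As a rough sketch of the "JSON entry" variant: the file content could be read as either a plain label or an inline definition. The JSON keys used here (id_split, objects, dirhash, archive) are made up for illustration, and so are the function names; no such format exists yet.

```python
import json

# Hypothetical keys a store-defined layout would have to provide.
REQUIRED_KEYS = ("id_split", "objects", "dirhash", "archive")


def parse_layout_file(text: str) -> dict:
    """Interpret the content of a ria-layout-version file.

    Returns {"label": ...} for a plain label that needs to be looked up
    in a central definition, or {"spec": {...}} for an inline JSON definition.
    """
    content = text.strip()
    if not content.startswith("{"):
        return {"label": content}
    spec = json.loads(content)
    missing = [key for key in REQUIRED_KEYS if key not in spec]
    if missing:
        raise ValueError(f"layout definition lacks keys: {missing}")
    return {"spec": spec}


def dataset_path(spec: dict, dsid: str) -> str:
    """Resolve the dataset's path relative to the store root from an inline spec."""
    n = spec["id_split"]
    return f"{dsid[:n]}/{dsid[n:]}"
```

With that, a file containing just a label keeps working as before, while a store maintainer could write a JSON object into the file to, say, use a different split of the dataset ID or put archives somewhere else.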
I think any RF'ing should be done with those aspects in mind, allowing for these kinds of features down the road.