Allow `Fetch` to match a subset of qualifiers on a `Push`ed asset?

tomcoldrick-ct commented 4 years ago

Currently we demand that a Fetch request match all the qualifiers for a previously Pushed blob to be found, however this doesn't line up with how BuildStream plans on using the Remote Asset API. In https://gitlab.com/BuildStream/buildstream/-/issues/1274 it's suggested that BuildStream will use a different set of Qualifiers for Push and Fetch requests - fetch qualifiers being a subset of those pushed. As a result this will mean that Push'd sources will never be Fetch'd by BuildStream. Note that BuildStream doing this is explicitly allowed in the spec, but our current implementation is also fully compliant with the spec...

Our implementation currently uses a list of all qualifiers as part of the inputs to a hash, which is in turn used as the key for a BlobAccess. As a result, matching on just a subset of qualifiers is a non-trivial change.

This may prove to be impossible to implement with #3, but it's worth having this on the radar.

arlyon commented 4 years ago

I think this is absolutely a necessary change, but a difficult one to reason with. Two examples off the top of my head where this may be useful:

conceivably the vcs.commit and vcs.branch qualifiers are not entirely mutually exclusive in some cases. Say for some reason I specify I need a specific branch and a commit on that branch. At this point a set of qualifiers that only reference the commit is still valid, this one that references only the branch is not. I suppose the solution is to make specifying both illegal.
imagine qualifiers for auth. you have a tarball that is behind some basic auth and so when initially acquiring it the auth is required but afterwards it doesn't really make sense.

Maybe contrived examples but I think it illustrates a few cases where this may be useful

tomcoldrick-ct commented 4 years ago

For the VCS qualifier stuff, I personally would expect usage of the two to mean "I would like this commit, and I would like you to make sure it's on this branch", which I think I'd probably still care about if it'd already been Pushd.

The auth example is interesting, but I'm not convinced that this is a good use of qualifiers actually - let's say we have a tarball behind authentication, then in order to perform a remote Fetch, the client sends creds as qualifier(s). Then the server can download the source and proceed. Subsequent Fetches will all need to supply the creds in order to hit the cache, as Fetches must match all qualifiers unless the asset was Pushd. Therefore if a client wants to be able to omit authentication qualifiers for subsequent Fetch requests, it has to manually Push the thing anyway, at which point it may as well just not add the qualifiers to this Push, rather than us adding logic to match on a subset.

That said, I can see the utility in using the qualifiers of a Pushd asset to provide some additional metadata, so I'm not actually against this feature.

buildbarn / bb-remote-asset

Allow `Fetch` to match a subset of qualifiers on a `Push`ed asset? #8