Open BuildStream-Migration-Bot opened 3 years ago
In GitLab by [Gitlab user @sstriker] on Mar 25, 2020, 22:08
Given the Source Plugin API changes, suggesting this for 2.0.
In GitLab by [Gitlab user @cs-shadow] on Mar 26, 2020, 12:18
[Gitlab user @sstriker] thanks for the write-up.
How do you envision this working for SourceTransforms (like the pip source, etc.) that require access to other sources listed for that element? Or would such sources have to fall back to how they are handled now?
In GitLab by [Gitlab user @sstriker] on Mar 26, 2020, 21:28
Excellent question.
It really depends on how specialized a source plugin we are willing to make, and how much there is to gain when tracking these types of sources. In the pip source case, I imagine there might be quite a bit of work that can be avoided, that is, going from a requirements.txt to a frozen set of requirements.
I imagine that during tracking, source plugins that have indicated they need the previous sources would be passed the Tree (vDirectory?) of the previous sources at the start of tracking. The pip source plugin could use the digest of the requirements.txt file as a qualifier, and then maybe use these for tracking:
The `ref` for a pip source plugin is the frozen requirements; I think that would result in the push qualifiers being:
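To make that concrete, here is a rough sketch of what such qualifiers could look like (the qualifier names, the digest choice and all values are my illustration; nothing here is settled in this thread):

```python
# Hypothetical illustration of the digest-as-qualifier idea above.
# Qualifier names and values are assumptions, not part of any spec.
import hashlib

requirements_txt = b"package1\npackage2\npackage3==0.5.4\n"

track_qualifiers = {
    # Digest of the requirements.txt staged by the previous sources.
    "pip.requirements.digest": hashlib.sha256(requirements_txt).hexdigest(),
}

push_qualifiers = {
    **track_qualifiers,
    # The ref (the frozen requirements) is pushed as a qualifier too, so
    # that other clients resolve to the same pinned set.
    "pip.requirements.frozen": "package1==1.2.0,package2==3.4.5,package3==0.5.4",
}
```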
In a setup like this, you would have a central BuildStream instance taking care of the tracking:
And each other client would be tracking to the same set of frozen deps the central instance has. For the pip source plugin this has two implications:
In short, I think that there might be source plugins that require previous sources, where it makes sense to resort to native tracking. In other cases, the specialized handling may be worth it.
Make sense?
In GitLab by [Gitlab user @sstriker] on Mar 27, 2020, 20:51
[Gitlab user @cs-shadow]: After a night's sleep, an updated answer.
A not unimportant benefit that I left out of this pip source example: BuildStream clients wouldn't need to have python host tools to track (caveat: only when not falling back to native tracking).
Now, if we wanted to make the example of pip source more generic, we could document qualifiers in the Remote Asset API spec.
Let's assume that for the Remote Asset API in combination with PyPI, we use the following convention:
On a bst fetch operation we already have a "ref", which we map to the pypa.pip.requirements.frozen qualifier. We call FetchDirectory with only that qualifier.
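For illustration, the resulting FetchDirectory request could be shaped roughly like this (plain Python dicts mirroring the request fields; the URI and values are made up):

```python
# Rough sketch of the FetchDirectory request issued on `bst fetch`.
# Only the frozen qualifier (i.e. the element's ref) is sent; the URI and
# values are illustrative.
fetch_directory_request = {
    "uris": ["https://pypi.org/simple/"],
    "qualifiers": [
        {"name": "pypa.pip.requirements.frozen",
         "value": "package1==1.2.0,package2==3.4.5,package3==0.5.4"},
    ],
}
# The response would carry the Digest of the cached source tree, which the
# client can then pull without any python host tooling.
```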
On a bst track operation we want the following to happen.
As you can see, this reveals that we do need a protocol for the Source Plugin API in combination with Remote Asset tracking and previous sources. Step 1 would not be relevant to plugins that don't have previous sources.
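As a hedged sketch of how I read that track flow (the step numbering, function name and parameters are illustrative, not existing BuildStream API):

```python
def track_pip_source(requirements: str, fetch_directory) -> str:
    """Illustrative tracking flow for a pip source via the Remote Asset API.

    `requirements` is the normalized requirement list derived from the
    previous sources (step 1); `fetch_directory` stands in for a call to
    the Remote Asset Fetch service. Both names are made up.
    """
    # Step 2: resolve the input requirements through the Remote Asset service.
    response = fetch_directory(
        uris=["https://pypi.org/simple/"],
        qualifiers={"pypa.pip.requirements": requirements},
    )
    # Step 3: the qualifiers returned by the service include the frozen set,
    # which becomes the element's new ref.
    return response["qualifiers"]["pypa.pip.requirements.frozen"]
```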
On a bst push operation we send all of the qualifiers:
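And a matching sketch for the push side, again with made-up values, showing all of the pypa.pip.* qualifiers travelling with the Directory digest:

```python
# Rough sketch of the PushDirectory request issued on `bst push`.
# All qualifiers are attached so that later fetches can match on either the
# input requirements or the frozen result; values are illustrative.
push_directory_request = {
    "uris": ["https://pypi.org/simple/"],
    "qualifiers": [
        {"name": "pypa.pip.requirements",
         "value": "package1,package2,package3==0.5.4"},
        {"name": "pypa.pip.requirements.frozen",
         "value": "package1==1.2.0,package2==3.4.5,package3==0.5.4"},
    ],
    # Digest of the Directory containing the fetched source content.
    "root_directory_digest": "<digest of the staged pip source tree>",
}
```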
In GitLab by [Gitlab user @cs-shadow] on Apr 1, 2020, 22:39
Many thanks for the detailed response.
The overall plan seems good to me. On the high-level design, I only have one comment/question.
I'm unsure about adding the input requirements as a qualifier. When tracking the exact same requirements at different times, we are not guaranteed the same result. How often the results differ will depend on how often the dependencies change, but projects with lots of unbounded dependencies will see heavy churn.
This is because pip will pick the latest version each time. So, if there is a new release of any dependency between two track operations, the output will be different.
Imagine this scenario:
A user has `Ponies >= 1.0` as their only requirement in element Unicorn; the cached track result for that requirement was produced when Ponies 1.0 was the latest release, and Ponies 2.0 has since been published. Now, if we use `pypa.pip.requirements` as a qualifier, we will get back the cached version (version 1.0) in the form of `pypa.pip.requirements.frozen`. However, if the user does native tracking at this point, they will get version 2.0. As such, it is not good for reproducibility.
This is not a general problem with the plan itself, but specifically with pip source. However, I think it may extend to other popular package managers as well, since most of them pick the latest available version.
Having said that, some other package managers (like the one in Go) aim to provide this guarantee by picking the oldest allowed version.
A couple of minor comments:
> BuildStream clients wouldn't need to have python host tools to track (caveat: only when not falling back to native tracking)
This is pretty neat.
> sorted list of packages with optional version information, e.g. `package1,package2,package3==0.5.4`
I don't think it matters here, but I'd just mention that we will likely have duplicates in this list. When merging different requirement files and inline requirements, BuildStream relies on pip's logic to satisfy all the constraints. So, a single package may appear multiple times (like package1, package1>0.1, etc.).
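For example, assuming the qualifier value is simply the sorted, comma-joined list (my assumption of how the value would be built):

```python
# Example of the duplicates that can appear when requirement files and
# inline requirements are merged before pip resolves the constraints.
merged = ["package1", "package1>0.1", "package2"]
qualifier_value = ",".join(sorted(merged))
# -> "package1,package1>0.1,package2"
```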
> `pypa.pip.constraints`: sorted list of packages with version information
Maybe I'm missing something, but I'm not sure I understand what you are referring to by "constraints" here.
The way I understand it, `pypa.pip.requirements` is the set of input requirements and `pypa.pip.requirements.frozen` is the result of tracking on the input requirements. What are these constraints then? Are they just additional input, or something else?
In GitLab by [Gitlab user @sstriker] on Mar 25, 2020, 22:08
We can reduce the load and reliance on additional services by leveraging the Remote Asset API. For example, instead of having all clients poll git services, the FetchService.FetchDirectory API is used to resolve the commit at a certain branch. As clients all track to the same revision, cache hits are more likely for sources, artifacts and actions.
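As a hedged illustration of the git example (the qualifier names are placeholders of mine, not necessarily the ones standardized by the Remote Asset API spec):

```python
# Illustrative only: resolving a git branch to a commit through
# FetchService.FetchDirectory instead of polling the git service directly.
# Qualifier names are placeholders, not confirmed spec names.
fetch_request = {
    "uris": ["https://gitlab.com/BuildStream/buildstream.git"],
    "qualifiers": [{"name": "vcs.branch", "value": "master"}],
}
# Expected response (sketch): the resolved commit comes back as a qualifier,
# alongside the Digest of the checked-out tree.
fetch_response = {
    "qualifiers": [{"name": "vcs.commit", "value": "<resolved commit sha>"}],
    "root_directory_digest": "<digest of the source tree>",
}
```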
To support this we need to extend the Source Plugin API to return the list of URIs and qualifiers as needed by the FetchService. Specifically:
In the response the client will learn the Digest of the source as well as all other qualifiers the service knows about. This would include identifying information the source plugin would use in its `ref`. For example:
Behavior should be configurable to support the following use cases:
See also: https://mail.gnome.org/archives/buildstream-list/2020-February/msg00000.html
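For concreteness, here is a minimal sketch of what such a Source Plugin API extension could look like on the plugin side (the class, method name and return shape are purely hypothetical, not existing BuildStream API):

```python
# Purely hypothetical sketch: a source plugin advertises the URIs and
# qualifiers the client should pass to FetchService.FetchDirectory.
class PipSource:
    """Stands in for a real buildstream.Source subclass (illustrative only)."""

    def __init__(self, ref: str):
        # Frozen requirement list, e.g. "package1==1.2.0,package2==3.4.5".
        self.ref = ref

    def get_remote_asset_request(self):
        # Identifying information the plugin would otherwise keep only in
        # its ref is expressed as qualifiers for the Remote Asset service.
        return {
            "uris": ["https://pypi.org/simple/"],
            "qualifiers": {"pypa.pip.requirements.frozen": self.ref},
        }
```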