buildstream-migration / bst-staging

GNU Lesser General Public License v2.1
0 stars 0 forks source link

Give better warnings on overlaps #114

Closed Cynical-Optimist closed 4 years ago

Cynical-Optimist commented 4 years ago

See original issue on GitLab In GitLab by [Gitlab user @samthursfield] on Oct 12, 2017, 15:09

If two or more artifacts contain files in the same path, only one of them can be installed in the sandbox.

BuildStream will warn when staging overlapping dependencies in the sandbox with a warning like this:

Staged files overwrite existing files in staging area:

From gnu-toolchain/gawk.bst:
  /usr/bin/awk

This tells you where the new version /usr/bin/awk comes from, but it doesn't tell you which old versions are being overwritten.

We should produce output like this instead:

Staged files overwrite existing files in staging area:

From gnu-toolchain/gawk.bst:
  /usr/bin/awk (overwriting version from gnu-toolchain/stage2-gawk.bst)

There may be more than 2 conflicting versions of the file, so we should mention that too:

Staged files overwrite existing files in staging area:

From gnu-toolchain/gawk.bst:
  /usr/bin/awk (overwriting version from gnu-toolchain/stage2-gawk.bst and gnu-toolchain/stage1-gawk.bst)
Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @samthursfield] on Oct 12, 2017, 15:13

changed title from Give better {-error-}s on overlaps to Give better {+warning+}s on overlaps

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @samthursfield] on Oct 12, 2017, 15:13

changed the description

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @tristanvb] on Oct 18, 2017, 10:43

We agreed this was a blocker in our meeting; I feel like it might not be as it doesnt break API, but let's mark this as blocker for now.

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @tristanvb] on Oct 18, 2017, 10:58

Misunderstood this, sorry for noise, not a blocker.

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @jonathanmaw] on Nov 14, 2017, 13:16

assigned to [Gitlab user @jonathanmaw]

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @jonathanmaw] on Nov 14, 2017, 13:51

I'm picking this up, and I have a few questions about how people would like to see the result.

1: In the case of an element overlapping multiple elements with the same file, should every element that overlaps report which element it overlaps?

e.g.

element foo has files foofoo and foobar

element bar has files foobar and foobaz

element baz has files foobar and foobaz

elements are installed in order foo, bar, baz.

Would we report that:

baz has overlaps:

bar has overlaps:

or would the overlaps in bar be irrelevant because they've already been reported as part of baz?

2: Would it be a better idea to report overlaps per-file and indicate how they overlap, e.g.

overlaps detected:

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @aevri] on Nov 14, 2017, 18:38

For myself I think I’d first like to know overlaps on the element level and then be able to drill down to the files, e.g.

3: Which elements overlap sets exist, and how they overlap

Elements foo, bar, and baz overlap on files:

Elements bar and baz overlap on files:

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @jonathanmaw] on Nov 15, 2017, 10:27

[Gitlab user @aevri] That seems sound, do you think there's value in knowing which element ends up on top? I can see the argument that if overlaps are happening they ought to be directly addressed, but on the other hand I think it would be helpful to keep the user informed of how these elements end up being layered.

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @aevri] on Nov 15, 2017, 13:33

Ah yes we should definitely indicate the resolution, maybe something like this?

Elements foo, bar, and baz overlap on files:
- foobar
Files were taken from baz as it's the last element.

Elements bar and baz overlap on files:
- foobaz
Files were taken from baz as it's the last element.
Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @tristanvb] on Nov 15, 2017, 17:51

Seems to be a lot of chatter here, so I'll just jot down some of my thoughts...

Originally I had thought something along the lines of issuing warnings as they occur, artifact by artifact; which seems easiest programatically; i.e.:

Overlaps encountered while staging ${stage_artifact}:
  /path/to/overlapping/file (${closest_ancestor_with_same_file})
  ...

In this way, when they overlap more than once, they will appear in the next warning, next time showing the last ${stage_artifact} as ${closest_ancestor_with_same_file}

Might look like:

Overlaps while staging gettext.bst:
  /usr/bin/msgfmt (base-system.bst)
  ...

Overlaps while staging custom-gettext.bst: 
  /usr/bin/msgfmt (gettext.bst)
  ...

Just throwing out what I thought would be the obvious approach because of how we stage artifacts in a consistent order, would just require a lookup of "closest ancestor which contains this file".

To improve on the above and what might be more compressed (avoids redundant data in considerably large log files), would be to collect the overlaps in ordered lists as they occur, and then print the files with more than one origin artifact with the ordered list or origins, like so:

Overlaps while staging:
  /usr/bin/msgfmt (custom-gettext.bst -> gettext.bst -> base-system.bst)
  ...
Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @tristanvb] on Nov 16, 2017, 08:07

Sorry my last post from last night might not have been that constructive; maybe I was being impatient, anyway that is how I originally imagined it might look.

[Gitlab user @aevri] It looks like you have some ideas here, and I do agree with your point that "I think I’d first like to know overlaps on the element level and then be able to drill down to the files"

However, I think I need more concrete and specific ideas listed here of how it's actually going to be formatted, for instance the message "Elements foo, bar, and baz overlap on files:" does not tell me with absolute clarity, in which order the files overlap, and which of the elements "won" for the list of files those elements overlap on.

Further; if 3 elements overlap on the same set of files, we need a grouped statement about that; and then we need another grouped statement about the rest of the files which only 2 out of the same 3 elements also overlap, in which case it seems to me we start to lose clarity even further.

To me, making it absolutely clear what happened for each file (which file was staged in the end ? if I dont stage the winning element, which element would be next in line to win for this file ?) is of paramount importance here.

Can you make a suggestion which satisfies your desire to know about overlaps at the element level before the file level, which still preserves absolute certainty of which file wins and in which order ?

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @aevri] on Nov 16, 2017, 09:14

No worries, I was actually hoping to speed things along with my suggestion. If you've already got strong ideas then I'm happy to contribute changes later if they seem like improvements. Cheers!

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @jonathanmaw] on Nov 27, 2017, 14:15

created branch 114-give-better-warnings-on-overlaps

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @jonathanmaw] on Nov 27, 2017, 14:15

mentioned in merge request !165

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @jonathanmaw] on Nov 29, 2017, 14:18

[Gitlab user @tristanvb] [Gitlab user @aevri], can you check whether merge request !165 resolves this issue to your satisfaction?

I believe there was discussion on altering overlaps handling so that it would:

Since that's not attached to this individual issue, I'll create a new issue to discuss and implement this once this issue has been resolved.

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @tristanvb] on Nov 29, 2017, 15:54

Is it possible we can reduce the (computational) cost of this feature by warning at artifact creation time instead of at staging time ?

Would that make sense ?

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @jonathanmaw] on Nov 29, 2017, 16:24

If I understand correctly, at artifact creation time, you know which elements were staged to build it, so you can see that "the output of this build overlaps with something already staged".

When staging elements to build another element, an element will be combined with elements other than the ones that were used to build it, so we'd need to check for overlaps somewhere around here anyway.

As a hypothetical example:

core will depend on base, one way or the other, so checking for overlaps at artifact assembly will detect this overlap. In the case of client-system.bst, there's no equivalent "the output of this build" to check for overlaps in, so overlaps would only be detected while staging.

Describing the differences between these two approaches does strongly lead me to believe I'm missing something, though.

I can think of some optimisations, however:

  1. We could only report overlaps when the results of copying files indicates that overlaps happened, meaning we don't use the expensive overlap checking logic for elements that don't overlap.

  2. We could only calculate overlaps when assembling the final artifact, meaning we get one extensive report, rather than a little report with every element.

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @tristanvb] on Nov 29, 2017, 16:52

If I understand correctly, at artifact creation time, you know which elements were staged to build it, so you can see that "the output of this build overlaps with something already staged"

It would rather mean, after a build we inspect the output directory from which we need to make an artifact, and list it's files; then we need to compare that with the manifests for all of the artifact's runtime dependencies.

I guess this is a deadend unfortunately, because the check would have to be done on runtime dependencies, and we dont guarantee that all runtime dependencies are built when the element itself is built (some runtime dependencies that are not build dependencies are services and/or artwork, themes and icons, fonts, stuff that an app needs to run but not necessarily to build).

Still its an interesting line of thinking, there exists a time when all runtime dependencies of an element exist; and if we warn at that time, it is much less often than at staging time.

Of course we cant predict what overlaps might occur in a later reverse dependency, but assuming that reverse dependency needs to be built to be meaningful at all, it seems unimportant that the warning occur at staging time instead of at artifact production time.

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @jonathanmaw] on Nov 30, 2017, 14:42

Okay, I'm having trouble seeing a clear way forward at this point, so I'm going to compare the approaches and try to weigh them up.

1: Check for overlaps when staging files. This will only catch an element's overlaps when it's used in staging. It will catch overlaps with elements it doesn't depend upon, but is staged together with (i.e. has a reverse runtime dependency in common with) when staged. To reduce the amount of work we do, we can make it only generate a full overlap report if any overlaps have been detected while linking files.

2: Check for overlaps with runtime dependencies just before creating the artifact. Since we can't guarantee that runtime dependencies are created yet, we would have to queue a task to check overlaps once they're built (Is there a guarantee that they'll ever be built?). To me, that's scary pipeline work.

3: Check for overlaps only in the final artifact This saves the most time when building, but if you're seeking to eliminate overlaps, it's a lot more time-consuming, since you won't know where any overlaps are until the end of the build.

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @aevri] on Nov 30, 2017, 16:18

[Gitlab user @tristanvb] [Gitlab user @aevri], can you check whether merge request !165 resolves this issue to your satisfaction?

Please count me out of this one now, thanks for circling back!

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @jonathanmaw] on Nov 30, 2017, 18:00

[Gitlab user @tristanvb] as you requested IRL, here's what the new overlap messages looks like:

jonathanmaw[[Gitlab user @fafnir]](https://gitlab.com/fafnir):~/workspace/buildstream/buildstream/tests/frontend/overlaps$ bst build collect.bst 
Loading:   004
Resolving: 004/004
Checking:  004/004

BuildStream Version 0.1.dev1566+g9f2eea4.d20171130
  Session Start: Thursday, 30-11-2017 at 17:57:16
  Project:       pony (/home/jonathanmaw/workspace/buildstream/buildstream/tests/frontend/overlaps)
  Targets:       collect.bst

User Configuration
  Configuration File:      /home/jonathanmaw/.config/buildstream.conf
  Log Files:               /media/jonathanmaw/external/buildstream-cache-3/logs
  Source Mirrors:          /media/jonathanmaw/external/buildstream-cache-3/sources
  Build Area:              /media/jonathanmaw/external/buildstream-cache-3/build
  Artifact Cache:          /media/jonathanmaw/external/buildstream-cache-3/artifacts
  Maximum Fetch Tasks:     10
  Maximum Build Tasks:     4
  Maximum Push Tasks:      4
  Maximum Network Retries: 2

Pipeline
      cached 0c94aa89 one.bst 
      cached c592eae4 three.bst 
      cached 7c7b0e91 two.bst 
   buildable 61327431 collect.bst 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[--:--:--] START   Starting build
[--:--:--][61327431][build:collect.bst] START   pony/collect/61327431-build.19286.log
[--:--:--][61327431][build:collect.bst] START   Staging dependencies
[--:--:--][61327431][build:collect.bst] WARNING Overlaps detected!

    Staged files overwrite existing files in staging area:
      /file1: one.bst -> three.bst
      /file2: one.bst -> three.bst -> two.bst
      /file3: three.bst -> two.bst

[00:00:00][61327431][build:collect.bst] SUCCESS Staging dependencies
[--:--:--][61327431][build:collect.bst] START   Integrating sandbox
[00:00:00][61327431][build:collect.bst] SUCCESS Integrating sandbox
[--:--:--][61327431][build:collect.bst] START   Caching Artifact
[00:00:00][61327431][build:collect.bst] SUCCESS Caching Artifact
[00:00:00][61327431][build:collect.bst] SUCCESS pony/collect/61327431-build.19286.log
[00:00:00] SUCCESS Build Complete

Pipeline Summary
  Total:       4
  Session:     1
  Fetch Queue: processed 0, skipped 1, failed 0 
  Build Queue: processed 1, skipped 0, failed 0 
Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @tristanvb] on Nov 30, 2017, 18:27

    Staged files overwrite existing files in staging area:
      /file1: one.bst -> three.bst
      /file2: one.bst -> three.bst -> two.bst
      /file3: three.bst -> two.bst

Reading the above as a human, I expect this means that for /file1, the version in one.bst wins, because it overwrites the one from three.bst, correct ?

And for /file2, again one.bst wins, overwriting three.bst which overwrites two.bst, where two.bst is obviously the first element to have ever staged /file2 in the staging order ?

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @jonathanmaw] on Dec 1, 2017, 10:22

I was actually interpreting it the other way, one at the bottom, three above one, and two above three.

So it's clear that's an ambiguous way of indicating staging order.

I think the tersest and most unambiguous replacement would be "above", so it would be "two.bst above three.bst above one.bst"

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @tristanvb] on Dec 3, 2017, 14:03

closed via merge request !165