OCamlPro / ows

A service to analyse the state of the opam repository w.r.t. all available versions of the OCaml compiler.
http://ows.irill.org/latest/today/index.html

package should be unavailable if its dependencies are unavailable #14

Open agarwal opened 9 years ago

agarwal commented 9 years ago

I find that core 111.28.00 is marked broken on OCaml 4.00.1. The explanation given is that the dependency core_kernel = 111.28.00 cannot be satisfied. And we see that core_kernel 111.28.00 is unavailable on 4.00.1. In this case, I would consider core 111.28.00 to be unavailable, not broken. I think the rule should be: A package is unavailable (not broken) if the sole reason it cannot be installed is because its dependencies are unavailable.
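The proposed rule can be sketched in a few lines. This is only an illustration, not how OWS works: the `DECLARED_UNAVAILABLE` and `DEPENDS` tables and the `foo.1.0` / `missing.0.1` packages are made up, dependencies are pinned to exact versions, and the dependency graph is assumed to be acyclic.

```python
from typing import Dict, List, Set

# Hypothetical snapshot of a repository as seen from one compiler (OCaml 4.00.1).
# Packages whose own metadata excludes this compiler:
DECLARED_UNAVAILABLE: Set[str] = {"core_kernel.111.28.00"}

# Direct dependencies only, pinned to exact versions to keep the sketch small.
DEPENDS: Dict[str, List[str]] = {
    "core.111.28.00": ["core_kernel.111.28.00"],
    "core_kernel.111.28.00": [],
    "foo.1.0": ["missing.0.1"],  # made-up package whose dependency is absent from the repository
}


def status(pkg: str) -> str:
    """Classify pkg as 'installable', 'unavailable' or 'broken'.

    Assumes the dependency graph has no cycles.
    """
    if pkg not in DEPENDS:
        return "broken"  # the package does not exist in the repository
    if pkg in DECLARED_UNAVAILABLE:
        return "unavailable"
    dep_statuses = [status(dep) for dep in DEPENDS[pkg]]
    if "broken" in dep_statuses:
        return "broken"  # some dependency is genuinely broken
    if "unavailable" in dep_statuses:
        return "unavailable"  # the sole obstacle is an unavailable dependency
    return "installable"


for name in sorted(DEPENDS):
    print(name, status(name))
```

Under this rule core 111.28.00 is reported as unavailable, while the made-up `foo.1.0`, which depends on something that does not exist at all, stays broken.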

abate commented 9 years ago

If a package A depends on a package that is not available, then A cannot be installed. From the end user's perspective it really does not matter why a package cannot be installed on his machine. Why should A depend on a package that does not exist in the repository ?

agarwal commented 9 years ago

From the end user's perspective it really does not matter why a package cannot be installed on his machine.

Perhaps, but it does matter for package maintainers. I want to know if my package or my package description is really broken, or if it is just unavailable due to some dependency being unavailable.

Why should A depend on a package that does not exist in the repository ?

core_kernel does exist, but it is available only for ocaml >= 4.01.0. It is true that core's package description could fix this by also declaring that it is available only for ocaml >= 4.01.0. However, I feel the manual work:

abate commented 9 years ago

I see your point. But wouldn't the problem be better tackled upstream, by adding some form of packaging helper to infer these constraints automatically at packaging time ? I guess this is easily feasible within opam, @AltGr ? A user will still get an error when trying to install a package that cannot be installed because of a broken dependency. The role of OWS is to spare the end user these problems while at the same time helping package maintainers with their efforts. Automatically adding more constraining dependencies is in my opinion a better way than just classifying differently a package that cannot be used. That said, it should not be too difficult to categorize a package as "unavailable" or, more generally, as broken because one of its dependencies is broken, so as to give the package maintainer a better overview of the root of the problem.

agarwal commented 9 years ago

Automatically adding more constraining dependencies is in my opinion a better way

If you mean literally add the constraints to the opam file, I disagree. A package's description should list only its direct dependencies. If A depends on B, and B depends on C, then A's description should not mention C, regardless of whether C is a package or a compiler version.

AltGr commented 9 years ago

This issue is being raised again and again, and I think both sides should sit down and think about the alternatives and their implications. There are actually strong points in favor of both, so it's really a matter of design, and it must be examined in terms of soundness, workflow and maintenance ; and from the points of view of the users, package maintainers and repository maintainers, to get the better ecosystem overall.

  1. OWS makes the assumption that a (version of a) package that can't be installed on a given OCaml version due to its dependencies is broken unless it is explicitly declared unavailable on that version.
  2. The unwritten policy on the repository, and understanding of the maintainers, is that a package doesn't need to be marked unavailable if it's only due to its dependencies.

End-user install case

Let's say we want to install foo, which depends on bar, which isn't available with the current compiler. From an end-user point of view:

  1. if all compiler constraints are explicit, foo won't even appear in the default package listing. This makes sense since it can't be installed. opam install foo fails directly, and prints the compiler constraint.
  2. if there is no constraint in the package's metadata, the package appears in opam list. opam install foo fails after calling the solver; this is well handled, though, and provides a message including the dependency chain and the final compiler constraint(s).

There is little difference between the two, only different opam list results and the printed dependency chain. The real differences appear when we consider changes in the repository ; and that's also an area where OWS can be most useful -- avoiding regressions is at least as useful as statically pointing issues.

Why add the constraints ?

The basic idea is that if a package you are responsible for isn't available for a given compiler -- and especially if it is no longer available on any compiler -- you will want to know ; so OWS should detect and mark that. Once it's acknowledged, marking it in the package's metadata is a way to register and document that acknowledgement: inferred constraints are used to detect unexpected changes, registered ones to recognise known restrictions.

The issue with the second approach is "the package is not available on some compiler version only due to its dependencies": that's very difficult to assert since we can't, in general, test that the package would compile with that compiler version, while its dependencies aren't available; so this amounts to making a pessimistic assumption -- that the package wouldn't work with that compiler version -- and consequently makes the repository more robust against changes.

Indeed, if, say, we later relax the compiler constraints on bar in our example above, without these constraints, foo would suddenly become installable on compiler versions it has never been tested on, and may fail to compile ; with these added constraints, you wouldn't be hit by this change, so the repository is more robust. Then the developer of foo may test with the newly available compilers and relax its constraints if appropriate.
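The scenario above can be played out on a toy model. The sketch below is an assumption-laden illustration, not opam's solver: compiler constraints are reduced to a single lower bound per package, and the `installable` helper and the version numbers are invented for the example.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]


def installable(pkg: str, compiler: Version,
                min_compiler: Dict[str, Version],
                depends: Dict[str, List[str]]) -> bool:
    """True if pkg and all its (transitive) dependencies accept this compiler."""
    lower = min_compiler.get(pkg)
    if lower is not None and compiler < lower:
        return False
    return all(installable(dep, compiler, min_compiler, depends)
               for dep in depends.get(pkg, []))


depends = {"foo": ["bar"], "bar": []}
old_compiler = (4, 0, 1)

# Optimistic: only bar carries the constraint; foo inherits it through its dependency.
optimistic = {"bar": (4, 1, 0)}
# Pessimistic: the same constraint is also written into foo's own metadata.
pessimistic = {"bar": (4, 1, 0), "foo": (4, 1, 0)}

# While bar is restricted, both policies agree: foo cannot be installed.
before = (installable("foo", old_compiler, optimistic, depends),
          installable("foo", old_compiler, pessimistic, depends))

# bar's constraint is then relaxed to ocaml >= 4.00.0 under both policies.
optimistic_relaxed = {**optimistic, "bar": (4, 0, 0)}
pessimistic_relaxed = {**pessimistic, "bar": (4, 0, 0)}

after = (installable("foo", old_compiler, optimistic_relaxed, depends),
         installable("foo", old_compiler, pessimistic_relaxed, depends))

print("before relaxing bar:", before)
print("after relaxing bar: ", after)
```

After the relaxation, foo silently becomes installable on the older compiler in the optimistic model, while in the pessimistic one it stays restricted until someone edits foo's own constraint.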

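One argument against the OWS approach is that adding the constraints would duplicate information that can be inferred, and lose information on constraints of the dependent packages. This is not completely true:

  • the information is duplicated at a certain point in time, but may have a different span, as we have seen. As such, the added constraint documents what has been tested.
  • weight of metadata not considered, having no constraint or a duplicated constraint are actually two opposite, equally false assumptions on something we can not test. One, optimistic, the other, conservative, more robust.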

Adding these dependencies also allows for better documentation of constraints for static tools (e.g. opam2web at http://opam.ocaml.org/packages/) ; although such tools could infer it too, it would be more work.

Why not add them ?

The current handling of the repository makes the more optimistic assumption: we accept that changing only a constraint on a dependency of a package may leave that package failing to compile (I don't use the term broken, to avoid confusion. Here a user may try to install the package, and have it fail to compile, which is much worse; esp. since the error may not be explicit). However, the effects of this all depend on how changes are accepted in the repository: both approaches make untested assumptions, and when the change is proposed, additional tests can discriminate. For example, when bar's constraints are relaxed, foo needs to be tested on the compilers it's newly available on, and the constraint may need to be added to foo.

Note that with the pessimistic approach, we no longer know whether foo's constraints were inferred or defined by the developer, so it may be more difficult to automatically detect the new availability; the optimistic approach would allow detecting it, and automatically running the appropriate tests -- probably using distcheck/OWS and some diff. As such, "optimistic" also means we put more trust in the developers, and in the constraints they manually put in their packages. With an added static constraint, we would need the maintainer of the package to notice that bar's constraints have been relaxed, test manually, and relax foo's constraints too if appropriate.

Also, the fact that a change in a dependency may break a package goes far beyond compiler constraints: testing the reverse dependency cone of a package is thus generally useful, and as such the pessimistic approach doesn't prevent most cases of non-compiling packages from appearing.

With a mechanism to check the diff between two OWS states, and with git, registering the known availability state of a package as a static package constraint may be less useful.

Workflows

One thing that strikes me is that the packages in red on OWS do need some fixing, but it isn't clear in general that it's the package in red itself that causes the problem. It may happen that it should just switch to a more recent version of a dependency, but the dependencies themselves are just as likely to be the ones to fix. Thus, the summary page is maybe the most useful part of OWS, giving the most common root causes of package unavailability.

Paradoxically, if we hardcode version constraints, we lose the origins of these constraints, and can't get that information anymore.

Package maintainer

Depending on how we implement the tests, a package maintainer may be notified that:

  1. his package is newly available on some compiler version
  2. his package is no longer available on some compiler versions
  3. his package is no longer installable

With the static constraints option, 2 and 3 are easier (we already have the information in the current OWS); 1 may not be possible. New and existing packages will need additional constraints, opam-publish might help with those, but it's an additional burden.

Without static constraints, 1, 2 and 3 all need an additional tool to get distcheck/OWS diffs on a commit -- which would be extremely useful -- but there are no added issues or trouble.
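The three notifications listed above fall out of a plain diff between two availability snapshots. The sketch below assumes a hypothetical `State` mapping from package name to the set of compiler versions on which it is installable; it says nothing about how distcheck or OWS would actually produce such snapshots.

```python
from typing import Dict, List, Set

# package name -> set of compiler versions on which it is installable
State = Dict[str, Set[str]]


def diff_states(before: State, after: State) -> Dict[str, object]:
    """Compare two snapshots and group the changes by kind of notification."""
    newly_available: Dict[str, List[str]] = {}
    no_longer_available: Dict[str, List[str]] = {}
    no_longer_installable: List[str] = []
    for pkg in sorted(set(before) | set(after)):
        old = before.get(pkg, set())
        new = after.get(pkg, set())
        if new - old:  # case 1: newly available on some compiler version
            newly_available[pkg] = sorted(new - old)
        if old - new:  # case 2: no longer available on some compiler versions
            no_longer_available[pkg] = sorted(old - new)
        if old and not new:  # case 3: no longer installable anywhere
            no_longer_installable.append(pkg)
    return {
        "newly_available": newly_available,
        "no_longer_available": no_longer_available,
        "no_longer_installable": no_longer_installable,
    }


# Made-up snapshots taken before and after a repository commit.
before = {"foo": {"4.01.0", "4.02.1"}, "bar": {"4.01.0"}, "baz": {"4.02.1"}}
after = {"foo": {"4.00.1", "4.01.0", "4.02.1"}, "bar": set(), "baz": {"4.01.0"}}

report = diff_states(before, after)
for kind, changes in report.items():
    print(kind, changes)
```

Because the diff works on computed installability rather than on declared constraints, it covers all three cases without requiring any extra metadata in the packages.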

Repository maintainer

Statically checking the repository state is very useful to improve it and notice weaknesses. Most useful is to get a report of the consequences of merging a given PR, which is the decision repository maintainers have to make several times a day. The issue at hand doesn't seem to have much impact on this, except maybe that without static constraints more care should be taken; but this stresses the importance of a distcheck/OWS diff generating a report on a repository commit, and of the interaction with runtime tests.

End-user

This basically changes a bit the balance between

  1. more packages available on a given compiler
  2. less risk of encountering a non-compiling package

2 depends very much on how and what we test before merging pull-requests.

Conclusion

That OWS is a very useful tool is not the discussion here: the question is mostly what policy we want to adopt on the repository -- whatever it is, it would be much more productive if the different tools, developers, and the repository maintainers all follow the same one. Information in OWS is actually independent of this: some wording choices hint to the static constraint scheme and no more.

Now, with the current OWS, whether or not the goal should be to have it all green should be determined from the benefit to all agents involved, and end-users.

This will be of interest to @abate, @rdicosmo, @avsm, @samoht, @lefessan ; please add any arguments I may have forgotten, and be constructive !

abate commented 9 years ago

Thank you for the detailed overview of the state of affairs. I'm working on a small tool to compare the state of the repository given two commits, or a commit and a patch file : I'm almost there. This should allow a package maintainer to get an early warning of which packages might be affected by a commit. We could then integrate this with travis or simply give this tool to maintainers to test their repos offline (more work to do this, as you need the entire ows toolchain installed).

avsm commented 9 years ago

Excellent summary @AltGr! I am generally in favour of minimising metadata duplication, and hence prefer the current method of not propagating version constraints (there have been a few cases where a dependency was temporarily broken on older OCaml's, and subsequently fixed, such as cmdliner).

I really like @abate's repository diff tool -- getting it working offline (e.g. via a Docker container) should make it fairly easy to port to Travis.

rdicosmo commented 9 years ago

TL;DR -- this will take time...

Thanks Louis for this detailed analysis. The short message I got, and fully share, is that the real issue is how to set up an efficient workflow that will improve the overall quality of the Opam repository over time.

This is a complex issue, and I do not expect it to be settled in a short amount of time.

Let me offer some quick thoughts to this already long thread.

Package repository QA, dependencywise

OWS precisely pinpoints packages whose dependencies are not satisfiable, that is, packages that, when the user tries to install them, will fail with a message related to broken dependencies.

This tool is intended to enable a level 0 [1] quality assurance for a package repository: ensuring that no package leading to such messages is present for more than a transient amount of time.

Actually, in Debian, the goal is to have NO such package in the stable release... but there are no releases in the Opam repo, hence my weaker statement: indeed the Opam repo seems to lie more or less midway between a Debian unstable and a Debian testing, which both sport quite a few red cells; see https://qa.debian.org/dose/debcheck/unstable_main/index.html and https://qa.debian.org/dose/debcheck/testing_main/index.html

Red is bad! I want my package back

(read while listening to https://www.youtube.com/watch?v=aGeFf_rIAVQ)

It is no surprise at all, for me, to see people unhappy when packages they maintain are pinpointed by a QA tool (any QA tool). It is indeed true that a package A may be fixable only by changing some other package B over which A's maintainer has neither responsibility nor control, and A's maintainer considers it unfair to see his package A pinpointed instead of B. This may lead to endless debates: "should one add a compiler constraint on A when B is missing on that particular switch, or pester B's maintainer to make B available on that switch?" is just one example. I know the temptation to just tweak the thermometer instead, by changing the color of the cell, is strong, but we should definitely resist it.

What can I do? ('Cause I'm feelin' blue... same music :-))

Since we are interested in making progress in the repository, probably we should

1) stop getting upset when seeing red cells (we are not bulls, right?): the red cell just means "A cannot be installed, for dependency reasons", which is a true, undebatable statement; it does not mean "A's maintainer is at fault", which, as we have seen, is a much more debatable issue.

And yes, it is *OK* to see A marked red while waiting for its dependency B
to come in, one should remove/mark as unavailable A *only* if B has no
chances to come in.

2) start focusing on the changes in the state of the repository, instead. In particular, great hopes can be held in the beneficial effects of the RSS feed that will come out as a solution to issues #2 and #3 : having package maintainers get a personalised feed with the changes in state of their packages may be a great tool to make quick progress.

Roberto

[1] yes, it is really level 0, the basic, minimal requirement for a qualified repository; there are much more advanced properties one would like to check, but no point in raising them while we are still stuck at level 0

On Mon, May 11, 2015 at 08:39:40PM -0700, Louis Gesbert wrote:

One argument against the OWS approach is that adding the constraints would duplicate information that can be inferred, and lose information on constraints of the dependent packages. This is not completely true:

  • the information is duplicated at a certain point in time, but may have a different span, as we have seen. As such, the added constraint documents what has been tested.
  • weight of metadata not considered, having no constraint or a duplicated constraint are actually two opposite, equally false assumptions on something we can not test. One, optimistic, the other, conservative, more robust.


Roberto Di Cosmo

samoht commented 9 years ago

Thanks very much @AltGr for the summary. I think most of the disagreement lies in the fact that "unavailable" currently has different meanings (in opam):

  1. a package is unavailable if its available predicate evaluates to false
  2. a package is unavailable if one of its dependencies is unavailable
  3. a package is unavailable if all of its dependencies are available independently but conflict if installed at the same time.

People in favor of "less metadata" (this includes me) don't understand why ows reports 1. and 2. differently but are still (very) interested by 3.
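To make the three meanings concrete, here is a rough sketch that tells them apart on toy data. Everything in it is an assumption made for illustration: the `unavailable_reason` helper, the single-letter package names, and above all the conflict check, which only looks at pairs of direct dependencies, whereas a real answer to case 3 needs a solver such as the one behind distcheck.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set


def unavailable_reason(pkg: str,
                       declared_unavailable: Set[str],
                       depends: Dict[str, List[str]],
                       conflicts: Set[FrozenSet[str]]) -> Optional[int]:
    """Return 1, 2 or 3 following the numbering above, or None if pkg is available.

    Assumes an acyclic dependency graph; case 3 only checks direct dependencies.
    """
    if pkg in declared_unavailable:
        return 1  # its own available predicate is false
    deps = depends.get(pkg, [])
    if any(unavailable_reason(d, declared_unavailable, depends, conflicts) is not None
           for d in deps):
        return 2  # at least one dependency is itself unavailable
    if any(frozenset(pair) in conflicts for pair in combinations(deps, 2)):
        return 3  # dependencies are fine one by one, but cannot coexist
    return None


# Made-up repository: b is restricted by its own metadata, d and e conflict.
declared_unavailable = {"b"}
depends = {"a": ["b"], "b": [], "c": ["d", "e"], "d": [], "e": []}
conflicts = {frozenset({"d", "e"})}

for name in sorted(depends):
    print(name, unavailable_reason(name, declared_unavailable, depends, conflicts))
```

In this toy model, cases 1 and 2 are cheap to compute from the metadata alone, while case 3 is the one that genuinely needs dependency solving.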

[EDIT by @AltGr: fixed numbering, md renumbers lists starting from 1]

AltGr commented 9 years ago

I think most of the issue has been solved, but what remains actually boils down to this: We all agree that it's bad to have packages that can't be installed. But that can't be avoided for some packages and some configurations. For those packages, the Debian policy is that they should be explicitly restricted to where they are available, documenting the acknowledgement.

The issue, otherwise, is to have them appear on systems where they are unavailable, but fail to install. Indeed, opam list includes them -- although it could do some checks and hide them. The Debian policy, thus, prevents spurious packages from being browsable. There may be two reasons why we would want to still show them, though:

All grief here is actually directed towards opam list, it seems. So that can be fairly easy to change.

Unless I missed @rdicosmo's point ?

avsm commented 9 years ago

On 18 May 2015, at 03:43, Louis Gesbert notifications@github.com wrote:

All grief here is actually directed towards opam list, it seems. So that can be fairly easy to change

I think that's a good summary. Removing (or marking) uninstallable packages from opam list makes sense.

lefessan commented 9 years ago

Another solution would be to make opam list print all packages, including the ones that are not available in the current switch (for any reason, i.e. specified as not available, or missing a dependency), and then print an "installability" status on the same line: something like:

  ocamlfind  installable
  oasis      installable
  arm64lib   unavailable-specified
  mylib      unavailable-computed

and so on...
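A listing of this shape could be produced along the following lines. This is a sketch only, not an opam feature: the `installability` and `listing` helpers are hypothetical, and the assumption that mylib depends on arm64lib is invented to produce the "computed" case.

```python
from typing import Dict, List, Set


def installability(pkg: str, specified: Set[str],
                   depends: Dict[str, List[str]]) -> str:
    """Status of pkg in the current switch, using the labels suggested above."""
    if pkg in specified:
        return "unavailable-specified"  # excluded by its own metadata
    if any(installability(dep, specified, depends) != "installable"
           for dep in depends.get(pkg, [])):
        return "unavailable-computed"  # excluded only through a dependency
    return "installable"


def listing(packages: List[str], specified: Set[str],
            depends: Dict[str, List[str]]) -> List[str]:
    """One aligned line per package: name, then installability status."""
    width = max(len(p) for p in packages)
    return [f"{p.ljust(width)}  {installability(p, specified, depends)}"
            for p in packages]


# Made-up switch: arm64lib is declared unavailable, mylib depends on it.
packages = ["ocamlfind", "oasis", "arm64lib", "mylib"]
specified = {"arm64lib"}
depends = {"oasis": ["ocamlfind"], "mylib": ["arm64lib"]}

lines = listing(packages, specified, depends)
print("\n".join(lines))
```

The recursive check stands in for the real dependency solving; with a large repository that part is where the cost would be.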

abate commented 9 years ago

I like Fabrice's solution. Using dose, this should be a breeze. All the necessary bricks are already there and it's reasonably fast to do (on the opam repo, a matter of a few seconds). However, even if it takes only 2 seconds, it might be frustrating for the average user.