commercialhaskell / stackage

Stable Haskell package sets: vetted consistent packages from Hackage
https://www.stackage.org/
MIT License

Safety issues in Stackage? #6590

Closed pleger closed 9 months ago

pleger commented 2 years ago

Dear members, we have analyzed the evolution of the Stackage repository (www.stackage.org/) in a paper published at the 37th ACM/SIGAPP Symposium On Applied Computing (SAC 2022) [1]. In this paper, we found some potential issues in the repository, and I would like to check with you whether we are correct. Concretely, in the dependency analysis, we made the following two findings:

Extract from the paper: According to versions of dependencies per package, we present two findings. First, Figure 6a [left chart below] shows the number and percentage of packages whose dependency versions are not available in its specific Stackage release, meaning that are non-stable dependencies for that release. Although these packages are few, there is a growing trend that might affect the stability promise of Stackage. Second, Figure 6b [right chart below] shows that some packages depend on the same package but with different ranges of versions; making incompatible dependencies of packages.

As we are not entirely sure about the correctness of these two findings, we would appreciate it if you could give us your opinion.

[Image: Figure 6a (left chart) and Figure 6b (right chart) from the paper]

[1] The direct link to the SAC 2022 paper:
https://pleger.cl/cv-pleger/papers/legerAl-SAC2022.pdf
This paper also includes other analyses of Stackage.

DanBurton commented 2 years ago

I'm not sure I follow. I'll have more time to take a look at the paper later to ascertain the details. But can you give me an example of a specific package in a specific snapshot, and which dependencies of that package are not included in said snapshot?

bergmark commented 2 years ago

Please correct me if I misunderstand the paper, but it looks like you are basing this data on the dependencies that are specified in the .cabal files on Hackage?

If so,

For part 2-inestable: the chart shows the data for the Hackage releases as if they were used without a Stackage snapshot (e.g. with the cabal-install solver). If you use a Stackage snapshot, there will be a specific version of every dependency; see e.g. the latest LTS snapshot. The notable exception is that GHC boot libraries are not part of the snapshot, as we use the versions that ship with GHC, but these versions should also be stable, since the snapshot specifies which GHC release should be used.

The main point of Stackage is to specify exactly one version of every package to avoid this instability. Versions only change if you change the snapshot you are using.

For part 2-incompatible: I would appreciate some examples so I can look into them. Depending on how you analyze dependencies, there could be a mismatch between the dependencies of a package's library/executable(s), test-suite(s), and benchmarks. For the resulting Stackage snapshots we only care about library/executable dependencies, as a consumer generally only wants to build these.

Another thing that could be taken into account is the revision of the package, which may affect required dependency versions. By default, Stackage snapshots use the latest revision that is available when the snapshot is created; this is configurable in Stackage, but this configuration is seldom used. When analyzing a .cabal file with a snapshot, you should pick the .cabal file belonging to the revision used in the snapshot. Unfortunately I'm not clear on the specifics here: I do not know how to parse a snapshot file and retrieve these revisions. If you are using e.g. cabal get to download packages from Hackage, then you will get the latest revision, whereas if you download the tarball directly you will get the unrevised version of the package.

Bodigrim commented 2 years ago

I don't follow, Stackage snapshots include ~3000 packages, so 0.02% on the plot equals to 0.6 packages which is clearly nonsense.

pleger commented 2 years ago

Hello @bergmark and @Bodigrim, thanks for your prompt replies.

@Bodigrim

> I don't follow, Stackage snapshots include ~3000 packages, so 0.02% on the plot equals to 0.6 packages which is clearly nonsense.

As mentioned in the post, the chart analyzes dependencies, not packages (thanks! We need to be clearer on this point). We count the total number of (direct) dependencies in an LTS; we then count how many of them require versions that are not available in that LTS. For example, if we take LTS 15-3, we have:

Total dependencies with OUT_RANGE:  289 (unstable dependencies) --> 0.02%
Total dependencies with IN_RANGE:  10024
Total dependencies with ANY:  4357
Total dependencies:  14670
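As a quick sanity check on the figures above, the sketch below recomputes the OUT_RANGE share from the four quoted numbers only; the variable names are illustrative. The three categories do sum to the stated total. Note that 289 / 14670 is about 0.0197 as a fraction, which is roughly 1.97 when written as a percentage, so the 0.02 figure appears to be a fraction rather than a percentage.

```python
# Dependency counts quoted above for LTS 15-3.
out_range = 289
in_range = 10024
any_range = 4357
total = 14670

# The three categories should account for every dependency.
categories_sum = out_range + in_range + any_range

# Share of OUT_RANGE dependencies, as a fraction and as a percentage.
out_fraction = out_range / total
out_percent = 100 * out_fraction

print(categories_sum == total)
print(round(out_fraction, 4), round(out_percent, 2))
```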

As for @bergmark's question, it is a bit more complex to explain, so I will describe the steps:

  1. We select an LTS. E.g., 18-18 https://www.stackage.org/lts-18.18
  2. We select a package with its version in that LTS. E.g., https://www.stackage.org/lts-18.18/package/dlist-nonempty-0.1.1
  3. We download the package with that version from Hackage. E.g., https://hackage.haskell.org/package/dlist-nonempty
  4. Using the .cabal file from this version, we get all dependencies with their version ranges.
  5. We take a dependency with its associated version range. E.g., base (base >=4.5 && <4.11,)
  6. We look up which version of that dependency is available in that LTS (18-18 in this example). E.g., base-4.14.3.0 (https://www.stackage.org/lts-18.18/package/base-4.14.3.0)
  7. We compare the two: the range base >=4.5 && <4.11 and the version base-4.14.3.0, which are not compatible.
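The comparison in the last step can be sketched as follows. This is a minimal illustration, not the tooling used for the paper: `parse_version` and `satisfies` are hypothetical helper names, and only conjunctions (`&&`) of simple comparisons are handled, which is a small subset of Cabal's version range syntax.

```python
import re

def parse_version(text):
    """Turn '4.14.3.0' into the tuple (4, 14, 3, 0)."""
    return tuple(int(part) for part in text.split("."))

# Comparison operators supported by this sketch.
OPS = {
    ">=": lambda v, b: v >= b,
    ">": lambda v, b: v > b,
    "<=": lambda v, b: v <= b,
    "<": lambda v, b: v < b,
    "==": lambda v, b: v == b,
}

def satisfies(version, constraint):
    """Check a version against a conjunction such as '>=4.5 && <4.11'.

    Only '&&' of simple comparisons is handled; '||', '^>=' and
    wildcards like '==1.2.*' are out of scope for this sketch.
    """
    v = parse_version(version)
    for clause in constraint.split("&&"):
        match = re.fullmatch(r"\s*(>=|<=|==|>|<)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not OPS[op](v, parse_version(bound)):
            return False
    return True

# Range from the tarball's .cabal file vs. the version in LTS 18.18.
print(satisfies("4.14.3.0", ">=4.5 && <4.11"))  # False
# Range shown on the Hackage website for the same package version.
print(satisfies("4.14.3.0", ">=4.5 && <4.17"))  # True
```

Versions are compared component by component as integer tuples, so 4.14 is correctly treated as newer than 4.11.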

Apologies for the many (detailed) steps. While going through them, we noticed some details that were not clear to us. If we take dlist-nonempty-0.1.1 again and look at base as a dependency, we get:

  1. Hackage Website: base >=4.5 && <4.17,
  2. GitHub: (.cabal file) base >=4.5 && <4.17,
  3. Hackage (.cabal in .tar.gz file on https://hackage.haskell.org/package/dlist-nonempty-0.1.1/dlist-nonempty-0.1.1.tar.gz): base >=4.5 && <4.11,

So, for the same package and version, we can get different version ranges for its dependencies. Given the revisions concept you mention (thanks for this tip), we might suppose that the link available on the Hackage website (e.g., https://hackage.haskell.org/package/dlist-nonempty-0.1.1/dlist-nonempty-0.1.1.tar.gz) gives the latest revision, but apparently it does not, judging by the Hackage website and the GitHub link.

Thanks a lot in advance for your answers and opinions, Paul

DanBurton commented 2 years ago

Revisions likely account for all of the discrepancies.

LTS-18.18 was published on 2021-11-19.

dlist-nonempty-0.1.1 revision 12 was published on 2021-11-16, so we likely used that revision when running the stackage build.

You can get the .cabal file as of a given revision like so: https://hackage.haskell.org/package/dlist-nonempty-0.1.1/revision/12.cabal

This url gives the latest revision of the cabal file: https://hackage.haskell.org/package/dlist-nonempty-0.1.1/dlist-nonempty.cabal
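For an analysis script, the URL patterns mentioned in this thread could be generated as below. This is only a sketch: the function names are made up for illustration, and the patterns are copied from the example URLs here rather than from any official API description.

```python
HACKAGE = "https://hackage.haskell.org/package"

def revision_cabal_url(name, version, revision):
    """URL of the .cabal file as of a specific Hackage revision."""
    return f"{HACKAGE}/{name}-{version}/revision/{revision}.cabal"

def latest_cabal_url(name, version):
    """URL of the .cabal file with the latest revision applied."""
    return f"{HACKAGE}/{name}-{version}/{name}.cabal"

def tarball_url(name, version):
    """URL of the source tarball, which contains the unrevised .cabal file."""
    return f"{HACKAGE}/{name}-{version}/{name}-{version}.tar.gz"

print(revision_cabal_url("dlist-nonempty", "0.1.1", 12))
print(latest_cabal_url("dlist-nonempty", "0.1.1"))
print(tarball_url("dlist-nonempty", "0.1.1"))
```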

andreasabel commented 2 years ago

If I understand the process of entering a package X into a Stackage snapshot (nightly or LTS) correctly, then all of X's dependencies according to X's latest version/revision on Hackage need to be in the snapshot already with a version as specified in the version range demanded by X.

Thus "inestable" packages should be impossible. Do you agree, @pleger ?

This might invalidate some of your SAC 2022 results (quoting from the abstract) and maybe call for an erratum:

> Our findings show, for example, a growing trend of packages is depending on other packages whose versions are not available in a particular release of Stackage ; opening a potential stability issue.
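The invariant described above (every dependency of a snapshot package is in the snapshot, at a version inside the demanded range) can be checked mechanically. The sketch below is hypothetical: `violations` is an illustrative name, ranges are simplified to half-open `(lower, upper)` pairs, and only the base dependency of dlist-nonempty from this thread is modelled.

```python
def violations(snapshot, dependencies):
    """List (package, dependency, reason) triples that break the invariant.

    snapshot:     maps package name -> version tuple pinned by the snapshot
    dependencies: maps package name -> {dependency name: (lower, upper)},
                  meaning lower <= version < upper
    """
    found = []
    for package, deps in dependencies.items():
        for dep, (lower, upper) in deps.items():
            pinned = snapshot.get(dep)
            if pinned is None:
                found.append((package, dep, "missing"))
            elif not (lower <= pinned < upper):
                found.append((package, dep, "out of range"))
    return found

# Versions taken from the LTS-18.18 example in this thread. base ships
# with GHC; it is listed here with the version shown for that snapshot.
snapshot = {"base": (4, 14, 3, 0), "dlist-nonempty": (0, 1, 1)}

# base bounds as found in the unrevised tarball vs. on the Hackage website.
unrevised = {"dlist-nonempty": {"base": ((4, 5), (4, 11))}}
revised = {"dlist-nonempty": {"base": ((4, 5), (4, 17))}}

print(violations(snapshot, unrevised))  # [('dlist-nonempty', 'base', 'out of range')]
print(violations(snapshot, revised))    # []
```

With the unrevised bounds the check reports a violation, while the revised bounds satisfy the invariant, which is consistent with revisions explaining the discrepancy.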

ifigueroap commented 2 years ago

Hi, I'm also a co-author of the paper.

> If I understand the process of entering a package X into a Stackage snapshot (nightly or LTS) correctly, then all of X's dependencies according to X's latest version/revision on Hackage need to be in the snapshot already with a version as specified in the version range demanded by X.
>
> Thus "inestable" packages should be impossible. Do you agree, @pleger ?

That is precisely the point of this research: to figure out whether, quoting GHC's error message, "the impossible happened!". I think we'll have to look again into the revision files and check whether the analysis holds up. Does the stack install command use the revised cabal files rather than the "original"?

I'd also like to comment that this is a follow-up paper, coming after our study "Which Monads Haskell Developers Use: An Exploratory Study" (link to pdf), where we found many inconsistencies in the package info... hence we wanted to know how Stackage improves on this.

Thank you all for your replies and follow-through questions, as it is very helpful for us to improve our research. To the best of our knowledge, these papers are among the first to apply Mining Software Repositories techniques to the Haskell ecosystem, hence we expect to have some rough edges :)

andreasabel commented 2 years ago

> Does the stack install command use the revised cabal files rather than the "original"?

When you use cabal get X-n.m.k then cabal downloads package X-n.m.k and applies the latest revisions to it. stack supports a syntax like X-n.m.k@rev:l to also pin down a specific revision of a package. I think if such a revision is not specified, you get the latest revision.
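A tool that reads such pins would need to split them into name, version, and revision. The following is a minimal sketch under the assumption that pins look like `X-n.m.k` or `X-n.m.k@rev:l`; `Pin` and `parse_pin` are hypothetical names, and any other suffix forms that stack may accept are not handled.

```python
from typing import NamedTuple, Optional

class Pin(NamedTuple):
    name: str
    version: str
    revision: Optional[int]  # None when no @rev: suffix is given

def parse_pin(spec):
    """Split 'hakyll-4.15.1.1@rev:3' into name, version and revision."""
    ident, sep, rev = spec.partition("@rev:")
    # The version is whatever follows the last hyphen, so package
    # names that contain hyphens themselves are kept intact.
    name, _, version = ident.rpartition("-")
    if not name or not version:
        raise ValueError(f"expected NAME-VERSION, got {spec!r}")
    return Pin(name, version, int(rev) if sep else None)

print(parse_pin("hakyll-4.15.1.1@rev:3"))
print(parse_pin("dlist-nonempty-0.1.1"))
```

A revision of `None` then stands for "no revision pinned", which per the comment above would presumably resolve to the latest revision.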

See e.g. https://www.stackage.org/package/hakyll for revision pins:

hakyll

A static website compiler library

http://jaspervdj.be/hakyll

| [LTS Haskell 19.8](https://www.stackage.org/lts-19.8): | [4.15.1.1@rev:3](https://www.stackage.org/lts-19.8/package/hakyll) |
| -- | -- |
| [Stackage Nightly 2022-03-17](https://www.stackage.org/nightly-2022-03-17): | [4.15.1.1@rev:1](https://www.stackage.org/nightly-2022-03-17/package/hakyll) |
| Latest on Hackage: | [4.15.1.1@rev:3](https://hackage.haskell.org/package/hakyll) |

juhp commented 1 year ago

Thank you for sharing your research with us.

If no actions are required, we could probably move to close this issue now.

mihaimaruseac commented 9 months ago

Closing as it seems resolved with no action required. Please reopen if that's not the case.