Don't record VersionRange for dependencies

ezyang commented 6 years ago

I was always under the impression that in Stack, we take the idea, "One package, one version!" very seriously. So imagine my surprise when I was reading the Stack source code and I noticed the following strangeness in its definition of Package:

data Package = 
         ...
          ,packageDeps :: !(Map PackageName VersionRange) -- ^ Packages that the package depends on.
          ,packageTools :: !(Map ExeName VersionRange)    -- ^ A build tool name.

Why does Stack need to know the VersionRange here? This seems inconsistent with the "One package, one version" philosophy. Boldly, I decided to delete this information and see what happened.

Here were the major places where this information was being used in a nontrivial way:

depsPresent uses the version range to do a check against the InstalledMap (a map from package name to information--one version assumption!) to see if we should actually accept the installed package or not.
checkPackageBuildPlan uses the version range to see if version bounds of a package were actually satisfied by its dependencies.
addPackageDeps uses the version range to decide if a package selected by addDep is acceptable. But even if it's not acceptable, there are two cases where it can overrule the decision: if allow-newer was passed, or if the snapshot implicitly states that the version combination is acceptable
withSingleContext uses the version range to select version dependencies from the mdeps argument. I am not sure when multiple instances of the package could ever show up here (violation of the one version principle), so it seems like defensive coding that was added when custom setup support was added. Maybe @mgsloan remembers what's going on here.

To summarize, Stack never actually uses the version range to pick between multiple possible choices (except the Setup case, but I think that's impossible); instead, it uses it to do various consistency checks and slap the user on the wrist if they try to use a package whose dependency bounds don't match.

So, let me suggest something radical: why don't we just drop all of these checks. The version bound purists can add some sort of lint which checks if your install plan has inconsistent version ranges and lets the user know if the version ranges are not right... but it seems to me that it would be much simpler if Stack internally trusted in the snapshot (and the user extensions). I feel this would also reduce a lot of the friction with version bound updates and Hackage revisions, since a bound update wouldn't actually stop Stack from working (Stack already has special case code to trust snapshot versions, and I think that's exactly what you should do; go further!)

CC @snoyberg

snoyberg commented 6 years ago

I'm glad you've raised this, thank you. I don't disagree with your points at all, but there is another side I want to represent, which is the reason for the current behavior:

Stack should remain consistent with Cabal-the-build-system as much as possible. Cabal the build system definitely enforces hard bounds, and Stack simply ignoring them would be counterintuitive.
The user adding explicit extra-deps can easily run into a situation of combining versions that are known not to work, so preventing this by default (with allow-newer: true available as a workaround) is a good idea.
In principle, hard bounds can prevent against bugs that cannot be detected with build failures alone, namely semantic mismatches. (Counterpoint: in practice, the vast majority of bounds issues are of the build variety.)
Likely most important point: such a decision may be unfair to package authors. Suppose I write foo-1.2.3 with a build-depends: bar < 2. A user comes along, adds extra-deps: [foo-1.2.3, bar-2.0], and gets no pushback from Stack. The user builds, and gets an error about foo, and runs to file a report against foo. The author never claimed that this was OK, and may be rightfully annoyed at getting such bug reports. (Could be worked around though with post-build messages "oh, we noticed that foo doesn't really like that.)

IMO, the clear right approach is a distinction between hard and soft bounds, and ideally putting the soft bounds into explicit metadata to be used only by dependency solving, but that's unlikely to happen any time soon. Another fallback is to treat the new caret operator (^>=) as soft bounds and ignore it in Stack, though that's already been discussed and rejected (#3464).

Whatever decision happens here would likely make sense in Stackage as well: if Stack decides to ignore version bounds, Stackage should feel free to do so.

I've overall been close enough to ambivalent on this that I haven't raised the issue, but it's a discussion worth having openly. Again, thank you for raising it.

ezyang commented 6 years ago

Well, there's an alternate way to go about doing this, which is to (1) externally have equivalent behavior, but (2) have Stack implement this internally as an extra, orthogonal check, rather than as part of the "core" build metadata. So for example, checkPackageBuildPlan should still implement the version bound checking functionality in some way (but not through packageDeps), but it seems questionable for depsPresent to decide that a dependency is not present simply because we found the wrong version (this simply does not sound right to me.)

BTW, the reason this came up for me is that, for per-component builds, I am going to have to rewrite all of this dependency metadata to refer to specific components rather than full packages, and it is a far simpler conceptual model if there aren't any version ranges present here (it's very confusing to speak of a version bound on a component dependency; there is no such thing.)

snoyberg commented 6 years ago

I probably misread this completely then. If this is a technical discussion of how to modify the internals of the code without affecting external behavior: I have no objection at all. If it's about changing the external behavior: I'm very much open to the discussion.

mgsloan commented 6 years ago

EDIT: was responding to original post without seeing subsequent conversation. Seems that my points did not overlap heavily with what was already discussed anyway:

So, let me suggest something radical: why don't we just drop all of these checks.

Hmm, interesting idea! I think I'd prefer to keep the bounds info in the datatypes though, as the plan construction errors can be quite informative.

Currently if you do allow-newer: true it will ignore all bounds (I realize that's a strange name for something that ignores lower bounds). I can imagine it being rather nice to have a new default option allow-newer: warn which warns about constraint errors but continues to build anyway.

withSingleContext uses the version range to select version dependencies from the mdeps argument. I am not sure when multiple instances of the package could ever show up here (violation of the one version principle), so it seems like defensive coding that was added when custom setup support was added.

Yeah that seems like a potential bug, good catch. The bug would only be problematic when allow-newer: true, as in that case custom setup constraints should also not be enforced. Not sure if duplicate packages can end up in that map, usually there shouldn't be. It might be possible to have multiple Cabal library versions - https://github.com/commercialhaskell/stack/issues/3049

ezyang commented 6 years ago

If this is a technical discussion of how to modify the internals of the code without affecting external behavior: I have no objection at all. If it's about changing the external behavior: I'm very much open to the discussion.

I'm happy to keep this as an internals only change discussion, unless the internal change makes it difficult to implement the old external behavior :)

Hmm, interesting idea! I think I'd prefer to keep the bounds info in the datatypes though, as the plan construction errors can be quite informative.

But here's my question: why does bound checking happen during plan construction at all? :) The version bounds consistency of local + snapshot is independent of what actually is requested to be built during a plan. One logical answer is that you don't want to eagerly load all of the version bounds for everything to check for consistency, it's more useful to only check the things the user actually (transitively) requested. That seems like a legitimate implementation concern, but even in this regime, we shouldn't be using the version range to decide if a package is up-to-date or not. Am I missing something?

It might be possible to have multiple Cabal library versions - #3049

Let's see; is this true?

You have some snapshot version of Cabal. This is the only Cabal that is available for use as a dependency, in Custom setups or in regular packages. The situation here is normal.
You have some globally installed version of Cabal (stack setup -- upgrade-cabal). This is the version we are worried about. But even today, (prior to carrying out the plan in #3049), it doesn't seem like this should be a problem: in principle, this Cabal is effectively equivalent to a black box executable which Stack shells out to in order to make builds happen; this version of Cabal is never actually in the install plan, in the same way that Stack (the executable) is never part of the install plan. But I'll note that this perspective comes from not knowing how it is implemented.

snoyberg commented 6 years ago

But here's my question: why does bound checking happen during plan construction at all?

That's a great way to frame the question. Most likely: hysterical raisins. IIRC, I wrote almost the entirety of that module, and probably wrote it before the rest of the code base was mature. I really like the idea of separating out the concerns here. I have a strong hunch that if we do this correctly, the (massively overcomplicated) ConstructPlan module may become much simpler. I'm not opposed to a full rewrite to be honest, which may be the best approach for your component-based work.

I agree with your assessment of the situation with Cabal. The only way in which we should care about the globally installed version of Cabal is if it introduces some behavioral changes for simple build types versus other potentially available Cabal versions. But further speculation on this probably belongs in #3049.

mgsloan commented 6 years ago

Yes, my mention of multiple cabal version in the map was purely speculative, I didn't look at the code. I suspect it is always just the snapshot / extra-dep version.

It would be nice to separate the version checking stuff out of ConstructPlan, even if it was just a new module that ConstructPlan invokes. That is indeed one of the most complicated parts of the whole codebase. It particularly accumulated complexity to handle cyclic build plans (package A's test-suite depending on package B which depends on package A). I imagine this can be greatly simplified with per component builds

So I am :+1: on cleaning it up and splitting it up. However, I like the behavior of doing version checks based on demand. Not only is it more efficient, it also allows you to build stuff in circumstances where you otherwise couldn't without using allow-newer. Maybe it's no big deal since allow-newer exists. Hmm, interesting!

commercialhaskell / stack

Don't record VersionRange for dependencies #3723