commercialhaskell / stack

The Haskell Tool Stack
http://haskellstack.org

Don't record VersionRange for dependencies #3723

Open ezyang opened 6 years ago

ezyang commented 6 years ago

I was always under the impression that in Stack, we take the idea, "One package, one version!" very seriously. So imagine my surprise when I was reading the Stack source code and I noticed the following strangeness in its definition of Package:

data Package = 
         ...
          ,packageDeps :: !(Map PackageName VersionRange) -- ^ Packages that the package depends on.
          ,packageTools :: !(Map ExeName VersionRange)    -- ^ A build tool name.

Why does Stack need to know the VersionRange here? This seems inconsistent with the "One package, one version" philosophy. Boldly, I decided to delete this information and see what happened.
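For concreteness, here is a minimal sketch (type and field names are mine, not Stack's) of what the record could look like if the ranges were dropped and a dependency were just a name:

    import Data.Set (Set)
    import Distribution.Package (PackageName)

    -- Illustrative stand-in for Stack's ExeName newtype.
    newtype ExeName = ExeName String

    -- Hypothetical sketch: a dependency is just a name; the single version
    -- comes from the snapshot (plus local packages and extra-deps), so no
    -- VersionRange is recorded here.
    data Package' = Package'
      { packageDeps'  :: !(Set PackageName)
      , packageTools' :: !(Set ExeName)
      }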

Here were the major places where this information was being used in a nontrivial way:

To summarize, Stack never actually uses the version range to pick between multiple possible choices (except the Setup case, but I think that's impossible); instead, it uses it to do various consistency checks and slap the user on the wrist if they try to use a package whose dependency bounds don't match.

So, let me suggest something radical: why don't we just drop all of these checks? The version bound purists can add some sort of lint which checks whether your install plan has inconsistent version ranges and lets the user know if the version ranges are not right... but it seems to me that it would be much simpler if Stack internally trusted the snapshot (and the user's extensions to it). I feel this would also reduce a lot of the friction around version bound updates and Hackage revisions, since a bound update wouldn't actually stop Stack from working. (Stack already has special-case code to trust snapshot versions, and I think that's exactly what you should do; go further!)
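To make the "lint" idea concrete, here is a rough sketch (function and type names are hypothetical, not existing Stack code) of a standalone bounds check over an already-resolved plan, using Cabal's withinRange:

    import qualified Data.Map.Strict as Map
    import Data.Map.Strict (Map)
    import Distribution.Package (PackageName)
    import Distribution.Version (Version, VersionRange, withinRange)

    -- One resolved version per package, as pinned by the snapshot plus
    -- local packages and extra-deps.
    type ResolvedPlan = Map PackageName Version

    -- Hypothetical standalone lint: check every declared bound against the
    -- resolved plan and report mismatches, instead of letting a mismatch
    -- block plan construction.
    boundsLint
      :: ResolvedPlan
      -> Map PackageName (Map PackageName VersionRange) -- declared deps, per package
      -> [(PackageName, PackageName, Version)]          -- (package, dep, chosen version)
    boundsLint plan declared =
      [ (pkg, dep, v)
      | (pkg, deps)  <- Map.toList declared
      , (dep, range) <- Map.toList deps
      , Just v       <- [Map.lookup dep plan]
      , not (v `withinRange` range)
      ]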

CC @snoyberg

snoyberg commented 6 years ago

I'm glad you've raised this, thank you. I don't disagree with your points at all, but there is another side I want to represent, which is the reason for the current behavior:

IMO, the clear right approach is a distinction between hard and soft bounds, and ideally putting the soft bounds into explicit metadata to be used only by dependency solving, but that's unlikely to happen any time soon. Another fallback is to treat the new caret operator (^>=) as soft bounds and ignore it in Stack, though that's already been discussed and rejected (#3464).
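As a purely illustrative model (nothing like this exists in Stack or Cabal today), the hard/soft split could be carried as explicit metadata next to each bound, with only dependency solving consulting the soft ones:

    import Distribution.Version (VersionRange)

    -- Hypothetical: a hard bound must be respected at build time; a soft
    -- bound (e.g. what ^>= arguably expresses) is only a hint for
    -- dependency solving and could be ignored by Stack.
    data BoundKind = Hard | Soft

    data DeclaredBound = DeclaredBound
      { boundKind  :: BoundKind
      , boundRange :: VersionRange
      }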

Whatever decision happens here would likely make sense in Stackage as well: if Stack decides to ignore version bounds, Stackage should feel free to do so.

I've overall been close enough to ambivalent on this that I haven't raised the issue, but it's a discussion worth having openly. Again, thank you for raising it.

ezyang commented 6 years ago

Well, there's an alternate way to go about doing this, which is to (1) externally have equivalent behavior, but (2) have Stack implement this internally as an extra, orthogonal check, rather than as part of the "core" build metadata. So for example, checkPackageBuildPlan should still implement the version bound checking functionality in some way (but not through packageDeps), but it seems questionable for depsPresent to decide that a dependency is not present simply because we found the wrong version (this simply does not sound right to me.)
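Sketching that split (names are hypothetical, not the actual Stack functions): presence becomes a pure membership test, and the bound check becomes a separate pass over the same data that can be surfaced as a warning or error without pretending a dependency is missing:

    import qualified Data.Map.Strict as Map
    import Data.Map.Strict (Map)
    import qualified Data.Set as Set
    import Data.Set (Set)
    import Distribution.Package (PackageName)
    import Distribution.Version (Version, VersionRange, withinRange)

    -- Presence is just membership: did the plan provide every named dependency?
    depsPresent' :: Map PackageName Version -> Set PackageName -> Bool
    depsPresent' available deps = deps `Set.isSubsetOf` Map.keysSet available

    -- Bounds are an orthogonal check over the same plan, reported separately.
    boundsRespected :: Map PackageName Version -> Map PackageName VersionRange -> Bool
    boundsRespected available ranges = and
      [ maybe True (`withinRange` range) (Map.lookup dep available)
      | (dep, range) <- Map.toList ranges
      ]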

BTW, the reason this came up for me is that, for per-component builds, I am going to have to rewrite all of this dependency metadata to refer to specific components rather than full packages, and it is a far simpler conceptual model if there aren't any version ranges present here (it's very confusing to speak of a version bound on a component dependency; there is no such thing.)
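For illustration (the names are mine; Cabal has its own component types), a component-level dependency edge would carry only names, since the plan already pins each package to a single version:

    import Distribution.Package (PackageName)

    -- Illustrative component name, not Cabal's ComponentName.
    data ComponentName = CLib | CExe String | CTest String | CBench String

    -- A component-level dependency edge: no VersionRange, because "one
    -- package, one version" means the version is determined by the plan.
    data ComponentDep = ComponentDep
      { depPackage   :: PackageName
      , depComponent :: ComponentName
      }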

snoyberg commented 6 years ago

I probably misread this completely then. If this is a technical discussion of how to modify the internals of the code without affecting external behavior: I have no objection at all. If it's about changing the external behavior: I'm very much open to the discussion.

mgsloan commented 6 years ago

EDIT: I was responding to the original post without seeing the subsequent conversation. It seems my points did not overlap heavily with what was already discussed anyway:

So, let me suggest something radical: why don't we just drop all of these checks?

Hmm, interesting idea! I think I'd prefer to keep the bounds info in the datatypes though, as the plan construction errors can be quite informative.

Currently if you do allow-newer: true it will ignore all bounds (I realize that's a strange name for something that ignores lower bounds). I can imagine it being rather nice to have a new default option allow-newer: warn which warns about constraint errors but continues to build anyway.
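A minimal sketch of what that three-valued setting could look like internally (today allow-newer is just a Bool; allow-newer: warn does not exist, this is only the proposal):

    -- Hypothetical replacement for the Bool-valued allow-newer setting.
    data AllowNewer
      = EnforceBounds    -- today's allow-newer: false, fail on violations
      | WarnOnViolations -- proposed allow-newer: warn, report and build anyway
      | IgnoreBounds     -- today's allow-newer: true, silently ignore bounds
      deriving (Eq, Show)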

withSingleContext uses the version range to select version dependencies from the mdeps argument. I am not sure when multiple instances of the package could ever show up here (violation of the one version principle), so it seems like defensive coding that was added when custom setup support was added.

Yeah, that seems like a potential bug, good catch. The bug would only be problematic when allow-newer: true, as in that case custom setup constraints should also not be enforced. I'm not sure whether duplicate packages can end up in that map; there usually shouldn't be. It might be possible to have multiple Cabal library versions - https://github.com/commercialhaskell/stack/issues/3049

ezyang commented 6 years ago

If this is a technical discussion of how to modify the internals of the code without affecting external behavior: I have no objection at all. If it's about changing the external behavior: I'm very much open to the discussion.

I'm happy to keep this as an internals only change discussion, unless the internal change makes it difficult to implement the old external behavior :)

Hmm, interesting idea! I think I'd prefer to keep the bounds info in the datatypes though, as the plan construction errors can be quite informative.

But here's my question: why does bound checking happen during plan construction at all? :) The version bound consistency of local packages + snapshot is independent of what is actually requested to be built in a given plan. One logical answer is that you don't want to eagerly load all of the version bounds for everything to check for consistency; it's more useful to only check the things the user actually (transitively) requested. That seems like a legitimate implementation concern, but even in this regime, we shouldn't be using the version range to decide if a package is up-to-date or not. Am I missing something?
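For example, the demand-driven part could be just a transitive closure over the requested targets, with bound linting restricted to that set (a sketch with made-up names, not Stack's internals):

    import qualified Data.Map.Strict as Map
    import Data.Map.Strict (Map)
    import qualified Data.Set as Set
    import Data.Set (Set)
    import Distribution.Package (PackageName)

    -- Transitive closure of the requested targets; bound checks (however
    -- they are implemented) would then only run over this set, not over the
    -- whole snapshot.
    demanded :: Map PackageName (Set PackageName) -> Set PackageName -> Set PackageName
    demanded depsOf = go Set.empty
      where
        go seen frontier
          | Set.null frontier = seen
          | otherwise =
              let new  = Set.difference frontier seen
                  next = Set.unions [ Map.findWithDefault Set.empty p depsOf
                                    | p <- Set.toList new ]
              in go (Set.union seen new) next

Anything outside that closure would never even need its bounds loaded, which addresses the eager-loading concern without tying the check to plan construction.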

It might be possible to have multiple Cabal library versions - #3049

Let's see; is this true?

snoyberg commented 6 years ago

But here's my question: why does bound checking happen during plan construction at all?

That's a great way to frame the question. Most likely: hysterical raisins. IIRC, I wrote almost the entirety of that module, and probably wrote it before the rest of the code base was mature. I really like the idea of separating out the concerns here. I have a strong hunch that if we do this correctly, the (massively overcomplicated) ConstructPlan module may become much simpler. I'm not opposed to a full rewrite to be honest, which may be the best approach for your component-based work.

I agree with your assessment of the situation with Cabal. The only way in which we should care about the globally installed version of Cabal is if it introduces some behavioral changes for simple build types versus other potentially available Cabal versions. But further speculation on this probably belongs in #3049.

mgsloan commented 6 years ago

Yes, my mention of multiple Cabal versions in the map was purely speculative; I didn't look at the code. I suspect it is always just the snapshot / extra-dep version.

It would be nice to separate the version checking stuff out of ConstructPlan, even if it was just a new module that ConstructPlan invokes. That is indeed one of the most complicated parts of the whole codebase. It particularly accumulated complexity to handle cyclic build plans (package A's test suite depending on package B, which depends on package A). I imagine this can be greatly simplified with per-component builds.
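To make the cycle concrete (purely illustrative data, not Stack's representation): at the package level A -> B -> A is a cycle, but at the component level the edges are A:test -> B:lib -> A:lib, which is acyclic because nothing depends back on A:test.

    -- Component-level edges for the cyclic example: A's test suite depends
    -- on B's library, and B's library depends on A's library.
    componentEdges :: [((String, String), (String, String))]
    componentEdges =
      [ (("A", "test"), ("B", "lib"))
      , (("B", "lib"),  ("A", "lib"))
      ]
    -- Collapsed to packages this is A -> B -> A (a cycle); as components it
    -- builds cleanly in the order A:lib, B:lib, A:test.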

So I am :+1: on cleaning it up and splitting it up. However, I like the behavior of doing version checks based on demand. Not only is it more efficient, it also allows you to build stuff in circumstances where you otherwise couldn't without using allow-newer. Maybe it's no big deal since allow-newer exists. Hmm, interesting!