commercialhaskell / stack

The Haskell Tool Stack
http://haskellstack.org
BSD 3-Clause "New" or "Revised" License
4k stars 843 forks source link

Begin using component-based builds #4745

Closed snoyberg closed 11 months ago

snoyberg commented 5 years ago

Originally discussed in #834, we shelved work on this until Cabal added support. Cabal added support for component based builds long enough ago that we can drop support for older versions and fully embrace a component based system (see #4475). Not only will this help address some bugs, but will also open the door for Backpack support (see #2540).

@qrilka is planning on taking a look at this to estimate how long an implementation will take. This is currently planned for Stack 2.0, but may be deferred if it turns out this is more complex than it seems.

Additional issues which may be affected by this (incomplete list):

snoyberg commented 5 years ago

I've given this a bit more thought, and given the number of changes already planned for Stack 2.0, and the goal of releasing relatively soon, I'd rather hold off on this until Stack 2.2. I'm going to leave this in the P1 milestone for the Stack 2.2 release.

Ericson2314 commented 5 years ago

What's the progress on this? 2.2 is the next non-patch release right? Correct me if I'm wrong, but the linked issues all are reverse dependencies on this. Are there other issues/PRs that are working towards this?

snoyberg commented 5 years ago

I believe there is currently no progress on this, unless someone is working on it that I'm not aware of. This would definitely be a great addition to get in, and I'd be happy to provide feedback/mentoring if someone is interested in tackling it.

Ericson2314 commented 4 years ago

@snoyberg Thanks for the update! I came here from (surprise) the backpack issue (#2540), as this depends on this. Obsidian is potentially interested in working on this. I'll need to make estimates and roadmap.

theobat commented 4 years ago

So as @qrilka stated he won't be able to work on this, and this is needed for backpack support (and likely needed for additional cabal3 features such as public sublibraries). I started to look at the code a bit, I feel like this change is very pervasive in the code base such as changing all mentions of Package in Build to Component ? Is there type creation involved or is every notion of cabal component already in there (with a few additions to NamedComponent ?)

Also I'm not sure to what extent this is impacting the build plan ? Or is it only impacting the Task level ?

Maybe a more precise description of what you mean by "component-based builds" would help @snoyberg ?

tl;dr I'm interested in doing this, but I wouldn't mind a little help to assemble a "todo" list.

theobat commented 4 years ago

So, as far as I understand, the main structural change is to switch

-- extracted from Stack.Build.ConstructPlan
Task { taskProvides = PackageIdentifier
                    (packageName package)
                    (packageVersion package)
-- ... other stuff

Into something with an additional NamedComponent instead (and pass it from the cabal file representation, which, I havn't figured out yet).

This is however, (still, as far as I understand), not enough for a real component-based build; we should build all the transitive dependencies of a component with an appropriate NamedComponent specifier (e.g build lib or internal-lib xyz only if it's something with more than one option). SO quite a bit of change in the addPackageDeps function of Stack.Build.ConstructPlan.

snoyberg commented 4 years ago

@theobat thank you for volunteering to step up on this one, we'd definitely appreciate the help! I think you generally speaking have the right idea. But I'm not the person with the most up-to-date information on this. @qrilka has more details, but is currently on vacation. When he's back, he should be able to provide more concrete feedback.

qrilka commented 4 years ago

Thanks @theobat for your interest in this ticket, finally I'm back from my vacation and hopefully could help a bit here. When I looked into this topic I had an initial plan like the following: find out how to build components using Setup.hs and then create a proper build plans with components which should look like a graph. The current plan generation in Stack.Build.ConstructPlan is a bit ad-hoc and that results for example in problems with handling dependency cycles. With that in place I think fixing execution of build plans shouldn't be a problem. I know it's not a very detailed plan but I suppose that further details and directions will be known after starting the tasks I listed. I'd be happy to answer any further questions if I can.

theobat commented 4 years ago

hi @qrilka, thanks for the answer. I'm getting into Setup.hs in the cabal doc/code. If you have any pointer to a good ressource for learning about it, please let me know.

qrilka commented 4 years ago

Unfortunately I'm not sure if there is a good resource describing the interface Cabal provides with Setup.hs but at least the main details could be seen in its CLI help, e.g. by using stack runhaskell -- Setup.hs --help (or just use runhaskell if there is a proper GHC on your $PATH)

theobat commented 4 years ago

So as far as I understand, stack already use Setup.hs to build packages. The real question is how to use setup.hs for components (pretty obvious with hindsight, but I was not aware of the former...).

Here's what I gathered:

In Cabal 2.0, support for a single positional argument was added to runhaskell Setup.hs configure This makes Cabal configure the specific component to be configured. Specified names can be qualified with lib: or exe: in case just a name is ambiguous (as would be the case for a package named p which has a library and an executable named p.) This has the following effects:

  • Subsequent invocations of cabal build, register, etc. operate only on the configured component.
  • Cabal requires all “internal” dependencies (e.g., an executable depending on a library defined in the same package) must be found in the set of databases via --package-db (and related flags): these dependencies are assumed to be up-to-date. A dependency can be explicitly specified using --dependency simply by giving the name of the internal library; e.g., the dependency for an internal library named foo is given as --dependency=pkg-internal=pkg-1.0-internal-abcd.
  • Only the dependencies needed for the requested component are required. Similarly, when --exact-configuration is specified, it’s only necessary to specify --dependency for the component. (As mentioned previously, you must specify internal dependencies as well.)
  • Internal build-tool-depends and build-tools dependencies are expected to be in the PATH upon subsequent invocations of setup.

I should probably propose an example for this (examples are great when you're the reader...) in cabal. Here's how I understand it, building components is not different from building packages, you just have to use the correct positional argument sepcifying what to configure first and then build as it's built today.

So my idea of the "shortest path" to have component based builds:

Now this is a simple and ad-hoc addition, and I don't see why Stack.Build.ConstructPlan should be involved, but I'm certainly missing something. Besides, my goal is to set up proper foundations for supporting backpack, so I suppose I should check how backpack works at the runhaskell cli level...

qrilka commented 4 years ago

If I understand correctly you're planning just to make components explicit when they were set implicitly as "build complete package"/"build complete package with tests/benchmarks". I think this could help with some issues where components appear only as targets as it will not be able to use components as dependencies (which requires ConstructPlan changes) and thus e.g. cycle from https://github.com/commercialhaskell/stack/issues/4891#issuecomment-504084974 won't be treated properly. As I'm not very familiar how Backpack works under the hood it's not clear if your "shortest path" proposal could be enough for Backpack.

theobat commented 4 years ago

Fine, let's talk about the grand plan then. It seems that Plan is PackageName's adressed:

-- | A complete plan of what needs to be built and how to do it
data Plan = Plan
    { planTasks :: !(Map PackageName Task)
    , planFinals :: !(Map PackageName Task)
    -- ^ Final actions to be taken (test, benchmark, etc)
    , planUnregisterLocal :: !(Map GhcPkgId (PackageIdentifier, Text))
    -- ^ Text is reason we're unregistering, for display only
    , planInstallExes :: !(Map Text InstallLocation)
    -- ^ Executables that should be installed after successful building
    }
    deriving Show

At the very least, planTasks should have the full component name + package name. Now it's the first problem there is no such type. Something like:

data PackageAndComponentName = Plan
    { packageName :: !PackageName
    , componentName :: !NamedComponent
    }
    deriving (Show, Eq)
-- with NamedComponent as:
-- | A single, fully resolved component of a package
data NamedComponent
    = CLib
    | CInternalLib !Text
    | CExe !Text
    | CTest !Text
    | CBench !Text
    deriving (Show, Eq, Ord)

And then we could use that to address the planTasks, which to my understanding would probably solve the circular dependency problem. Then a bunch of Plan consumers would have to change. The other part of this problem, as I see it, is to update Task datatype which has mostly two attributes of interest:

-- | A task to perform when building
data Task = Task
    { taskProvides        :: !PackageIdentifier -- FIXME turn this into a function on taskType?
    -- ^ the package/version to be built
    , taskType            :: !TaskType
    -- ^ the task type, telling us how to build this
-- ... etc ...
}

-- task type:

-- | The type of a task, either building local code or something from the
-- package index (upstream)
data TaskType
  = TTLocalMutable LocalPackage
  | TTRemotePackage IsMutable Package PackageLocationImmutable
    deriving Show

Nothinf currently points in the Packagetype what component we should build (it's a descriptive type) and nothing in the Task type either. So, something like componentToBuild :: !NamedComponent should probably be added to task.

From this point, we mostly have to change addDep in ConstructPlan to use the freshly created PackageAndComponentName.

There are a bunch of problems though since the Installed packages only suppport :

data Installed
    = Library PackageIdentifier GhcPkgId (Maybe (Either SPDX.License License))
    | Executable PackageIdentifier
    deriving (Show, Eq)

Which is not what we want (we want the full power of installed component that is the constructor list provided by NamedComponent). Overall everything being PackageName addressed is a deep problem, and a real PackageAndComponentName should be used, pretty much everywhere...

Seems like a pretty drastic change everywhere on a hard to navigate codebase ( for various reasons ; Execute.hs is huge and not modular, TemplateHaskell and CPP usage prevent the excellent haskell language server from functionning properly). Anyway I'll tinker a bit to see what can be done, please let me know if I'm mistakening somewhere or missing something.

qrilka commented 4 years ago

I don't see any particular comments to what you have written with an exception of 1 question: what are the "modules" you have highlighted as bold? Did you mean components maybe?

theobat commented 4 years ago

Yes It's a mistake I fixed it, thanks.

qrilka commented 4 years ago

Sure, no problem

qrilka commented 4 years ago

Hello @theobat I wanted to ask about your progress with this issue and if I could be of any help. Also during previous weeks I was trying to get a correct solution for #5381 but it looks like that such a proper solution could require component-based builds. And at the same time it looks like the current implementation of "implicit snapshots" which was the main reason for Stack 2 has some issues with component based builds. The main point is that Stack assumes 1 snapshot as a source of all dependencies of the project. And for a normal project stack build and stack test dependencies are different and thus give different snapshots. Some way around this is that Stack assumes that a library built with a wider "target" as e.g. stack test works a more narrow stack build and because of that stack build after stack test will not rebuild anything. A proper way around this would be to have dependency tracking per component. I'm not sure if anything of this is needed for Backpack but building new hacks around current hacks in Stack could get result in a fragile construction. Hopefully this gives you some more information and not just scares you away :)

theobat commented 4 years ago

Hi @qrilka,

I've been busy the past ffew weeks, but I did get a better understanding of the whole stack build function calls. What seems to matter is the 3 last bullet points in the following list:

Now my main concern is that, what I planned to do above only represent half of the equation (constructing the plan and executing it). I'm stuck on the source map changes needed for now. I don't really understand how to gather the components dependencies from stack.yaml itself (or the cabal file for that matter). It definitely seems absent in the current SourceMap, and the issue is that the dependencies are distributed by locations such as:

So maybe you can help me with that ? Otherwise it'll need a bit more digging in the whole SourceMap thing.

PS: as a side note, why is template haskell so pervasive in the code base ? What's his purpose ? It prevents the haskell-language-server to function properly, so maybe we should do something about this, it definitely hampers any effort on the codebase. At the very least it should be isolated enough so that commenting out one import and providing a mock settles the problem.

qrilka commented 4 years ago

Actually I have given a part of a response to your question in my reply above - SourceMap as a base of implicit snapshots it not quite compatible with components thus needs to be changed. And I guess dependency tracking also needs to be updated. The current dependency information is mostly Cabal 1.0 - i.e. tracking only dependencies between packages (and package ~ library) with some extra hacks on top (e.g. supporting sublibraries to some extent). In principle implementation could be done in 2 steps:

theobat commented 3 years ago

A stratospherically high overview

This is the set of actions that are required for having full blown component based builds. First of all, at a high level, what's "just" missing is:

Now the reality is much worse, because there are a bunch of datatypes which are still heavily based on old cabal versions such as Package in Package.hs.

As stated in the snoyberg comment of the file, the whole mess between Package, LocalPackage etc should be simplified, which is probably the best first step to envision.

The grand master plan

For now the grand master plan, has 2 groups : the descriptive changes (1 and 2) and the actionnable changes (3 and 4). Descriptive changes should not change stack's behavior. actionnable ones should change stack behavior. It should roughly follow this:


The plan has to be refined a bit more to be sure this (significant) effort is the right one. A pull request for documenting current behaviors will stay open while I make the descriptive changes (it can be merged after that).

Request for comments @qrilka :)

qrilka commented 3 years ago

@theobat I didn't get what you mean in your item ii when you say that "InstalledMap misses the installed components" and also "this is not affecting the InstallMap type". If you say that we'll only need actually installed components - unfortunately I didn't check what does Cabal (or probably ghc-pkg) allow to find out (or regsiter): at least previously the situation was that only libraries could be registered and properly checked for being already registered. As for iii - I don't think I understand your plan about circular dependencies. One way out is of course is just to ignore them at the beginning. Otherwise I don't see any particular objections to you plan. Thanks for sharing.

theobat commented 3 years ago

I didn't get what you mean in your item ii when you say that "InstalledMap misses the installed components" and also "this is not affecting the InstallMap type". If you say that we'll only need actually installed components - unfortunately I didn't check what does Cabal (or probably ghc-pkg) allow to find out (or regsiter): at least previously the situation was that only libraries could be registered and properly checked for being already registered.

Maybe I don't understand what InstalledMap is all about, my understanding is that, if something has been installed, we track this in some kind of SQLite database which is updated whenever we build something new. I thought the question of recording the set of installed components was completely "controlled" and didn't think it could be a ghc/cabal thing. I had something like this in mind:

type InstalledMap = Map PackageName (Map NamedComponent (InstallLocation, Installed))

The toInstallMap and getInstalled (just below, same file) functions could use better comments though so feel free to give a better description so that I can improve their haddocks.

As for iii - I don't think I understand your plan about circular dependencies. One way out is of course is just to ignore them at the beginning.

With the Package datatype changes, dependencies will be tracked at the component level (old & default behavior, being the main lib component), which means the check for circular dependencies should support that, whereas today it's enforced using this:

type ConstructPlanState = Map PackageName (Either ConstructPlanException AddDepRes) 

As above I think something like

type ConstructPlanState = Map PackageName (Map NamedComponent (Either ConstructPlanException AddDepRes))

Will be required. But this means we should have a single task per component, while a bunch of things are common between components so I thought of something like:

type ConstructPlanState = Map PackageName (Map NamedComponent (Maybe ConstructPlanException), Maybe AddDepRes)

Which would enable a set of components to trigger a single task. It would probably trigger less changes to the task datatype. This is still at the idea stage though, maybe all this belongs in the task itself (that is, the Task datatype).

Maybe each step of the plan should be tracked in a dedicated issue, don't you think ?

qrilka commented 3 years ago

Ouch, it looks like I was not seeing the difference between InstallMap and InstalledMap. One note - Stack doesn't track installed packages by itself, it relies on package database for that. As for splitting this into separate tickets - I don't see such a need but if this would help you - go ahead. Just don't forget to interlink all the subparts properly.

theobat commented 3 years ago

One note - Stack doesn't track installed packages by itself, it relies on package database for that.

Could you elaborate on that ?

As for splitting this into separate tickets - I don't see such a need but if this would help you - go ahead. Just don't forget to interlink all the subparts properly.

On second thought I'm fine with keeping everything in this one, we can separate the PR's themselves, it should be enough.

qrilka commented 3 years ago

Could you elaborate on that ?

See loadDatabase - it loads packages from a ghc-pkg dump

theobat commented 3 years ago

So what I'm talking about is likely this: https://downloads.haskell.org/ghc/latest/docs/html/libraries/Cabal-3.2.0.0/Distribution-InstalledPackageInfo.html#v:installedComponentId Which roughly means changing all the Map PackageName DumpPackage and their subsequent use to something like Map PackageName (Map NamedComponent DumpPackage)

EDIT: This is quite undocumented in ghc-pkg but there is a ----ipid option to get the lib-name key:

➜  okok ghc-pkg --ipid describe kk-0.1.0.0-e5f5040f -f ~/.cabal/store/ghc-8.8.4/package.db
name:               z-okok-z-band
version:            0.1.0.0
package-name:       okok
lib-name:           band
id:                 kk-0.1.0.0-e5f5040f
key:                kk-0.1.0.0-e5f5040f
maintainer:         
author:             
abi:                0c91a172260cad5da7d5ed91dbf6e936
data-dir:
    ~/.cabal/store/ghc-8.8.4/kk-0.1.0.0-e5f5040f/share

depends:            base-4.13.0.0

This is necessary for DumpPackage rework (and InstalledMap etc)

simonmichael commented 2 years ago

This sounds like a good thing, and I really appreciate the work done on it. Were showstoppers found, or is the plan still viable ? [Found this after trying to make hls+stack work on a package with an internal library.]

theobat commented 2 years ago

hey @simonmichael thanks for your interest, I lack the time to implement the plan drawn here, but most of it is still valid as far as I know. I think my initial PRs focused too much on refactoring existing things in Stack, which consumed a lot of time and hindered the underlying goal. This PR https://github.com/commercialhaskell/stack/pull/5504 already implement most of the plan though, so maybe a good first step is to understand why the tests are failing in it (Which I failed to understand, but I'll admit I didn't look much because I ran out of time).

mpilgrem commented 11 months ago

@theobat, given your #6343, I wondered if this issue could be sensibly closed in favour, perhaps, of opening more specific/granular ones. I would welcome your view.

theobat commented 11 months ago

Yes this should be closed, I'll rephrase what's still valid in another dedicated issue.