commercialhaskell / stack

The Haskell Tool Stack
http://haskellstack.org
BSD 3-Clause "New" or "Revised" License

Supporting using the same .stack-work directory for multiple projects #5442

Open KevOrr opened 3 years ago

KevOrr commented 3 years ago

Following the conversation in https://github.com/commercialhaskell/stack/issues/1178, it seems that sharing one .stack-work directory between multiple projects is not supported, although it would be useful for caching work in a CI system. I understand that the work directory must be given as a relative path to a child directory, but I was curious whether the underlying directories could be the same, with each per-project workdir simply being a symlink to one shared workdir. However, I run into an interesting error (sanitized):

[41 of 44] Compiling MainProj.A
/home/kevin/main-proj/src/MainProj/A.hs:190:38: error:
    • Couldn't match type ‘somelib-0.1.0.0:SomeLib.LibMod.SomeType’
                     with ‘SomeType’
      NB: ‘SomeType’ is defined at
            src/SomeLib/LibMod.chs:(24,1)-(33,29)
          ‘somelib-0.1.0.0:SomeLib.LibMod.SomeType’
            is defined in ‘SomeLib.LibMod’ in package ‘somelib-0.1.0.0’
        arising from a functional dependency between:
          constraint ‘BB.HasSomeType (Foo f) SomeType’
            arising from a use of ‘BB.someType’
          instance ‘BB.HasSomeType
                      (Foo f) somelib-0.1.0.0:SomeLib.LibMod.SomeType’
            at <no location info>
    • In the second argument of ‘(^.)’, namely ‘BB.someType’
      In the expression: bar ^. BB.someType
      In the expression:
        case bar ^. BB.someType of
          SomeCons1 -> f True
          SomeCons2 -> f False
          _ -> return Nothing
    |              
190 | getThing bar = case bar ^. BB.someType of
    |                            ^^^^^^^^^^^

(here BB is an alias for another module in SomeLib).

This is the sort of error I expected to get, if I got any at all. Surprisingly, though, it is the first (and only) error in the first 40 modules of this project, even though many of the earlier modules also use the library. It is, however, the only module among those 40 that seems to use a typeclass exported by the library, so I wonder whether there are any workarounds I could apply to either code base to support using the same workdir for both projects.
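For reference, the symlink arrangement described at the top of this issue can be sketched roughly as follows. This is only an illustration of the setup, not something stack provides: the helper name `link_shared_workdir` and the `shared-work` directory name are made up, and it assumes a POSIX-like filesystem where creating symlinks is allowed. It does not make a shared work directory safe to use; it only reproduces the layout that leads to the error above.

```python
import os
from pathlib import Path


def link_shared_workdir(shared: Path, projects: list[Path]) -> None:
    """Point each project's .stack-work at one shared directory.

    Hypothetical helper: `shared` and the project paths are illustrative.
    Each project still has a child path named .stack-work, but that path
    is a symlink, so every project resolves to the same directory.
    """
    shared.mkdir(parents=True, exist_ok=True)
    for project in projects:
        project.mkdir(parents=True, exist_ok=True)
        link = project / ".stack-work"
        if link.is_symlink():
            link.unlink()  # replace a stale link from an earlier run
        elif link.exists():
            raise FileExistsError(f"{link} is a real directory; move it away first")
        # Use a relative target so the whole tree can be relocated together.
        target = os.path.relpath(shared, start=project)
        link.symlink_to(target, target_is_directory=True)
```

Calling `link_shared_workdir(Path("shared-work"), [Path("project_a"), Path("project_b")])` from the parent directory would leave both projects with a `.stack-work` entry that points at `../shared-work`.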

Stack version

$ stack --version
Version 2.5.1, Git revision d6ab861544918185236cf826cb2028abb266d6d5 x86_64 hpack-0.33.0

Method of installation

stack-static AUR package

marcellodesales commented 2 years ago

Would a docker image resolve this?

KevOrr commented 2 years ago

@marcellodesales thank you for your reply. I'm not sure I understand though. Do you mean running the same experiment in a docker container?

marcellodesales commented 2 years ago

@KevOrr since a Docker container isolates an execution, you could say that multiple executions on multiple projects would be isolated... so, yes, in that sense, running a container would isolate the debug process... I had a similar problem before...

KevOrr commented 2 years ago

As I understand it, individual .stack-work directories already isolate individual projects; instead, I'm hoping to share identical artifacts produced by different projects so that duplicate artifacts don't need to be stored twice on disk. However, I think this is only a problem with projects that have local dependencies; otherwise, any shared artifacts go to the user-local stack directory, ~/.stack.

marcellodesales commented 2 years ago

@KevOrr If you are talking about shared libraries, then this smells like a shared dependency... As documented by the twelve-factor app methodology (https://12factor.net/dependencies), shared libraries MUST be declared, isolated, versioned, etc., as in Java, Python, Node.js, etc. I put stack/cabal on the same level because shared libraries are usually part of the ecosystem...

  1. Create a shared library
  2. Publish on the registry
  3. Declare it in the projects that depend on it
  4. Isolate the development and verify its use

This is what I was suggesting when isolating the development with Docker. As your app declares a dependency on your shared library, it's transparent to the development process... A specific step to download the Stack dependencies will pull anything you have declared, including the shared library in question, and that would end up in the expected directory :)

I have dockerized Stack+Cabal and I've been using this same isolation process, since this pattern is used by any other programming language... Whether it goes to ~/.stack or to its local .stack-work dir is just a design decision of your Dockerfile.

🍺 cheers

Update

NOTE: Docker images are isolated, similar to a VM. It would be impossible for a brand-new Docker image to contain repeated dependencies or dependencies from another process, since your app declares the dependencies it needs... Therefore, isolating any app that uses stack for dependency management has the same isolation effect when using Docker images...

KevOrr commented 2 years ago

Publishing packages is not always an option, hence stack's support for local projects in extra-deps. Isolating each project into its own docker image doesn't help things. Imagine this layout:

.
├── project_a
│   ├── .stack-work
│   ├── stack.yaml
│   └── ...
└── project_b
    ├── .stack-work
    ├── stack.yaml
    └── ...

If project_a is built in a docker image, that image will include all of the build artifacts of project_a in project_a/.stack-work (as well as artifacts from dependencies, stored in ~/.stack, but this can be a shared layer between the two docker images as you point out). Building project_b (which lists project_a as a local extra-dep) in a separate image, however, will include artifacts for both project_a and project_b in project_b/.stack-work.

This obviously creates large swaths of duplicated build artifacts across the two images. If, instead, project_a and project_b could share the same .stack-work directory, then there is the possibility* of de-duplication, whether in docker images or not.

* In one of my projects, overlapping the dist directories would save 250 MiB, and in another, it would save 1.7 GiB. Unfortunately, the install directory doesn't seem to share much, since the final artifacts (static libraries, dynamic libraries, etc.) don't seem to be identical (`.stack-work/install///lib//{.so,/*.a,etc}`).
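One rough way to estimate savings like these is to hash every file in two work directories and add up the sizes of the files whose contents occur in both. The sketch below is just one possible approach and not necessarily how the figures above were measured; `file_digest` and `duplicate_bytes` are hypothetical helper names, and the comparison is by content only, so a file with the same path but different bytes does not count as shared.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def is_regular_file(path: Path) -> bool:
    """True for real files; symlinks are skipped to avoid double counting."""
    return path.is_file() and not path.is_symlink()


def duplicate_bytes(dir_a: Path, dir_b: Path) -> int:
    """Total size of files under dir_b whose contents also occur under dir_a."""
    seen = {file_digest(p) for p in dir_a.rglob("*") if is_regular_file(p)}
    total = 0
    for p in dir_b.rglob("*"):
        if is_regular_file(p) and file_digest(p) in seen:
            total += p.stat().st_size
    return total
```

For example, `duplicate_bytes(Path("project_a/.stack-work/dist"), Path("project_b/.stack-work/dist"))` would report how many bytes in project_b's dist directory are byte-for-byte copies of files in project_a's.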

marcellodesales commented 2 years ago

First problem: how to manage shared libraries... Doesn't stack already control that? As I pointed out, the design pattern for that is the 12-factor apps section on dependency isolation... If they share one or more, each of them should resolve them separately and independently...

I always consider project A and project B to be 2 separate Git repos, each with its own CI/CD pipeline, each building independently... Docker images are built only at CI time... It's the developer's responsibility to run and set them up in his/her own IDE... But ultimately, the source code needs to be built, packaged, and versioned in order to be deployed, and that's what Docker has been helping with for years now...

Following your example, I would create "Data Containers", whose responsibility could be to just store these larger dependencies... you can either use an orchestrator (Kubernetes, Docker Swarm, shell) to share the volumes during the execution, or define a structure of dependency of Project A -> Dependencies and Project B -> Dependencies as isolated docker images... Finally, the final install command is related to the project and, therefore.

Edit

Just my 2c.

KevOrr commented 2 years ago

I am really not sure how throwing this in a docker image addresses this issue at all.

This issue is very narrow in scope. When a stack project B declares as an extra-dep another local stack project A, then build artifacts for A will end up in both A’s work directory (when A is built), as well as B’s work directory (when B is built).

This does not have anything to do with hackage dependencies that A and B share.

I am already using Docker, for many of the reasons that you have stated. But this issue is completely independent of that. CI servers are not the only environment that stack runs on; this issue applies just the same to dev environments as it does to automated builds.