[RFC] Statically linked binaries

actions / setup-haskell

Set up your GitHub Actions workflow with a specific version of Haskell (GHC and Cabal)

[RFC] Statically linked binaries #31

Open chshersh opened 4 years ago

chshersh commented 4 years ago

I'm opening this issue to start a discussion around the ability to build statically linked binaries for Haskell projects. I think it will be extremely beneficial for the whole Haskell community if developers could produce such binaries easily with GitHub Actions workflows.

The following blog post describes in detail how to produce binaries for Haskell applications on all three operating systems using the setup-haskell action:

Haskell binaries release with GitHub Actions

Cabal has the --enable-executable-static flag (mentioned in the blog post about the latest changes in HLS) which allows building statically linked binaries. Still, since they are built on Ubuntu, they are not truly statically linked. I expressed my concerns in the comments under the blog post:

https://www.reddit.com/r/haskell/comments/hx0vs8/haskell_language_server_static_binaries_and/fz37tbk/

For real static binaries, you need to build them inside an Alpine-based Docker container. I've used ghc-musl in the past, and I find it quite pleasant and easy to use, but I had to do everything manually on my laptop. It would be nice to automate this process somehow.

I'm going to mention a few people, who might be interested in this discussion (sorry for notifications, feel free to unsubscribe from this conversation):

@vrom911 (author of the GitHub binaries releases blog post)
@bubba (author of producing statically linked binaries for HLS)
@utdemir (author of the ghc-musl project)
@hasufell (author of statically linked binaries for ghcup-hs)

I would like to hear your thoughts on how we can proceed with this!

hazelweakly commented 4 years ago

Hey there! I'm really looking forward to this discussion.

For creating truly static binaries, as you pointed out, musl needs to be used instead of glibc. This would typically involve building with the appropriate flags inside an Alpine container, or at least with a GHC compiled against musl.

I'm not sure how much heavy lifting actions/setup-haskell can do here since this ultimately concerns linux. macOS can't fully static link (cause reasons), and windows is an entirely separate beast. [the blog post @chshersh linked about HLS goes into these details].

As it is, there's no way that I can see to really make a static: true option that would magically ship in a musl-based GHC, add the right compiler flags to stack and/or cabal, and do so in a cross platform way. That said, a copy-pastable example in the README would go a long way to helping to make truly static binaries easier to create.

Separate from that, there's other things that could certainly be pseudo-standardized on, such as name triples, that would make integrating various projects easier (eg it would simplify implementation of providing a tools array of various useful programs like hlint or stan that could be automatically configured). Not sure how much that really has to do with static binaries, though.

chshersh commented 4 years ago

Thanks for your feedback @jared-w!

In terms of what actions/setup-haskell can do, I was thinking about providing a flag like static: true (as you suggested), and if the OS is Linux then the command for building a Haskell project should be run from inside an Alpine docker container, and the resulting executable will be copied back to the host. The setup-haskell action does a beautiful job of preparing the environment for macOS and Windows, so you don't need to think about downloading and installing proper versions of GHC, Cabal and Stack. Only Linux requires special treatment.

hazelweakly commented 4 years ago

I've been continuing to think about how this could work, and I really don't see how it can.

> And if the OS is Linux then the command for building a Haskell project should be run from inside an Alpine docker container

In particular, this line is really asking to re-implement an entire CI pipeline worth of logic. After all, if you support a magical "build step", then almost immediately you'll run into a situation where this won't work. Most of the haskell projects I've seen would fail this, actually; either the build step is non-standard or the environment is, or... "Standard" just really doesn't mean much when it comes to software development environments.

Taking a step back, one might consider just passing appropriate flags to GHC. Really, the "only problem" with just using the appropriate flags like --enable-executable-static is that it statically links glibc. That's not the worst problem to have, so it'd certainly be feasible to just implement an output that you could append to your build command. Something like cabal build ${{ steps.setup-haskell.outputs.static-flags }} or stack build ${{ steps.setup-haskell.outputs.static-flags }}. (Having a single "magic" output that changes depending on whether or not stack: true is set seems like asking for trouble). Of course, this is longer than the actual flag (for cabal). It's an improvement for stack, but only barely. It's worth wondering about whether or not it's an improvement at all or if this is something better addressed in a README example that can be copied and pasted.
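To make the static-flags idea concrete, here is a minimal shell sketch of what such an output might expand to. The `static_flags` function name is hypothetical and no such output exists in setup-haskell today. The cabal flag is the one discussed in this thread, while the stack value is only an assumption (a GHC linker option forwarded through `--ghc-options`), not a verified recipe.

```shell
# Hypothetical sketch of what a `static-flags` output could expand to.
# cabal: the flag named in this thread.
# stack: an ASSUMPTION (GHC linker option passed via --ghc-options), untested.
static_flags() {
  case "$1" in
    cabal) printf '%s\n' '--enable-executable-static' ;;
    stack) printf '%s\n' '--ghc-options=-optl-static' ;;
    *)
      echo "unknown build tool: $1" >&2
      return 1
      ;;
  esac
}
```

In a workflow step this could be used as `cabal build $(static_flags cabal)`. The flags alone do not change which libc the build environment provides, so on Ubuntu the result would still be linked against glibc.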

But say someone really wants to use the musl and Alpine approach. The ghc-musl docker images are interesting, but they can't sanely be used inside the action as an invisible abstraction.

I suppose that's really the biggest dilemma. In order to really be successful, actions have to provide transparent abstractions. That just can't really happen with static binaries; they demand far too much knowledge of their environment to be abstracted away. It would be different if it was the default, like in rust or golang (this conversation wouldn't even be happening if that was the case).

Building static haskell binaries as part of a CI workflow could certainly be more ergonomic, but other than codifying passing in the right flags, I don't know what else could be done to simplify things from an action's programmatic point of view.

Any approach involving docker is right out and must be documented in a README that hopefully doesn't bitrot. The approach with compiler flags is a flimsy abstraction that must be kept in the back of one's mind. And maybe the simple happy path can be abstracted entirely away, but I'm unsure how much of a benefit that would even be.

So, to summarize all of that, the only sane thing I can think of static: true doing would be to create an output static-flags that would allow people to build with something like cabal build ${{ steps.setup-haskell.outputs.static-flags }}. But I'd love to be proven wrong on that front.

That said, I think an evergreen "building static Haskell binaries" repository that showed how to use a normal alpine docker container, the ghc-musl containers, cabal, stack, and essentially ran the gamut of all the various ways to do things would be very helpful. I think another very large UX win for many haskell projects would be to work to have many of the popular CLI tools available as static binaries that can be collected and consumed through easy URLs. Lastly, another github action that sets up various CLI tools (linting, formatting, static analysis, etc) would also likely prove its worth.

utdemir commented 4 years ago

Thanks for the mention @chshersh. Here's my 2c:

I mostly agree with @jared-w that the hard part about statically compiling Haskell is setting up the appropriate environment, and also that ghc-musl is not a good solution in this case (however, do let me know if there is anything I can do to make it easier).

However, I don't really know how this action works, so take this with a grain of salt, but I do see a workable way around the key problem @jared-w pointed out:

> In order to really be successful, actions have to provide transparent abstractions. That just can't really happen with static binaries; they demand far too much knowledge of their environment to be abstracted away.

This is pretty much the selling point of Nix; it is pretty good at isolating/defining software in a way that it works in any environment. So, I believe the following should be doable:

Install Nix inside the action (See: https://github.com/cachix/install-nix-action)
Using Nix, drop into a shell with a musl-compiled GHC, any statically-compiled libraries necessary, and stuff like gcc and binutils. This is pretty much what ghc-musl does; the only difference is that it creates a Docker container including the results. I believe most of that logic could be adapted to work inside a GH action.
Inside that shell, use cabal-install or stack as usual, passing the appropriate flags (--enable-executable-static in cabal's case; the stack one is a bit more complex).

So, it wouldn't be trivial, but in the end the interface might be as simple as static: true (of course, realistically only on Linux). The most unfortunate part would be duplicating the logic of setting up the compiler and libraries using Nix.

> That said, I think an evergreen "building static Haskell binaries" repository that showed how to use a normal alpine docker container, the ghc-musl containers, cabal, stack, and essentially ran the gamut of all the various ways to do things would be very helpful.

I think this is a good idea. If anyone picks it up, I would also be happy to help if I can.

chshersh commented 4 years ago

@jared-w @utdemir Thanks a lot for your feedback! I'm excited to see this issue moving forward by discussing possible ways of implementing the feature 😊

The solution with ${{ steps.setup-haskell.outputs.static-flags }} sounds good to me. If all you need is just to install the musl lib and pass proper flags to either cabal or stack for building, then the workflow that builds a Haskell executable can look like this (based on the command we use in @kowainik projects for producing binaries with cabal):

      # Linux: install musl and append the flags from the proposed
      # (not yet existing) static-flags output of the setup-haskell step
      - if: matrix.os == 'ubuntu-latest'
        name: Build static binary
        run: |
          mkdir dist
          sudo apt-get install -y musl
          cabal install exe:stan --install-method=copy --overwrite-policy=always --installdir=dist ${{ steps.setup-haskell.outputs.static-flags }}

      # macOS and Windows: regular, non-static build
      - if: matrix.os != 'ubuntu-latest'
        name: Build non-static binary
        run: |
          mkdir dist
          cabal install exe:stan --install-method=copy --overwrite-policy=always --installdir=dist

Btw, where can I read about the flags I need to pass to GHC to build static binaries linked with musl? If they are more or less stable across different GHC versions, then for now we can create a repo with an example, and other people can just copy-paste a few commands and options to get static binaries today! Of course, support from the official action that makes things simpler would be more convenient 🙂

In terms of reusing ghc-musl, I was thinking about using a docker image depending on GHC version. I see that ghc-musl provides containers for different GHC versions, and if this is something that will help the whole community, maybe the Haskell community will help with maintaining and creating those images.

I imagined the following workflow based on Docker containers:

Mount repo into the corresponding Docker container with the prepared environment.
Run a command to build the project inside that container (the default command can be as specified in my snippet above, but it should be possible to provide a custom command).
Copy executable from Docker container back to host.
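The three steps above can be sketched as a small shell helper. This is an illustration under assumptions, not an existing action: `build_in_container` and the `/src` mount point are hypothetical names, and the image is whatever container provides a musl-based GHC. Because the repository is bind-mounted, anything the build writes under the working directory (for example `dist/`) is already visible on the host, which covers the copy-back step in the simple case.

```shell
# Hypothetical helper: run a build command inside a container with the
# current repository bind-mounted at /src (an arbitrary, assumed path).
# Usage: build_in_container IMAGE COMMAND [ARGS...]
build_in_container() {
  image=$1
  shift
  # DOCKER may be overridden (e.g. DOCKER='echo docker') to preview the
  # command; this relies on word splitting in POSIX sh / bash.
  ${DOCKER:-docker} run --rm \
    -v "$PWD":/src \
    -w /src \
    "$image" "$@"
}
```

For example, `build_in_container <image> cabal install --install-method=copy --installdir=dist` would leave the executable in `dist/` on the host. Directories outside the mount, such as the cabal store, stay inside the container and are discarded when it exits.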

Initially I was thinking about implementing a separate GitHub action that does exactly this. But I don't have much experience with either Docker or TypeScript to build an action. Apparently, it's not trivial to copy files between Docker-based GitHub actions and host. I've asked similar questions in the GitHub Community:

But maybe this is a solvable problem for someone with more experience in building GitHub Actions or using TypeScript 🙂

hasufell commented 4 years ago

This is what ghcup does:

spinning up an alpine container: https://gitlab.haskell.org/haskell/ghcup-hs/-/blob/master/.gitlab-ci.yml#L20
installing deps: https://gitlab.haskell.org/haskell/ghcup-hs/-/blob/master/.gitlab/before_script/linux/alpine/install_deps.sh
building the project: https://gitlab.haskell.org/haskell/ghcup-hs/-/blob/master/.gitlab/script/ghcup_release.sh

As such, the ghc flags are just --ghc-options='-split-sections -optl-static'. Without split sections, you'll end up with a huge binary. Also make sure to strip it.
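As a rough sketch of how those flags and the strip step might fit together, assuming the commands run inside an Alpine (musl) environment that already has GHC and cabal installed: `build_static` is a hypothetical helper name, the install options are the ones used earlier in this thread, and the executable name is supplied by the caller.

```shell
# Hypothetical helper: build one executable with the GHC flags quoted
# above, copy it into ./dist, then strip it.
# Meant to be run inside an Alpine (musl) environment.
# CABAL and STRIP may be overridden (e.g. CABAL=echo) to preview the commands.
build_static() {
  exe=$1
  mkdir -p dist
  ${CABAL:-cabal} install "exe:$exe" \
    --install-method=copy --overwrite-policy=always --installdir=dist \
    --ghc-options='-split-sections -optl-static' &&
  ${STRIP:-strip} "dist/$exe"
}
```

Calling `build_static stan` would then leave a stripped binary at `dist/stan`.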

I believe alpine to be the easiest solution to this. You build a binary and ship it. How you built the binary doesn't have to be reproducible for 99% of the people.

Also note that ghcup supports most GHC versions on alpine (even 32bit), so you can use ghcup to install the target versions:

ghcup alpine versions

```
✗ ghc 8.0.2 base-4.9.1.0
✗ ghc 8.2.2 base-4.10.1.0
✗ ghc 8.4.4 base-4.11.1.0
✗ ghc 8.6.5 base-4.12.0.0
✗ ghc 8.8.4 recommended,base-4.13.0.0
✗ ghc 8.10.1 base-4.14.0.0
✗ ghc 8.10.2 latest,base-4.14.1.0
```
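Installing one of the listed versions inside the Alpine container might look like the following sketch. It assumes ghcup itself is already installed and on the PATH; `setup_ghc` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: install and select a GHC version with ghcup,
# then install cabal-install.
# GHCUP may be overridden (e.g. GHCUP=echo) to preview the commands.
setup_ghc() {
  version=$1
  ${GHCUP:-ghcup} install ghc "$version" &&
  ${GHCUP:-ghcup} set ghc "$version" &&
  ${GHCUP:-ghcup} install cabal
}
```

For example, `setup_ghc 8.8.4` would install the version marked as recommended in the list above.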

hazelweakly commented 4 years ago

> Mount repo into the corresponding Docker container with the prepared environment.

> Run a command to build the project inside that container (the default command can be as specified in my snippet above, but it should be possible to provide a custom command).

> Copy executable from Docker container back to host.

This will run into issues unless you're very careful about exactly what directories you pass into docker. Sharing directories where build artifacts will be created inside docker (and possibly outside of docker), particularly dist-newstyle, is asking for trouble. You also have to manage all of the mounting and mount volume options, passing the right environment variables in, and duplicating github's logic in order to get the container to feel "native" to github actions (otherwise things like setting env at the job or workflow level won't work).

More importantly, cabal install and stack install have highly unintuitive behavior and you don't want to try and debug those corner cases when your local directory is bind mounted into the container but ~/.cabal/store and ~/.local/bin are not. There's a lot of weird behavior that can start cropping up when a single directory is in a different environment than everything else, but that information is hidden from the tools that work directly with the system.

It's much easier to just use a docker container from start to finish; then you avoid these problems because you're not blending different platforms together. You still have the musl vs glibc issue where native code and the FFI get more difficult to work with, but at least you're not dealing with mixing abstractions in incompatible ways.

Further, Github actions has support for just using a container, so theoretically I think it's possible to have an example like:

jobs:
  static:
    runs-on: ubuntu-latest
    container: node:12 # to make sure that you can download and run actions inside the container. I think?
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-haskell@v1 # <- warning, downloads GHC, cabal, etc., from scratch *every time*.
      - ....

that would, more or less, "do the right thing". (as an aside; this means setup-haskell can be used in any container, in theory. Currently it assumes it'll only be run natively in github actions and uses those assumptions to simplify things. I think it might actually still work in a container, thanks to ghcup being so nice to use, but it's never been tested... In particular, there's a few libraries that GHC relies on that ghcup can't magically install)

Unfortunately, using a container for linux means you can no longer have a convenient 3-OS build matrix. The build matrix is really nice since github doesn't offer a lot of code de-duplication opportunities through traditional yaml shenanigans.

Nix could potentially solve this, I think, but it would be the opposite of a transparent abstraction; nix likes to be the entire solution, not just part of it. It would also destroy CI pipeline speeds; ghc's closure size is enormous in nix and that doesn't even take into account the time required to install and setup nix from scratch every time.

hasufell commented 4 years ago

Here's an example of static binary release for linux: https://github.com/hasufell/stack2cabal/blob/master/.github/workflows/release.yaml#L32

chshersh commented 4 years ago

@hasufell That's an amazing example! 😍

Does anyone know if it's possible to define a matrix with the container set only for a single item, so that some boilerplate can be removed? Something like:

    runs-on: ${{ matrix.os }}
    container: ${{ some variable for 'alpine:3.12' only for 'ubuntu-latest', otherwise no container }}
    strategy:
      matrix:
        os: [ubuntu-latest, macOS-latest, windows-latest]
        ...

@hasufell Another question. Is there a difference between --ghc-options='-split-sections -optl-static' (as you did) and Cabal flags --enable-split-sections --enable-executable-static?

hazelweakly commented 4 years ago

@chshersh This is not possible, unfortunately. It's an explicit limitation of github actions that is a little annoying. More broadly, there are no top level object keys that can be optional that I'm aware of, and having values of undefined/null/falsy is almost universally an error. So even just container: ${{ contains(matrix.os, 'ubuntu') }} wouldn't work. There's an open feature request for it, but I don't know how well it'd work.
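One possible way to keep a single 3-OS matrix despite this limitation is to branch inside the build step rather than at the `container:` key. The sketch below is only an assumption-laden illustration: `run_build` and `STATIC_IMAGE` are hypothetical names, `/src` is an arbitrary mount point, and the bind-mount caveats raised earlier in this thread still apply. `RUNNER_OS` is the variable GitHub Actions sets on its runners.

```shell
# Hypothetical dispatcher: on Linux runners, run the build command inside
# a container; on macOS and Windows, run it directly on the host.
# STATIC_IMAGE (assumed name) must point at a musl-based build image.
# DOCKER may be overridden (e.g. DOCKER='echo docker') to preview the command.
run_build() {
  case "${RUNNER_OS:-}" in
    Linux)
      ${DOCKER:-docker} run --rm \
        -v "$PWD":/src \
        -w /src \
        "${STATIC_IMAGE:?set STATIC_IMAGE to a musl-based build image}" "$@"
      ;;
    *)
      "$@"
      ;;
  esac
}
```

A step could then call `run_build cabal install --install-method=copy --installdir=dist` on every OS in the matrix. On Windows runners the step would need `shell: bash` for this script to run.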

Rust seems to avoid this by allowing you to directly use musl regardless of the host OS. It would be interesting to see if that would be a viable path for GHC/Haskell to take, but I feel a github action is the wrong level to support something of that complexity/nature. Ideally it'd be possible in a more generic fashion.

> Another question. Is there a difference between --ghc-options='-split-sections -optl-static' (as you did) and Cabal flags --enable-split-sections --enable-executable-static?

As far as I know, they're identical.