NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

erlang: Build error caused by unrelated changes to `buildRubyGem` #36853

Closed: the-kenny closed this issue 6 years ago

the-kenny commented 6 years ago

Issue description

nixpkgs.rebar fails to build with a cryptic error message, seemingly caused by a totally unrelated change to buildRubyGem in fced35fa44098be0296d8b42166583bd5e505141:

% nix build nixpkgs.rebar
builder for '/nix/store/xpd08z3q4ws5rcsgm0jdz3a7xvhl4lah-rebar-2.5.1.drv' failed with exit code 1; last 10 log lines:
  source root is rebar-2.5.1
  setting SOURCE_DATE_EPOCH to timestamp 1406741869 of file rebar-2.5.1/test/upgrade_project/rel/reltool.config
  patching sources
  configuring
  no configure script, doing nothing
  building
  {"init terminating in do_boot",{'cannot get bootfile','start_clean.boot'}}
  init terminating in do_boot ()

git-bisect points to fced35fa44098be0296d8b42166583bd5e505141 as the cause, which seems unrelated. However, reverting this commit fixes the build.

@aneeshusa investigated this some more in https://github.com/NixOS/nixpkgs/commit/fced35fa44098be0296d8b42166583bd5e505141#commitcomment-27975737 - here is a copy:

@aneeshusa's investigation:

@the-kenny I did some digging but didn't find anything concrete. I was able to reproduce your git bisect result. Looking at the rebar -> ruby dependency chain, this is what I see:

rebar3 -> erlang -> wxwidgets -> gtk+ -> cups -> systemd -> libidn2 -> ronn -> ronn-gems -> all the ruby bits

All of these build except the final rebar3, and ronn is a man-page building tool, so I'm not really sure how the ruby change broke rebar3. I also tried a couple of other erlang releases (R19, etc.), which seemed to get past the error that the R20 release hit very early on.

I also ran nix-shell --pure -A beam.packages.erlangR20.rebar and didn't see anything ruby or ronn related. I do have sandboxing turned on.

I added a trace to this file to print out the gemName on each instantiation; this is the output when instantiating R20 rebar:

trace: ronn
trace: hpricot
trace: mustache
trace: rdiscount
trace: bundler
trace: bundler

This is also the same trace for -A ronn, so these all seem strictly ronn-related.

My only wild remaining guesses are:

Hopefully this helps a bit.

Steps to reproduce

% nix build nixpkgs.rebar
% git revert fced35fa44098be0296d8b42166583bd5e505141
% nix build nixpkgs.rebar

Technical details

% nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 4.9.86, NixOS, 18.09.git.8f9e814 (Impala)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.0`
 - channels(root): `""`
 - channels(moritz): `"nixos-16.03-16.03.1242.c5cbda2"`
 - nixpkgs: `/home/moritz/nixos-configurations/x260/nixpkgs` (a9e1ae3d2981514616596f52c510a3c9bc035cb3)

Similar Issues

#36823 looks similar, but is caused by another (also seemingly unrelated) commit.

the-kenny commented 6 years ago

Note that master looks fixed for erlangR20 only. nix-build . -A beam.packages.erlang.rebar still fails.

On release-18.03 both erlangR20.rebar and erlang.rebar fail.

the-kenny commented 6 years ago

@dtzWill This seems somehow related to the busybox changes in #36919. Any idea how these could degrade the stability of the erlang builds?

kamilchm commented 6 years ago

I found that the boot scripts were missing from the erlang bin directory after installation. I don't know what the root cause is, but this workaround resolved it for me:

diff --git a/pkgs/development/interpreters/erlang/generic-builder.nix b/pkgs/development/interpreters/erlang/generic-builder.nix
index 1d2b79074fb..e337c8a4041 100644
--- a/pkgs/development/interpreters/erlang/generic-builder.nix
+++ b/pkgs/development/interpreters/erlang/generic-builder.nix
@@ -91,6 +91,8 @@ in stdenv.mkDerivation ({
     ${postInstall}

     ln -s $out/lib/erlang/lib/erl_interface*/bin/erl_call $out/bin/erl_call
+
+    cp $out/lib/erlang/releases/*/start_*.boot $out/lib/erlang/bin/
   '';

   # Some erlang bin/ scripts run sed and awk
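
Before applying the workaround, it may help to check whether a given Erlang output is actually affected. A minimal bash sketch, assuming the three file names that show up as missing in the tree diff later in this thread (start.boot, start_clean.boot, start_sasl.boot) and the lib/erlang/bin directory the workaround copies them into; the function name check_erlang_boot_files is hypothetical:

```bash
# Hypothetical helper: report which start*.boot files are missing from
# lib/erlang/bin of an Erlang installation prefix.
# Prints one "missing: ..." line per absent file; returns 1 if any are missing.
check_erlang_boot_files() {
  local prefix=$1 missing=0 f
  for f in start.boot start_clean.boot start_sasl.boot; do
    if [ ! -e "$prefix/lib/erlang/bin/$f" ]; then
      echo "missing: lib/erlang/bin/$f"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_erlang_boot_files "$(nix-build '<nixpkgs>' -A erlang --no-out-link)" prints one line per missing file and returns non-zero if any are absent.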
dtzWill commented 6 years ago

I tried using the cdv script (ended up using one from 17.09's erlangR20) and it too complained about missing "start.boot" or so. But... why did this start happening? :(

dtzWill commented 6 years ago

Bit more info, but still investigating:

the-kenny commented 6 years ago

So we're most likely looking at some sort of impurity here? If so, the next step would be to investigate what Erlang does in the installPhase, where it's supposed to copy start_*.boot, since copying the files manually seems to work fine.

dtzWill commented 6 years ago

Alright, so I got it. Short version: it's builders that haven't been updated to use the fixed /bin/sh. Good call, folks :).

Longer version: Looking at success/fail builds on Hydra, the corresponding erlang builds used are these:

diff'ing with:

$ diff -u <(nix log /nix/store/pxxiimf801v5hf3fxf5k12ygf54p1z28-erlang-19.3.6.4) <(nix log /nix/store/m8ylg6j3hb8npqc908hqavw53mhr338v-erlang-19.3.6.4)

Produces this: https://gist.github.com/dtzWill/d4cad2e4f8087699383f169e5681fdaa

In particular:

https://gist.github.com/dtzWill/d4cad2e4f8087699383f169e5681fdaa#file-good-vs-bad-diff-L114-L124

Lacking support for "command" is the symptom of a builder that still needs to be updated to use the fixed sh :).
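
One way to check whether a particular shell has the `command` builtin discussed above is simply to try it. A minimal bash sketch; sh_supports_command is a hypothetical helper, and running it against /bin/sh only says something about the sandbox if it is invoked from inside a build (for example from a throwaway derivation's builder), not from an interactive shell:

```bash
# Hypothetical probe: succeeds if the shell at the given path (or name on
# PATH) can run the POSIX `command` builtin, fails otherwise, including
# when the shell itself cannot be executed.
sh_supports_command() {
  local shell=$1
  "$shell" -c 'command -v true' >/dev/null 2>&1
}
```

Usage: sh_supports_command /bin/sh && echo "has command" || echo "no command".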

the-kenny commented 6 years ago

Could it be that the following simple fix is enough? This just makes sure that we run patchShebangs before the rest of postPatch (and creates the erl_call symlink before running the rest of postInstall).

diff --git a/pkgs/development/interpreters/erlang/generic-builder.nix b/pkgs/development/interpreters/erlang/generic-builder.nix
index 1d2b79074fb..6ea3ac73a4b 100644
--- a/pkgs/development/interpreters/erlang/generic-builder.nix
+++ b/pkgs/development/interpreters/erlang/generic-builder.nix
@@ -65,9 +65,9 @@ in stdenv.mkDerivation ({
   '';

   postPatch = ''
-    ${postPatch}
-
     patchShebangs make
+
+    ${postPatch}
   '';

   preConfigure = ''
@@ -88,9 +88,9 @@ in stdenv.mkDerivation ({
   # (PDFs are generated only when fop is available).

   postInstall = ''
-    ${postInstall}
-
     ln -s $out/lib/erlang/lib/erl_interface*/bin/erl_call $out/bin/erl_call
+
+    ${postInstall}
   '';

    # Some erlang bin/ scripts run sed and awk

nlewo commented 6 years ago

Note the above patch also fixes https://github.com/NixOS/nixpkgs/issues/37638.

dtzWill commented 6 years ago

To check whether this fixes things, you need to test it on a builder that fails as-is. Yours shouldn't fail, mine doesn't, etc.; right now, anything that convinces Nix to rebuild (on your builder) is enough to fix it.

Does your proposed change help a broken builder? If so that'd be great!

the-kenny commented 6 years ago

Just pushed 3e61f3b911c to master. Now waiting for Hydra (& reports in #36823 and #37638).

If it works out fine we should cherry-pick it to release-18.03.

pbogdan commented 6 years ago

I can still produce non-functioning Erlang builds with those changes applied. As @kamilchm points out, it seems to come down to a few missing .boot files. An example difference between a good and a bad build:

--- tree1       2018-03-24 14:12:40.295145433 +0000
+++ tree2       2018-03-24 14:12:51.304084842 +0000
@@ -5,7 +5,7 @@
 │   ├── epmd -> ../lib/erlang/bin/epmd
 │   ├── erl -> ../lib/erlang/bin/erl
 │   ├── erlc -> ../lib/erlang/bin/erlc
-│   ├── erl_call -> /nix/store/f0ncniw2g22k381ba74b8yp22j115skp-erlang-19.3.6.4/lib/erlang/lib/erl_interface-3.9.3/bin/erl_call
+│   ├── erl_call -> /nix/store/9llmw5m5mp1n524mq75dph1hlrgy56vs-erlang-19.3.6.4/lib/erlang/lib/erl_interface-3.9.3/bin/erl_call
 │   ├── escript -> ../lib/erlang/bin/escript
 │   ├── run_erl -> ../lib/erlang/bin/run_erl
 │   ├── to_erl -> ../lib/erlang/bin/to_erl
@@ -22,10 +22,7 @@
 │       │   ├── no_dot_erlang.boot
 │       │   ├── run_erl
 │       │   ├── start
-│       │   ├── start.boot
-│       │   ├── start_clean.boot
 │       │   ├── start_erl
-│       │   ├── start_sasl.boot
 │       │   ├── start.script
 │       │   ├── to_erl
 │       │   └── typer
@@ -7724,4 +7721,4 @@
 └── nix-support
     └── setup-hook

-559 directories, 7165 files
+559 directories, 7162 files

Both store paths should be cached on Hydra and can be retrieved with nix-store -r /nix/store/{f0ncniw2g22k381ba74b8yp22j115skp-erlang-19.3.6.4,9llmw5m5mp1n524mq75dph1hlrgy56vs-erlang-19.3.6.4}. I can't find the build links right now but IIRC they were produced by different builders.
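
A comparison like the tree diff above can also be produced without tree by diffing the sorted file lists of the two outputs, using the same process-substitution trick as the nix log diff earlier in the thread. A minimal bash sketch; diff_file_lists is a hypothetical name:

```bash
# Hypothetical helper: diff the sorted lists of paths (relative to each
# root) under two directories. Exit status follows diff: 0 if identical.
# Only names are compared, not file contents or symlink targets.
diff_file_lists() {
  local a=$1 b=$2
  diff <(cd "$a" && find . -mindepth 1 | LC_ALL=C sort) \
       <(cd "$b" && find . -mindepth 1 | LC_ALL=C sort)
}
```

After fetching both paths with the nix-store -r command above, diff_file_lists /nix/store/f0ncniw2g22k381ba74b8yp22j115skp-erlang-19.3.6.4 /nix/store/9llmw5m5mp1n524mq75dph1hlrgy56vs-erlang-19.3.6.4 lists the relative paths present in only one of them. Because symlink targets are not compared, the erl_call difference would not show up.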

The missing boot files seem to be due to what's observed in https://github.com/NixOS/nixpkgs/issues/36823#issuecomment-372294868. In short: in my setup the sandbox /bin/sh can't expand certain globs, and those globs are needed to copy the .boot files. As I understand it, some Hydra builders might still have this issue (EC2 builders specifically, though that is my own speculation).
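
In POSIX sh, a pattern that matches nothing is passed to the command as its literal text, so a copy step that depends on a glob can fail in confusing ways. A minimal bash sketch of a stricter copy that checks each argument exists first; copy_glob_strict is hypothetical, and since the glob is still expanded by the calling shell, this only makes a non-expanding shell fail loudly, it does not fix it:

```bash
# Hypothetical helper: copy the given files into a destination directory,
# but refuse to run if any argument does not exist. An unmatched (or
# unexpanded) glob arrives as its literal pattern, so this reports that
# case explicitly instead of leaving it to cp.
copy_glob_strict() {
  local dest=$1 f
  shift
  if [ "$#" -eq 0 ]; then
    echo "copy_glob_strict: no source files given" >&2
    return 1
  fi
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "copy_glob_strict: '$f' does not exist (glob not expanded?)" >&2
      return 1
    fi
  done
  cp -- "$@" "$dest"/
}
```

Mirroring the workaround posted earlier in the thread: copy_glob_strict "$out/lib/erlang/bin" "$out"/lib/erlang/releases/*/start_*.boot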

Locally applying

diff --git a/pkgs/development/interpreters/erlang/generic-builder.nix b/pkgs/development/interpreters/erlang/generic-builder.nix
index 6ea3ac73a4b..cf8fe1f6e56 100644
--- a/pkgs/development/interpreters/erlang/generic-builder.nix
+++ b/pkgs/development/interpreters/erlang/generic-builder.nix
@@ -68,6 +68,9 @@ in stdenv.mkDerivation ({
     patchShebangs make

     ${postPatch}
+
+    substituteInPlace erts/etc/unix/Install.src  \
+      --replace "#!/bin/sh" "${stdenv.shell}"
   '';

   preConfigure = ''

seems to produce a friendlier Erlang (R19) that can build rebar, couchdb, and rabbitmq_server. I can't rebuild all the Erlang packages right now.
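
A similar shebang rewrite can be tried outside of Nix with sed. A minimal bash sketch; rewrite_sh_shebang is hypothetical and takes the interpreter path as an argument (inside the Nix expression that would be ${stdenv.shell}). It only touches line 1 and keeps the #! prefix on the rewritten line:

```bash
# Hypothetical helper: if line 1 of a file is exactly "#!/bin/sh", replace
# it with "#!<interpreter>". Other lines are left untouched.
# Uses GNU sed's in-place flag (-i without a suffix argument).
rewrite_sh_shebang() {
  local file=$1 interp=$2
  sed -i "1s|^#!/bin/sh\$|#!${interp}|" "$file"
}
```

Interpreter paths containing | or & would need escaping, since they end up inside the sed replacement.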

pbogdan commented 6 years ago

Hmm, something is still wrong :-( elixir won't build...

pbogdan commented 6 years ago

runs patchShebangs over the elixir source ...

Yep, that did it.

pbogdan commented 6 years ago

Looks like it's specifically bin/elixir that needs patching (similar globbing issue it seems). Everything else seems to build fine.
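
To look for other scripts like bin/elixir whose interpreter still lives outside the store, one can scan the executable files of an unpacked source tree. A minimal bash sketch; find_impure_shebangs is hypothetical and only approximates what patchShebangs considers (executable files whose #! line does not already start with /nix/store/):

```bash
# Hypothetical helper: print executable files under a directory whose first
# line is a shebang that does not already point into /nix/store
# (e.g. "#!/bin/sh" or "#!/usr/bin/env bash").
find_impure_shebangs() {
  local dir=$1 f first
  while IFS= read -r -d '' f; do
    first=
    # Read only the first line; `|| true` ignores read's non-zero status
    # at end of file (empty file or last line without a trailing newline).
    IFS= read -r first < "$f" || true
    case $first in
      '#!/nix/store/'*) ;;             # already points into the store
      '#!'*) printf '%s\n' "$f" ;;     # candidate for patching
    esac
  done < <(find "$dir" -type f -perm -0100 -print0)
}
```

Run from the top of a source tree, for example find_impure_shebangs bin, it prints one path per matching file.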