Igalia / snabb

Snabb Switch: Fast open source packet processing
Apache License 2.0

Binaries built on Guix don't run on Ubuntu #49

Closed wingo closed 7 years ago

wingo commented 9 years ago

I built a snabb-lwaftr binary and it didn't run on Ubuntu. I think it's because it specifies the interpreter explicitly:

wingo@interlaken:~$ ldd snabb-lwaftr 
    linux-vdso.so.1 =>  (0x00007fffeb1f8000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fefec900000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fefec53c000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fefec338000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fefec032000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fefebe14000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fefebbfd000)
    /gnu/store/w29667jfv02s1hgmv0yp7nqyywvdv1fz-glibc-2.21/lib/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007fefecb08000)

For now to distribute binaries I'll build on Debian but we should probably figure out a solution. We could statically link in libc and everything; perhaps that would give us a more consistent static address space layout anyway...

wingo commented 9 years ago

Cc to @lukego who might have opinions about what he would accept in Snabb proper; not urgent though.

dpino commented 9 years ago

I ran into a similar issue when I tried to use a QEMU binary previously built on Ubuntu in chur. The reason was the explicit path to the ELF interpreter, as you pointed out.

This is what I got for a snabb binary in chur:

patchelf --print-interpreter ./snabb 
/nix/store/npfsi1d9ka8zwnxzn3sr08hbwvpapyk7-glibc-2.21/lib/ld-linux-x86-64.so.2

If I move it to my local desktop (Ubuntu), I cannot run it:

bash: ./snabb: No such file or directory

Very misleading message: the file that is missing is the ELF interpreter named inside the binary, not ./snabb itself. It's possible to run the binary by explicitly invoking the interpreter:

/lib64/ld-linux-x86-64.so.2 ./snabb

Or better by patching the path to the ELF interpreter:

patchelf --set-interpreter /lib64/ld-linux-x86-64.so.2 ./snabb

After that the binary works on Ubuntu. I don't know if there's a better way of fixing this issue.
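
The two workarounds above can be combined into a small launcher. This is only a minimal sketch, not something from the Snabb tree: `launch_prefix` is a hypothetical helper name, and the default fallback path assumes an x86-64 FHS system such as Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: decide how to start a binary, given the ELF interpreter
# path recorded in it (e.g. as printed by `patchelf --print-interpreter`).
# Prints nothing if the recorded interpreter exists (run the binary directly),
# prints the fallback loader if only that one exists, and fails otherwise.
launch_prefix() {
    requested=$1
    fallback=${2:-/lib64/ld-linux-x86-64.so.2}   # assumed FHS x86-64 loader path
    if [ -e "$requested" ]; then
        :   # recorded interpreter is present, no prefix needed
    elif [ -e "$fallback" ]; then
        printf '%s\n' "$fallback"
    else
        echo "no usable ELF interpreter: $requested, $fallback" >&2
        return 1
    fi
}

# Usage (assumes patchelf is installed):
#   interp=$(patchelf --print-interpreter ./snabb)
#   prefix=$(launch_prefix "$interp") && $prefix ./snabb
```

Unlike `patchelf --set-interpreter`, this leaves the binary unmodified, which may matter if the same file is also copied back to a Nix or Guix machine.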

lukego commented 9 years ago

I lean towards using a Dockerfile to build binary releases. That way we have a controlled build environment e.g. a semi-ancient Debian. What do you think?

btw I have added Snabb Switch to NixOS (nixos/nixpkgs#10272) and am currently cooperating with @domenkozar to get the full-blown OpenStack with Snabb NFV added too (nixos/nixpkgs#10399). This is intended to make the complete test environment easy to deploy in a reproducible way.

domenkozar commented 9 years ago

We could distribute a Nix package for Ubuntu (with either a sudo install or rewriting Nix paths). Another option is, as @dpino said, to use patchelf --set-interpreter /lib64/ld-linux-x86-64.so.2 ./snabb to set a different interpreter.

The third option, and I think the best one, would be to use Nix to build .deb/.rpm packages, as we already do for Nix itself: http://hydra.nixos.org/eval/1227834

wingo commented 8 years ago

Sadly this works in the opposite direction too -- binaries built on FHS systems don't run on Nix/Guix, as Nix/Guix don't have /lib64/ld-linux-x86-64.so.2. It kinda cramps our "we'll release source of course but also a self-contained binary you can just run anywhere" message. I'll take a brief look into building a static binary and see what happens -- we don't really need the dynamic linker anyway AFAIU.

domenkozar commented 8 years ago

@andywingo you can use patchelf on Nix.

Run ldd $(which cat) to get the path to the interpreter, then

patchelf --set-interpreter /nix/store/q0m70q5wg21hxrixkp1xk4x95sfs2fln-glibc-2.21/lib/ld-linux-x86-64.so.2 ./mybinary

Of course, it's better to build this from Nix without hardcoding an interpreter that might be GCed.
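
One way to avoid typing a store path by hand is to copy the interpreter from a binary that already runs on the local system. A rough sketch, assuming patchelf is on PATH: `adopt_local_interpreter` is a made-up name, and the `PATCHELF` variable exists only so the tool can be swapped out. The caveat above still applies, since the copied store path can still be garbage-collected later.

```shell
#!/bin/sh
# Hypothetical helper: give TARGET the same ELF interpreter as a reference
# binary that is known to run on this machine (default: whatever `cat` is).
PATCHELF=${PATCHELF:-patchelf}

adopt_local_interpreter() {
    target=$1
    reference=${2:-$(command -v cat)}
    # Ask the reference binary which interpreter it uses ...
    interp=$("$PATCHELF" --print-interpreter "$reference") || return 1
    # ... and write that same path into the target.
    "$PATCHELF" --set-interpreter "$interp" "$target"
}

# Usage:
#   adopt_local_interpreter ./mybinary
```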

wingo commented 8 years ago

Thanks @domenkozar :) I will probably end up just doing what you suggest. My quick fumblings with various compilation options failed to make a fully statically linked binary; ah well.

lukego commented 8 years ago

@andywingo The last time I dug into static linking the conclusions I came to were:

  1. Static linking with glibc is not supported anymore.
  2. musl does work with static linking.
  3. musl is unsuitable for some other reason (I'd have to check my twitter stream for where I came to that conclusion in a conversation with @justincormack). I am not sure if this was a fundamental reason or not.
  4. The LSB tools from the Linux foundation are rotten and useless.

The most practical solution I came to was found on blogs of people writing Linux-based games that they want to be easy to install. That suggestion was to statically link everything except glibc and to make sure your binary doesn't demand to link with symbol versions that are only present in recent glibc (achieved either by compiling with an old glibc or with the hack in snabbswitch/Makefile and sanity-check in snabbswitch/src/selftest.sh).
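
The symbol-version concern can be checked mechanically. The following is a sketch only and is not the actual check in snabbswitch/src/selftest.sh: `max_glibc_version` is a hypothetical name, and the usage line assumes binutils' objdump is available.

```shell
#!/bin/sh
# Hypothetical helper: read symbol listings on stdin (for example the output
# of `objdump -T`) and print the highest GLIBC_x.y[.z] version mentioned.
max_glibc_version() {
    grep -oE 'GLIBC_[0-9]+(\.[0-9]+)*' |
        sed 's/^GLIBC_//' |
        sort -t. -k1,1n -k2,2n -k3,3n |   # numeric sort per version component
        tail -n 1
}

# Usage (assumes binutils is installed):
#   objdump -T ./snabb | max_glibc_version
```

A release script could compare the printed version against the oldest glibc it intends to support and refuse to publish the binary if it is newer.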

lukego commented 8 years ago

(You would still need the patchelf trick or e.g. to build in docker with an old version of debian.)

I agree that this disrupts the "our binary works on any Linux machine" message. I have accepted this because it seems like people running NixOS/Guix probably expect a certain amount of unusual issues when installing binaries built on other systems.

Could be interesting to have another look at musl and full static linking some time!

justincormack commented 8 years ago

The reasons for it not building on Musl libc were fairly minor from memory, and I do plan to look at them again soon.

lukego commented 8 years ago

The big problem I encountered seems to be that musl does not support dlopen() and so LuaJIT's ffi.load() does not work. This is a problem for two reasons: that's a handy feature to have available to snsh scripts, and that is also how we have been dealing with awkward dependencies that we aren't willing to bring into the build but also don't want to completely exclude. This included libpcap in the past and now it covers at least the Solarflare vendor-library.

This may be acceptable for many programs but not for all e.g. snsh.

justincormack commented 8 years ago

Musl does support dlopen, but not with static linking. At some point this might actually change; it has been discussed.

lukego commented 8 years ago

Interesting.

The other issue that comes to mind is that one of the reasons we don't currently have a custom memcpy in Snabb Switch is that the glibc one has been competitive in benchmarks. This would need to be rechecked if we switch to musl to avoid performance regressions. However, having a good excuse to write our own memcpy for packets would not be the end of the world either.

justincormack commented 8 years ago

I think it depends on whether we can include alignment constraints; also luajit has its own memcpy which we could use (it uses it eg for copying structs), could probably get it inlined by using struct assignment.

See also http://www.openwall.com/lists/musl/2012/07/30/4

lukego commented 8 years ago

... you know if we did use static linking and no dlopen() it would potentially be inconvenient in the right way i.e. forcing us to do the right thing and build a truly self-contained executable without any "off the books" dependencies. If we could live with this then it could potentially be a good thing.

Programs that really do need glibc and/or dlopen() could be built accordingly with their own make target e.g. make bin/snsh.

Something to ponder...

dpino commented 7 years ago

I think this ticket can be closed now, as the main issue was to produce FHS-compatible tarballs for every release. #733 defines a Hydra job that produces an FHS release for every commit. Tarballs are available at:

https://hydra.snabb.co/jobset/igalia/snabb-lwaftr-release