rjloura closed this issue 5 years ago
Workaround for this:
$ sudo pkgin install libsodium
$ SODIUM_INSTALL=system pip install pynacl
$ pip install manta
It seems this may be more pernicious than originally thought. After successfully installing via pip install manta, I see this error:
[root@standard ~]# mantash ls /rjloura/stor
Traceback (most recent call last):
  File "/opt/local/bin/mantash", line 29, in <module>
    import manta
  File "/opt/local/lib/python2.7/site-packages/manta/__init__.py", line 7, in <module>
    from .auth import PrivateKeySigner, SSHAgentSigner, CLISigner
  File "/opt/local/lib/python2.7/site-packages/manta/auth.py", line 23, in <module>
    from paramiko import Agent
  File "/opt/local/lib/python2.7/site-packages/paramiko/__init__.py", line 22, in <module>
    from paramiko.transport import SecurityOptions, Transport
  File "/opt/local/lib/python2.7/site-packages/paramiko/transport.py", line 90, in <module>
    from paramiko.ed25519key import Ed25519Key
  File "/opt/local/lib/python2.7/site-packages/paramiko/ed25519key.py", line 22, in <module>
    import nacl.signing
  File "/opt/local/lib/python2.7/site-packages/nacl/signing.py", line 17, in <module>
    import nacl.bindings
  File "/opt/local/lib/python2.7/site-packages/nacl/bindings/__init__.py", line 17, in <module>
    from nacl.bindings.crypto_aead import (
  File "/opt/local/lib/python2.7/site-packages/nacl/bindings/crypto_aead.py", line 18, in <module>
    from nacl._sodium import ffi, lib
ImportError: ld.so.1: python2.7: fatal: relocation error: file /opt/local/lib/python2.7/site-packages/nacl/_sodium.so: symbol sodium_unpad: referenced symbol not found
I tried installing pynacl with and without the SODIUM_INSTALL=system variable set and had the same issue. Interestingly, the install now completes without the variable, which it did not before.
[root@standard ~]# nm /opt/local/lib/python2.7/site-packages/nacl/_sodium.so | grep sodium_unpad
00000000000125d0 t _cffi_d_sodium_unpad
00000000000125e0 t _cffi_f_sodium_unpad
U sodium_unpad
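The nm output above shows that _sodium.so references sodium_unpad without defining it (the "U" line), so the symbol has to be supplied by whichever libsodium gets loaded at run time. As a rough sketch only, the following helper pulls the undefined symbols out of nm text; it assumes nm's default three-column output, and the function name undefined_symbols is made up for illustration.

```python
def undefined_symbols(nm_output):
    """Return the names that nm marks as 'U' (undefined)."""
    names = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols have three fields (address, type, name);
        # undefined ones have only two (type, name).
        if len(fields) == 2 and fields[0] == "U":
            names.append(fields[1])
    return names


# Sample text copied from the nm output quoted in this thread.
sample = """\
00000000000125d0 t _cffi_d_sodium_unpad
00000000000125e0 t _cffi_f_sodium_unpad
                 U sodium_unpad
"""
print(undefined_symbols(sample))
```

Every name this returns has to be resolvable in the shared libsodium that ld.so.1 picks up, otherwise the import fails with the relocation error shown in the traceback.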
@rjloura Let me make sure I understand: current python-manta is basically broken on SmartOS?
As I understand it, the build fails on SmartOS (I tried on 17.4.0) when pip installing pynacl, which in turn builds libsodium. I see that pkgsrc has libsodium-1.0.16, so presumably the tests passed when pkgsrc built it. If I download the 1.0.16 tarball from https://download.libsodium.org/libsodium/releases/, make check also passes. So I'm confused about what is different about the environment when pip is building libsodium.
Regarding SODIUM_INSTALL, maybe pyca/pynacl#497 is related?
For the main breakage, recall that pynacl is a wrapper around libsodium. Above I pointed out that if you just download libsodium it appears to work, so it was unclear what was different when pip did its thing. I think the answer is https://github.com/pyca/pynacl/blob/master/setup.py#L161:
# Run ./configure
subprocess.check_call(
    [
        configure, "--disable-shared", "--enable-static",
        "--disable-debug", "--disable-dependency-tracking",
        "--with-pic", "--prefix", os.path.abspath(self.build_clib),
    ],
    cwd=build_temp,
)
If I run configure with those flags on the same system I was testing with, then make check does fail with:
../../build-aux/test-driver: line 107: 142284: Memory fault(coredump)
FAIL: randombytes
From some trial and error, the combination of both --disable-shared and --with-pic is needed to trigger the crash.
The steps to reproduce are along the lines of:
pkgin in build-essential
wget https://download.libsodium.org/libsodium/releases/libsodium-1.0.16.tar.gz
tar xvfz libsodium-1.0.16.tar.gz
cd libsodium-1.0.16
./configure --disable-shared --with-pic && make check
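The trial and error over configure flags could be scripted. This is only a sketch of one way to enumerate the combinations, not what was actually run: the flag list is the one pynacl's setup.py passes (minus --prefix), and the helper name configure_commands is hypothetical. It only builds the command lines; running each one in a fresh source tree is left to the reader.

```python
import itertools

# Flags taken from the pynacl setup.py snippet quoted in this thread.
CANDIDATE_FLAGS = [
    "--disable-shared",
    "--enable-static",
    "--disable-debug",
    "--disable-dependency-tracking",
    "--with-pic",
]


def configure_commands(flags):
    """Yield one './configure ... && make check' line per subset of flags."""
    for size in range(len(flags) + 1):
        for combo in itertools.combinations(flags, size):
            yield " ".join(["./configure", *combo]) + " && make check"


commands = list(configure_commands(CANDIDATE_FLAGS))
print(len(commands))
```

Subsets are generated smallest first, so the first failing command is also one of the smallest flag sets that triggers the crash.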
I tried it on a few different JPC instances and got:
| pkgsrc | kernel | result |
|---|---|---|
| 2018Q3-x86_64 | 20180816T001857Z | CRASH |
| 2017Q4-x86_64 | 20180813T212436Z | CRASH |
| 2014Q4-x86_64 | 20180816T001857Z | no crash |
I'm not sure if this points to a bug in the kernel or something provided by pkgsrc, or what the right place is to look next (cc @jperkin).
The problem seems to be that thread-local variables from static libraries are not properly initialized in SmartOS.
randombytes_salsa20_random.c declares a thread-local variable:
static TLS Salsa20Random stream = { ... };
(On SmartOS, TLS is a macro for _Thread_local.)
When the library is statically linked, stream is not properly initialized, and accessing the structure's fields leads to undefined behavior.
A workaround is to compile the library without threads (--without-threads), which is probably acceptable for the Python bindings.
Just for the record: for libsodium-1.0.16 and libsodium-1.0.17, --without-threads doesn't help. The library compiles only if --enable-shared is requested.
The compiler outputs an R_386_TLS_LDM relocation for stream at randombytes_salsa20_random_stir+0x7e, which the link-editor is processing faithfully. We'd expect to see an R_386_PLT32 or R_386_TLS_LDM_PLT after this (at +0x84), to either fix up a call to __tls_get_addr or to nop it out. The link-editor is doing neither, and leaving a dangling call with arbitrarily bad results.
At first glance, it looks like .rel.text contains the entries we'd hoped for, but that ld(1) is ignoring one of them.
I think this is because we expect the "main" TLS relocation to have dealt with this for us, as we would in the case of R_386_TLS_GD, which would indeed fix up the call. The LDM relocations don't do this, and it seems likely that they must. I think that would still be valid if we later see the LDM_PLT relocation: we'll just overwrite text we already overwrote (and, actually, then overwrite it yet again, later).
If I can trust the randombytes test that @jasonbking showed me, a diff similar to this one, applied to the illumos link-editor, makes things work:
https://gist.github.com/richlowe/33f189f0f71ffe6e0aa48b01faccfe9b
What this does is transition the 'call' as we would have with a _GD relocation, leaving behind an _LE relocation to fix up the addl.
Running randombytes after this:
; TERM=ansi mdb +o pager ./randombytes
> randombytes_salsa20_random_stir:b
> ::run
mdb: stop at randombytes_salsa20_random_stir
mdb: target stopped at:
randombytes_salsa20_random_stir:pushl %ebp
> 1::tls stream
fe533210
> fe533210/KKC
0xfe533210: 0 0 \0
> ::step out
mdb: target stopped at:
randombytes_stir+0x26: popl %eax
> fe533210/KKC
0xfe533210: 1 0 \266
> $q
Looks reasonable for correctness based on my very limited understanding of the code.
A more adequate check for correctness, adding CTF to the salsa20 rdrand object and checking the hrtime at the end of the struct, also looks ok.
I don't have a build environment that allows me to build a broken system (I'm using an object from @jasonbking), so I can't reasonably test this further. I'd appreciate it if somebody who could would rebuild libld.so.4 using the above patch and see if everything also seems good to them.
I was able to recreate the problem (and sent @richlowe the resulting binary). Using an ld with his patch, I do not see the failure anymore.
What were the steps to test the patch? Is it possible to update ld only in one particular zone without touching GZ? Is there any plan to include the patch in the newest release? Thanks!
The local dynamic TLS fix was integrated in commit 096c97d62be876a03a0a8cdb0a540e9c84ec509f which was merged into illumos-joyent this morning. If you build your own SmartOS-live image, you can try that. Otherwise the fix should appear in the release that should be out around Feb 13th.
Just tried to build on:
SunOS ikara 5.11 joyent_20190214T002809Z i86pc i386 i86pc Solaris
The issue persists. Steps to reproduce (tried on c193a558-1d63-11e9-97cf-97bb3ee5c14f base-64-lts 18.4.0 smartos zone-dataset 2019-01-21):
$ sudo pkgin install python37 py37-pip gcc7{,-libs} gmake
$ python3 -m venv test && source test/bin/activate
$ python3 -m pip install --upgrade pip
$ python3 -m pip install ansible
If needed I can attach the core file.
Unfortunately, this appears to be a different bug. The earlier bug was present only in 32-bit builds.
This is indeed a similar but different issue. The TLS transition related corruption here causes any write to members of stream to actually write to the beginning of stream, thus radically corrupting rnd32_outleft several times in randombytes_salsa20_random_stir.
The amd64 bug is occurring because the link-editor erroneously zeroes the addend when transitioning from LD (dtpoff) to LE (tpoff), and thus the structure offsets emitted by the compiler in the form
leaq 48+stream@dtpoff(%rax), %rdi
to get at rnd32 effectively disappear.
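To make the arithmetic concrete, here is a toy model of the address computation, not the link-editor's actual code. Only the addend 48 comes from the instruction quoted above; the thread pointer value, the offset of stream from it, and the function name field_address are made up for illustration.

```python
def field_address(thread_pointer, stream_tpoff, addend, keep_addend=True):
    """Address computed after the LD -> LE transition.

    With keep_addend=False this mimics the bug: the field offset that the
    compiler folded into the relocation addend is dropped.
    """
    return thread_pointer + stream_tpoff + (addend if keep_addend else 0)


tp = 0x1000        # made-up thread pointer
tpoff = -0x200     # made-up offset of stream relative to the thread pointer
RND32_ADDEND = 48  # from "leaq 48+stream@dtpoff(%rax), %rdi"

stream_start = tp + tpoff
correct = field_address(tp, tpoff, RND32_ADDEND)
buggy = field_address(tp, tpoff, RND32_ADDEND, keep_addend=False)

print(hex(stream_start), hex(correct), hex(buggy))
```

In this model the buggy address equals the start of stream for any addend, which matches the observation that writes to members of stream land at the beginning of the structure.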
Thanks for your comments and interest. Should I open another issue to track that?
I filed illumos bug: #10471 "ld(1) amd64 LD->LE TLS transition causes memory corruption" (https://www.illumos.org/issues/10471) to track the fix.
I don't know whether the smartos folks would like a second bug filed or not.
This bug for 64-bit builds was filed as illumos#10471
Since we merge w/ illumos-gate M-F, once it lands in illumos-gate, it should land in illumos-joyent shortly thereafter (and then in the following bi-weekly release).
Okay. Thanks, will wait patiently.
I tried the above commands on the latest release (20190328T010321Z), and pynacl as well as ansible in general both build without an error now. I suspect the previous release will also work (I believe it also has the fix), though I didn't have a chance to try to build on that.
Hi @jasonbking. I confirm. Works like a charm. Thanks a lot!
Since the issue is resolved, I'm going to go ahead and close this now.
There is an issue with running the test suite for the libsodium that is bundled with pynacl. So during an install of python-manta on SmartOS you will see a build failure that looks something like this: