Open kalefranz opened 6 years ago
From @mithro on January 18, 2018 1:59
You can tell this is happening because you see messages like
kern.info: [13175.872406] wcgrid_oet1_vin[15737] vsyscall attempted with vsyscall=none
in your kernel log.
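A minimal sketch of how such lines could be picked out of a kernel log programmatically, for instance when checking many machines. The helper name `parse_vsyscall_denial` and its signature are hypothetical; only the message text is taken from the log line above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 and fills comm/pid when `line` is a
 * "vsyscall attempted with vsyscall=none" kernel message, else returns 0. */
int parse_vsyscall_denial(const char *line, char *comm, size_t comm_len, long *pid)
{
    static const char marker[] = " vsyscall attempted with vsyscall=none";
    const char *m = strstr(line, marker);

    /* The marker must directly follow the "name[pid]" field. */
    if (m == NULL || m == line || m[-1] != ']')
        return 0;

    const char *rbr = m - 1;
    const char *lbr = rbr;
    while (lbr > line && *lbr != '[')
        lbr--;
    if (*lbr != '[')
        return 0;

    /* Everything between the brackets must be the decimal PID. */
    char *end;
    long value = strtol(lbr + 1, &end, 10);
    if (end == lbr + 1 || end != rbr)
        return 0;

    /* The process name runs back from '[' to the previous space. */
    const char *start = lbr;
    while (start > line && start[-1] != ' ')
        start--;
    size_t n = (size_t)(lbr - start);
    if (n == 0 || n >= comm_len)
        return 0;

    memcpy(comm, start, n);
    comm[n] = '\0';
    *pid = value;
    return 1;
}
```

Feeding it each line of the kernel log and collecting the names gives a quick list of affected binaries.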
From @mithro on January 18, 2018 2:0
@xobs @stefanor -- See this.
From @mithro on January 18, 2018 2:1
FYI - This now being disabled by default is related to ASLR stuff I think?
From @mithro on January 18, 2018 2:3
Golang also had this issue -> https://github.com/golang/go/issues/1933
From @mithro on January 18, 2018 2:13
Debian bug - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=852620
CC @mingwandroid
From @mithro on January 18, 2018 9:35
FYI -- We get the following when building -> https://github.com/timvideos/conda-hdmi2usb-packages/blob/master/lm32/gcc-nostdc/meta.yaml with the gcc_linux-64 conda compiler....
echo timestamp > s-c-target-hooks-def-h
build/genmodes > tmp-modes.c
/bin/sh /home/tansell/conda/conda-bld/gcc-lm32-elf-nostdc_1516266603038/work/gcc-5.4.0/gcc/../move-if-change tmp-modes.c insn-modes.c
echo timestamp > s-modes
build/genmddeps /home/tansell/conda/conda-bld/gcc-lm32-elf-nostdc_1516266603038/work/gcc-5.4.0/gcc/common.md /home/tansell/conda/conda-bld/gcc-lm32-elf-nostdc_1516266603038/work/gcc-5.4.0/gcc/config/lm32/lm32.md > tmp-mddeps
build/genmodes -h > tmp-modes.h
/bin/sh /home/tansell/conda/conda-bld/gcc-lm32-elf-nostdc_1516266603038/work/gcc-5.4.0/gcc/../move-if-change tmp-modes.h insn-modes.h
echo timestamp > s-modes-h
build/gengtype \
-S /home/tansell/conda/conda-bld/gcc-lm32-elf-nostdc_1516266603038/work/gcc-5.4.0/gcc -I gtyp-input.list -w tmp-gtype.state
/bin/sh /home/tansell/conda/conda-bld/gcc-lm32-elf-nostdc_1516266603038/work/gcc-5.4.0/gcc/../move-if-change tmp-mddeps mddeps.mk
echo timestamp > s-mddeps
build/genenums /home/tansell/conda/conda-bld/gcc-lm32-elf-nostdc_1516266603038/work/gcc-5.4.0/gcc/common.md /home/tansell/conda/conda-bld/gcc-lm32-elf-nostdc_1516266603038/work/gcc-5.4.0/gcc/config/lm32/lm32.md \
> tmp-enums.c
/bin/sh /home/tansell/conda/conda-bld/gcc-lm32-elf-nostdc_1516266603038/work/gcc-5.4.0/gcc/../move-if-change tmp-enums.c insn-enums.c
echo timestamp > s-enums
if [ xinfo = xinfo ]; then \
makeinfo --split-size=5000000 --split-size=5000000 --no-split -I . -I /home/tansell/conda/conda-bld/gcc-lm32-elf-nostdc_1516266603038/work/gcc-5.4.0/gcc/doc \
-I /home/tansell/conda/conda-bld/gcc-lm32-elf-nostdc_1516266603038/work/gcc-5.4.0/gcc/doc/include -o doc/gccint.info /home/tansell/conda/conda-bld/gcc-lm32-elf-nostdc_1516266603038/work/gcc-5.4.0/gcc/doc/gccint.texi; \
fi
Makefile:2407: recipe for target 's-gtype' failed
make[2]: *** [s-gtype] Segmentation fault
make[2]: *** Waiting for unfinished jobs....
rm gcc.pod
make[2]: Leaving directory '/home/tansell/conda/conda-bld/gcc-lm32-elf-nostdc_1516266603038/work/build-gcc/gcc'
Makefile:4094: recipe for target 'all-gcc' failed
make[1]: *** [all-gcc] Error 2
make[1]: Leaving directory '/home/tansell/conda/conda-bld/gcc-lm32-elf-nostdc_1516266603038/work/build-gcc'
Makefile:851: recipe for target 'all' failed
make: *** [all] Error 2
Traceback (most recent call last):
File "/home/tansell/conda/bin/conda-build", line 6, in <module>
sys.exit(conda_build.cli.main_build.main())
File "/home/tansell/conda/lib/python3.6/site-packages/conda_build/cli/main_build.py", line 342, in main
execute(sys.argv[1:])
File "/home/tansell/conda/lib/python3.6/site-packages/conda_build/cli/main_build.py", line 333, in execute
noverify=args.no_verify)
File "/home/tansell/conda/lib/python3.6/site-packages/conda_build/api.py", line 97, in build
need_source_download=need_source_download, config=config)
File "/home/tansell/conda/lib/python3.6/site-packages/conda_build/build.py", line 1524, in build_tree
config=config)
File "/home/tansell/conda/lib/python3.6/site-packages/conda_build/build.py", line 1147, in build
utils.check_call_env(cmd, env=env, cwd=src_dir)
File "/home/tansell/conda/lib/python3.6/site-packages/conda_build/utils.py", line 628, in check_call_env
return _func_defaulting_env_to_os_environ(subprocess.check_call, *popenargs, **kwargs)
File "/home/tansell/conda/lib/python3.6/site-packages/conda_build/utils.py", line 624, in _func_defaulting_env_to_os_environ
return func(_args, **kwargs)
File "/home/tansell/conda/lib/python3.6/subprocess.py", line 291, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/bin/bash', '-x', '-e', '/home/tansell/conda/conda-bld/gcc-lm32-elf-nostdc_1516266603038/work/gcc-5.4.0/conda_build.sh']' returned non-zero exit status 2.
From @mingwandroid on January 18, 2018 10:32
@mithro, I do not see the value in the exclamation points and references to gasp 2011 as if it were decades ago; it misses a large part of the point of the Anaconda Distribution. We deliberately target old glibc versions so that the software can be run on a large range of Linux distributions. Currently our support extends back to RHEL6/CentOS6, which uses glibc 2.12.
Regardless, the gcc package this bug report was originally concerning was built against CentOS5 (glibc 2.5), is deprecated (I guess you noticed that since the bug report has morphed into something totally different now?) and was only a stop-gap package for us internally to use when we needed C++11 support. I am not going to look into rebuilding it to make sure it avoids using vsyscall (which I guess is now a problem due to people building kernels with KPTI/PTI/KAISER or at least taking care to shut down more potential security holes).
AFAICT your make executable segfaults here; you could probably try ours instead. Is there any reason not to try to build GCC 5.5 cross compilers here?
If it is the build gengtype that is crashing then there is a high chance you are reporting a bug in GCC 5.4 that trips up our GCC 7.2, and that's not something I am going to be able to look into for you.
For the project you are attempting to build, would crosstool-ng not be a more appropriate starting point?
From @mithro on January 18, 2018 10:49
If you are linking against glibc 2.12, then you are still using vsyscall and hence will be broken on anything which disables it?
From @mithro on January 18, 2018 10:51
FYI - The gengtype is segfaulting because of:
[12478.840793] gengtype[130160] vsyscall attempted with vsyscall=none ip:ffffffffff600400 cs:33 sp:7fff8f41af18 ax:ffffffffff600400 si:7fff8f41d763 di:7fff8f41af38
[12478.840796] gengtype[130160]: segfault at ffffffffff600400 ip ffffffffff600400 sp 00007fff8f41af18 error 15
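The faulting address in these two lines can be decoded. On x86-64 the legacy vsyscall page sits at the fixed address 0xffffffffff600000 and holds three entry points spaced 1024 bytes apart (gettimeofday, time, getcpu), so an ip of 0xffffffffff600400 corresponds to the time() slot. A small sketch of that arithmetic; the function names are illustrative and not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

#define VSYSCALL_BASE UINT64_C(0xffffffffff600000)

/* Returns 0, 1 or 2 for a legacy vsyscall entry address, or -1 otherwise. */
int vsyscall_slot(uint64_t addr)
{
    /* Entries are 1024 bytes apart, so only bits 10-11 may differ from the base. */
    if ((addr & ~UINT64_C(0xC00)) != VSYSCALL_BASE)
        return -1;
    int nr = (int)((addr & UINT64_C(0xC00)) >> 10);
    return nr < 3 ? nr : -1;
}

/* Maps an entry address to the call it implements, or NULL if it is not one. */
const char *vsyscall_slot_name(uint64_t addr)
{
    static const char *const names[] = { "gettimeofday", "time", "getcpu" };
    int nr = vsyscall_slot(addr);
    return nr < 0 ? NULL : names[nr];
}
```

Calling `vsyscall_slot_name()` on the ip from a segfault line gives a first hint at which libc call to look for in a backtrace.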
From @mithro on January 18, 2018 10:57
The problem appears to be this -> /home/tansell/conda/pkgs/gcc_impl_linux-64-7.2.0-hc5ce805_2/x86_64-conda_cos6-linux-gnu/sysroot/usr/include/asm/vsyscall.h
From @mingwandroid on January 18, 2018 11:1
Can you try booting with vsyscall=emulate and let me know if that works around this.
I'll probably need to track down the exact patches that disable vsyscall in glibc and backport them, but after that everything that uses it would need to be rebuilt and that's not a minor task!
From @mithro on January 18, 2018 11:4
Yes, vsyscall=emulate is a workaround, but it is not available to people using managed machines, nor on lxss (Linux on Windows).
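Whether the page is available can be checked from userspace without access to the boot parameters. To the best of my knowledge, on x86-64 Linux a `[vsyscall]` entry appears in /proc/self/maps when the page is emulated and is absent with vsyscall=none; WSL may behave differently. A minimal sketch under that assumption, with hypothetical helper names:

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if a /proc/<pid>/maps line ends with the given region name. */
int line_names_region(const char *line, const char *name)
{
    size_t len = strlen(line);
    size_t nlen = strlen(name);

    /* Ignore the trailing newline that fgets() leaves in place. */
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        len--;
    return nlen > 0 && len >= nlen && strncmp(line + len - nlen, name, nlen) == 0;
}

/* Returns 1 if the maps file lists the region, 0 if not, -1 if unreadable. */
int maps_has_region(const char *path, const char *name)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char line[512];
    int found = 0;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = line_names_region(line, name);
    fclose(f);
    return found;
}
```

`maps_has_region("/proc/self/maps", "[vsyscall]")` returning 0 would suggest that old statically linked binaries are at risk on that machine.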
From @mithro on January 18, 2018 11:9
Looks like https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a77d3c17dc6517636c1cf6ab9c6bb8c257772354;hp=d53a73acdbf6ac6eb99cd06f5dd695da58d9e8f5 was the initial move in this direction...
From @mingwandroid on January 18, 2018 11:10
"not available to people using managed machines"
Yeah, I'm aware of that, see my previous comment: "everything that uses it would need to be rebuilt"
"nor lxss (Linux on Windows)."
WSL/LXSS has never been problematic for me, even in the recent past (though I only run installation and some brief tests). Which distro are you running on it?
From @mithro on January 18, 2018 11:12
Looking at https://github.com/Microsoft/WSL/issues/1462, it seems WSL has never implemented vsyscall?
From @mithro on January 18, 2018 11:13
Use of vsyscall seems pretty rare - looking at https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sysdep.h;hb=688903eb3ef01301d239ab753d309d45720610a7#l374
374 /* List of system calls which are supported as vsyscalls. */
375 # define HAVE_CLOCK_GETTIME_VSYSCALL 1
376 # define HAVE_GETTIMEOFDAY_VSYSCALL 1
377 # define HAVE_GETCPU_VSYSCALL 1
If code doesn't use any of those functions, then it will never hit this issue...
From @mithro on January 18, 2018 11:16
Hrm.... It looks like time() uses this, so I would expect it to be more common?
(gdb) bt
#0 0xffffffffff600400 in ?? ()
#1 0x0000000000435d6d in time ()
#2 0x00000000004118e8 in write_state(char const*) ()
#3 0x000000000040154c in main ()
From @mingwandroid on January 18, 2018 11:16
Unfortunately code does use those functions! I will write something to scan all packages and list the ones to be rebuilt. Thanks for your help here.
From @mingwandroid on January 18, 2018 11:17
Yeah, gettimeofday() was moved into the vsyscall table specifically due to how much use it got (in fact that's true of all things in there).
From @mithro on January 18, 2018 11:18
I'm having trouble finding where the gcc_linux-64 conda package is generated?
From @mingwandroid on January 18, 2018 11:19
Here: https://github.com/AnacondaRecipes/aggregate/tree/master/ctng-compilers-feedstock/recipe
From @mithro on January 18, 2018 11:51
I think you can just patch your glibc to not have HAVE_CLOCK_GETTIME_VSYSCALL?
~/conda/conda-bld/compilers_linux-64_1516275741337/work/.build/src/glibc-2.12.2/sysdeps/unix/sysv/linux/x86_64/sysdep.h:# define HAVE_CLOCK_GETTIME_VSYSCALL 1
From @mingwandroid on January 18, 2018 11:56
I don't think this issue is as widespread or as simple as that.
I compiled and ran this test on SUSE WSL just now:
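#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec tp;

    /* clock_gettime() is one of the calls listed as a vsyscall in glibc. */
    int result = clock_gettime(CLOCK_MONOTONIC, &tp);
    printf("result: %d\n", result);

    /* tv_sec and tv_nsec are not long long; cast to match the %lld format. */
    printf("tp.tv_sec: %lld\n", (long long)tp.tv_sec);
    printf("tp.tv_nsec: %lld\n", (long long)tp.tv_nsec);

    /* time() is the call seen in the gengtype backtrace. */
    time_t now = 0;
    time(&now);
    printf("time(&now): %lld\n", (long long)now);
    return 0;
}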
.. and it runs just fine (you need to link to -lrt though).
I need to see the compilation line for gengtype, I think.
From @mingwandroid on January 18, 2018 12:0
OK, the trigger is to add -static to the compilation flags, in which case time(&now) will segfault.
From @mingwandroid on January 18, 2018 12:6
From my perspective, this is very good, as it is unlikely we need to rebuild very much software (if any at all, except maybe our compilers once I figure out a way around this).
It does mean that this issue has suddenly dropped very far down my priority list, though. I've pointed you to the recipe for the compilers; if you wish to have a go at forking diorcety crosstool-ng (linux-target-2 branch) and backporting these fixes, I'll be happy to review them.
https://github.com/AnacondaRecipes/aggregate/blob/master/crosstool-ng-feedstock/recipe/meta.yaml
and:
From @mithro on January 18, 2018 12:6
The problem seems to be the "Retrieve time symbol from vDSO. Substitute with vsyscall if not available." feature...
From @mithro on January 18, 2018 12:7
vDSO is not available when you statically link.
From @mingwandroid on January 18, 2018 12:7
vDSO should be mapped into static executables.
From @mingwandroid on January 18, 2018 12:8
See: https://github.com/golang/go/issues/1933#issuecomment-66057030
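Whether the kernel mapped a vDSO into a given process, statically linked or not, can be checked through the auxiliary vector: AT_SYSINFO_EHDR holds the address of the vDSO's ELF header when one is present. A minimal sketch, assuming a glibc that provides getauxval() (2.16 or later, so not the 2.12 sysroot discussed in this thread); the helper names are illustrative.

```c
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/* Returns the address of the vDSO ELF header, or 0 if none was mapped. */
unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}

/* Returns 1 if the memory at `base` starts with the ELF magic bytes. */
int looks_like_elf(unsigned long base)
{
    return base != 0 && memcmp((const void *)base, ELFMAG, SELFMAG) == 0;
}
```

A nonzero base only shows that the mapping exists. The C library still has to look its symbols up there, and per the quoted glibc feature text, the fallback when it does not is the vsyscall page.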
From @mithro on January 18, 2018 12:10
gcc -static uses the vsyscalls. Ulrich tells people not to use statically-linked glibc.
I'll see if the kernel folks want to stick a liberally-licensed example implementation
of how to find stuff in the vDSO in the kernel tree somewhere.
...and don't use my code -- it's totally wrong. I misunderstood the purpose of section
headers and program headers.
From @mingwandroid on January 18, 2018 12:22
As a test, I replaced our libc.a with the one from openSUSE Leap 42.3 WSL and this problem goes away, so I guess that since Ulrich stopped working on glibc someone else must have decided it would be nice to fix this.
From @mingwandroid on January 18, 2018 12:25
I changed the title to something less scary.
From @mithro on January 18, 2018 12:26
I'm currently building your conda_build_config.cos6.x86_64.yaml with glibc 2.14.1 to see if that fixes the problem...
From @mingwandroid on January 18, 2018 12:26
Good idea! I'm impressed how quickly you got up to speed with that.
Got an error on copying, and the copy didn't complete. Going to close this issue and try one more time.
Guess the mover got rate-limited ...
I'll copy the rest of the comments manually.
From @mithro
I somehow have a knack for hitting these types of problems, so I've gotten pretty good at navigating things I don't quite understand :-P
I'm also trying to figure out why we are linking our toolchains with -static -- It may have only been needed before we started using the conda provided compilers....
From @mingwandroid
Well, I used to use -static via crosstool-ng when I first started looking into taking this approach for building the conda compilers, but it bakes the syscall table order into the executable which limits the compatibility quite badly (though I guess vDSO should mitigate that and this may only be true for vsyscall usage anyway?).
There are two meanings of "static executable" these days: fully static and mostly static. Our compilers are mostly static, in that they do link to glibc dynamically, but all of their other deps are statically linked. This is probably the more useful variant.
While I think it'd be great to fix this and so do not want to discourage you, I do feel crosstool-ng is more suited to building cross-toolchains than anything else around (it's what I use - clearly - and it's its only raison d'être).
But please, proceed!
From @mingwandroid
Minimal reproducer; on a system with gdb installed, from an activated conda env which contains gcc_linux-64:
echo "#include <time.h>" > time.c
echo "int main() { time_t now; time(&now); }" >> time.c
$CC time.c -static && gdb --batch -q -n -ex "run" -ex "bt" a.out
.. should output something like:
Program received signal SIGSEGV, Segmentation fault.
0xffffffffff600400 in ?? ()
#0 0xffffffffff600400 in ?? ()
#1 0x00000000004099ad in time ()
#2 0x00000000004006a7 in main ()
From @mingwandroid
Another useful reference: https://sourceware.org/bugzilla/show_bug.cgi?id=12813
@kalefranz, can you move this to anaconda-issues please?
I believe I have a crosstool-ng which patches vsyscall support out of glibc used by conda.
I'm in the process of doing some testing now.
Is there a way to test this more properly?
I'd write a C file that exercises each of the functions that could previously have been in the vsyscall area and make sure they all work.
But mostly my concern is that this change doesn't break non-static in any way and that'll require trying to build a whole load of packages and seeing that they work properly.
Did this patching work for you, @mithro? Am curious as it seems the wheel folks are going in the same direction.
What's the impact we are expecting here, @mingwandroid? Sounds like minimal if at all. Trying to understand if we should be factoring this into a conda-forge rebuild or not (given upgrading compilers will require a rebuild anyways).
Continuum Analytics faces a related problem with its conda software suite, and as they point out, this will pose a significant obstacle to using these tools in hosted services
This is incorrect. We build basically 0 static executables, never have, never will (except possibly an uncompressor in the future, but we'll make sure it doesn't run afoul of this issue).
There's nothing to be done here IMHO, though if @mithro supplies the patches I can think about adding them to our compilers for people who prefer static.
Edit: for tone. The patches are available at the URL too. I don't object to applying them and rebuilding but it isn't very important at all.
I'm afraid I ran out of time before I was able to finish the back porting of the vsyscall removal changes. You can see where I got to here -> https://github.com/diorcety/crosstool-ng/compare/master...mithro:vsyscall-removal
You can also see my changes to the AnacondaRecipe for crosstool-ng here -> https://github.com/AnacondaRecipes/aggregate/compare/master...mithro:vsyscall-work
It adds a test which checks this problem, so it might be worth just incorporating that test by itself to track any progress here.
It will probably be quite a long time before I get back to this.
From @mithro on January 18, 2018 1:55
The conda gcc package (and probably multiple other packages) segfaults when used on newer Debian and when running under Linux on Windows emulation.
This is because the conda gcc package is linked against a very old version of glibc (older than 2.14 which was released in 2011!). This old version of glibc uses the "vsyscall" functionality which is not available on new Linux distributions and on Linux on Windows.
vsyscall was deprecated in Linux in 2011 (!!). Debian has now disabled emulation in the kernel and this causes all apps to crash when started. It is also considered insecure to have it enabled.
Newer glibc versions (2.14) have removed the usage of this functionality.
Copied from original issue: conda/conda#6747