RIOT-OS / RIOT

RIOT - The friendly OS for IoT
https://riot-os.org
GNU Lesser General Public License v2.1

build for native seems to produce broken .elf on some systems #741

Closed BytesGalore closed 10 years ago

BytesGalore commented 10 years ago

I tried the current RIOT version with projects/hello-world-thread and projects/test_getpid. In both cases the resulting .elf crashes immediately with a segmentation fault when executed.

This applies to:

GNU/Linux 3.2.0-59-generic x86, gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3

GNU/Linux 3.2.0-58-generic x86_64, gcc (Ubuntu/Linaro 4.7.3-2ubuntu1~12.04) 4.7.3

mehlis commented 10 years ago

@BytesGalore please check out the current RIOT master and test with tests/test_ipc_ping_pong.

The projects dir is more or less unmaintained. @OlegHahm, do we want to delete it to make RIOT less confusing?

BytesGalore commented 10 years ago

I can't find tests/test_ipc_ping_pong in the current master, but I tested with tests/test_shell, tests/test_vtimer_msg and tests/test_float, with the same results.

OlegHahm commented 10 years ago

@mehlis, I would not delete it, but maybe we should remove it from the RIOT-OS organization.

Kijewski commented 10 years ago

@BytesGalore have you tried running the output file in gdb?

BytesGalore commented 10 years ago

Yes, I set a breakpoint on main (b main), but:

Starting program: /RIOT/tests/test_float/bin/native/test_float.elf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb)

Kijewski commented 10 years ago

Could you please share the binary file with us? Right now I don't know how to reproduce the problem without installing a virtual machine. :/

Just a hint: the first entry point of a file is not main but _start. The problem will be far earlier/deeper than main.
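
To illustrate why a breakpoint on main can be too late: with GCC or Clang on glibc, functions marked as constructors are called from the C runtime's init code before main is entered, so a crash in one of them never reaches main. A minimal sketch, assuming a GCC-compatible toolchain; the names early_startup and init_ran_before_main are made up for this example and are not RIOT code:

```c
#include <stdio.h>

/* Set by the constructor below; stays 0 if it never ran. */
static int init_ran = 0;

/* GCC/Clang extension: run this function during process init,
 * after _start has handed over to the C runtime but before main(). */
__attribute__((constructor))
static void early_startup(void)
{
    init_ran = 1;
}

/* Helper for callers: reports whether the constructor already ran. */
int init_ran_before_main(void)
{
    return init_ran;
}
```

Calling init_ran_before_main() as the first statement of main should return 1, which shows that code ran earlier than any breakpoint on main would catch.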

BytesGalore commented 10 years ago

OK, I suggest sending the binary directly to you(?) and not to the list. I set the breakpoint on _start:

(gdb) b _start
Breakpoint 1 at 0x8048ca0
(gdb) start
Temporary breakpoint 2 at 0x804ba67
Starting program: /RIOT/tests/test_float/bin/native/test_float.elf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".

Breakpoint 1, 0x08048ca0 in _start ()
(gdb) step
Single stepping until exit from function _start,
which has no line number information.
0x001993e0 in __libc_start_main () from /lib/i386-linux-gnu/libc.so.6
(gdb) step
Single stepping until exit from function __libc_start_main,
which has no line number information.

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb)

Kijewski commented 10 years ago

Just upload your file to some one-click-hoster I guess. Putlocker just became firedrive com and seems to be nice. (Beware: I opened the site only with Adblock enabled, and don't know if there are NSFW ads!)

BytesGalore commented 10 years ago

OK, here is the test_float.elf: http://www37.zippyshare.com/v/82299910/file.html

Compiled on: GNU/Linux 3.2.0-59-generic x86, gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3

Kijewski commented 10 years ago

Your .elf links against two more shared libraries than mine (librt.so.1 and libpthread.so.0), but that should not cause an error.

The seg fault occurs in calloc (??):

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()
#1  0x0804aa99 in calloc ()
#2  0xf7f5a496 in _dlerror_run (operate=operate@entry=0xf7f59e30 <dlsym_doit>, args=args@entry=0xffffcd00) at dlerror.c:141
#3  0xf7f59ed4 in __dlsym (handle=0xffffffff, name=0x804be33 "read") at dlsym.c:70
#4  0x080491fb in startup ()
#5  0x0804bb22 in __libc_csu_init ()
#6  0xf7db885a in __libc_start_main (main=0x804ba64 <main>, argc=1, ubp_av=0xffffce44, init=0x804bad0 <__libc_csu_init>, fini=0x804bb40 <__libc_csu_fini>, rtld_fini=0xf7fec0c0 <_dl_fini>, stack_end=0xffffce3c)
    at libc-start.c:235
#7  0x08048cc1 in _start ()
$ valgrind ./test_float.elf 
==8698== Memcheck, a memory error detector
==8698== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==8698== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==8698== Command: ./test_float.elf
==8698== 
==8698== Jump to the invalid address stated on the next line
==8698==    at 0x0: ???
==8698==    by 0x40B2495: _dlerror_run (dlerror.c:141)
==8698==    by 0x804BE32: ??? (in /home/kijewski/Downloads/test_float.elf)
==8698==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==8698== 
==8698== 
==8698== Process terminating with default action of signal 11 (SIGSEGV)
==8698==  Bad permissions for mapped region at address 0x0
==8698==    at 0x0: ???
==8698==    by 0x40B2495: _dlerror_run (dlerror.c:141)
==8698==    by 0x804BE32: ??? (in /home/kijewski/Downloads/test_float.elf)
==8698== Jump to the invalid address stated on the next line
==8698==    at 0x0: ???
==8698==    by 0x420E264: strerror_thread_freeres (in /lib/i386-linux-gnu/i686/cmov/libc-2.17.so)
==8698==    by 0x804AA98: calloc (in /home/kijewski/Downloads/test_float.elf)
==8698==    by 0x40B2495: _dlerror_run (dlerror.c:141)
==8698==    by 0x804BE32: ??? (in /home/kijewski/Downloads/test_float.elf)
==8698==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==8698== 
==8698== 
==8698== Process terminating with default action of signal 11 (SIGSEGV)
==8698==  Bad permissions for mapped region at address 0x0
==8698==    at 0x0: ???
==8698==    by 0x420E264: strerror_thread_freeres (in /lib/i386-linux-gnu/i686/cmov/libc-2.17.so)
==8698==    by 0x804AA98: calloc (in /home/kijewski/Downloads/test_float.elf)
==8698==    by 0x40B2495: _dlerror_run (dlerror.c:141)
==8698==    by 0x804BE32: ??? (in /home/kijewski/Downloads/test_float.elf)
==8698== 
==8698== HEAP SUMMARY:
==8698==     in use at exit: 0 bytes in 0 blocks
==8698==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==8698== 
==8698== All heap blocks were freed -- no leaks are possible
==8698== 
==8698== For counts of detected and suppressed errors, rerun with: -v
==8698== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Segmentation fault

mehlis commented 10 years ago

@BytesGalore don't you have a user page in Hamburg?! :)

Since your binary is linked against pthread (the one from your system), and your system seems to be very old, I would think that you did bad stuff to RIOT...

https://github.com/RIOT-OS/RIOT/tree/master/examples/ipc_pingpong

BTW: pthread support for applications has been merged. There is a test for it, too.

BytesGalore commented 10 years ago

I don't have a userpage (at least none I'm aware of)

I just tried to run ipc_pingpong using the current RIOT (freshly cloned and completely untouched), but the result is still the same. I agree that the failures most probably depend on the age of the systems I used. Running the same test on a current Linux works flawlessly (3.11.0-15-generic and gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu9)).

merged pthread is very cool and very good news :D

LudwigKnuepfer commented 10 years ago

There have been reports of similar behavior on the mailing list as well - I suspect the old libc tries to use calloc early in the process init phase somewhere. This will lead to a segfault because native does not populate its overridden malloc/calloc/... implementations until startup. I've been looking into this already because it interferes with profiling support as well.
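
One common way to guard against this kind of early call is to fall back to a small static buffer while the pointer to the host implementation is still unset. The following is only a sketch of that idea, not the actual change in RIOT: all names (wrapped_calloc, real_calloc, early_buf, wrapper_init) are hypothetical, and the wrapper is deliberately not named calloc so the example does not interpose the host libc.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout; these are not RIOT's identifiers. */
typedef void *(*calloc_fn)(size_t, size_t);

/* NULL until startup code has resolved the host calloc. Calling
 * through it before that point is the jump to address 0x0. */
static calloc_fn real_calloc = NULL;

/* Small arena for requests that arrive before startup has run.
 * Static storage is zero-initialized, which matches calloc semantics
 * as long as no part of the arena is ever handed out twice. */
static _Alignas(max_align_t) unsigned char early_buf[4096];
static size_t early_used = 0;

void *wrapped_calloc(size_t nmemb, size_t size)
{
    if (real_calloc != NULL) {
        return real_calloc(nmemb, size);   /* normal path */
    }

    /* Early path: reject multiplication overflow first. */
    if (size != 0 && nmemb > SIZE_MAX / size) {
        return NULL;
    }
    size_t total = nmemb * size;

    /* Round up so the next early allocation stays aligned. */
    size_t align = _Alignof(max_align_t);
    if (total > SIZE_MAX - (align - 1)) {
        return NULL;
    }
    size_t rounded = (total + align - 1) / align * align;
    if (rounded > sizeof(early_buf) - early_used) {
        return NULL;                       /* arena exhausted */
    }

    void *p = &early_buf[early_used];
    early_used += rounded;
    return p;
}

/* Stand-in for the startup step that populates the pointer. A real
 * interposer would typically use dlsym(RTLD_NEXT, "calloc") here. */
void wrapper_init(void)
{
    real_calloc = calloc;
}
```

Memory handed out from the early arena must never be passed to the host free, so a real interposer would also need a matching check in its free wrapper.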

LudwigKnuepfer commented 10 years ago

I'm going to set up a VM with the old Ubuntu shortly and investigate.

kaspar030 commented 10 years ago

try vagrant...

LudwigKnuepfer commented 10 years ago

Thanks for the tip.

LudwigKnuepfer commented 10 years ago

https://github.com/RIOT-OS/RIOT/pull/762 fixes this.

BytesGalore commented 10 years ago

Tested #762. Once it is merged, this issue can be closed.