Closed jmealo closed 10 years ago
Might be a Lua/threads issue?
What happens if you run some of the cucumber tests? See https://github.com/DennisOSRM/Project-OSRM/wiki/Cucumber-Test-Suite. For example, "cucumber -t @smallest" runs a simple scenario.
One possible direction is to a) build the latest source from the develop branch and b) run the binaries under GDB (or LLDB) and look at the stack trace when it crashes. To get reasonable output, compile debug binaries by adding -DCMAKE_BUILD_TYPE=Debug to the cmake call.
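A rough sketch of that workflow, in case it helps. CMake spells its standard debug build type Debug. The directory name build-debug and the two helper function names are hypothetical, and this assumes gdb is installed on the machine:

```shell
# Hypothetical helper: configure and build debug binaries in a separate
# directory. Runs in a subshell so the caller's working directory is kept.
build_debug() {
  (
    mkdir -p build-debug &&
    cd build-debug &&
    cmake .. -DCMAKE_BUILD_TYPE=Debug &&
    make
  )
}

# Hypothetical helper: run a program under gdb non-interactively and print
# a backtrace once it stops (for example on SIGSEGV).
# Everything after the function name is the program and its arguments.
crash_backtrace() {
  gdb -batch -ex run -ex bt --args "$@"
}
```

Usage would look something like `crash_backtrace ./osrm-extract philadelphia.osm.pbf --threads=1`, run from the directory that holds the debug binaries.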
Thanks for the quick replies!
@emiltin I'm unable to run the tests as sysctl.h isn't available so I can't install sys-proctable (gem install sys-proctable --platform sunos doesn't even work). Not sure if there's a way to get around this.
@DennisOSRM I'm giving that a try now.
A quick dtrace run against the build made with the -DCMAKE_BUILD_TYPE flag suggested above shows different call patterns for --threads=1 and for 8 threads (autodetected).
8 threads:
14 75819 _ZN20ScriptingEnvironmentC2EPKc._omp_fn.0:entry
1 78632 lj_meta_tset:entry
10 78270 lj_alloc_f:entry
1 78613 lj_tab_get:entry
10 78265 lj_alloc_malloc:entry
14 91367 omp_get_thread_num:entry
1 78612 lj_tab_getstr:entry
1 78615 lj_tab_newkey:entry
14 91631 __emutls_get_address:entry
10 78125 lua_setfield:entry
1 78604 hashkey:entry
1 78128 lua_setmetatable:entry
14 94180 pthread_getspecific:entry
10 78058 index2adr:entry
1 78058 index2adr:entry
14 75815 _ZN20ScriptingEnvironment22getLuaStateForThreadIDEi:entry
1 79665 _Z11lua_rawsetpP9lua_StateiPv:entry
14 75968 _ZNSt6vectorIP9lua_StateSaIS1_EEixEm:entry
10 91810 strlen:entry
1 78105 lua_pushlightuserdata:entry
14 80317 _ZN7luabind4openEP9lua_State:entry
1 78066 lua_insert:entry
10 77992 lj_str_new:entry
1 78126 lua_rawset:entry
14 78108 lua_pushthread:entry
3 93734 set_parking_flag:entry
1 78058 index2adr:entry
10 78632 lj_meta_tset:entry
14 78064 lua_settop:entry
1 78614 lj_tab_set:entry
3 93981 queue_lock:entry
1 78604 hashkey:entry
10 78613 lj_tab_get:entry
14 80318 _ZN7luabind12_GLOBAL__N_113push_gc_udataINS_6detail14class_registryEP9lua_StateEEvS5_PvT0_:entry
3 93978 spin_lock_set:entry
1 77987 lj_obj_equal:entry
10 78612 lj_tab_getstr:entry
3 93998 no_preempt:entry
10 78615 lj_tab_newkey:entry
10 78604 hashkey:entry
Single thread:
15 80318 _ZN7luabind12_GLOBAL__N_113push_gc_udataINS_6detail14class_registryEP9lua_StateEEvS5_PvT0_:entry
4 78128 lua_setmetatable:entry
4 78058 index2adr:entry
2 93734 set_parking_flag:entry
15 80322 _ZN7luabind12_GLOBAL__N_115create_gc_udataINS_6detail14class_registryEEEPvP9lua_StateS4_:entry
15 78110 lua_newuserdata:entry
2 93982 queue_unlock:entry
4 79665 _Z11lua_rawsetpP9lua_StateiPv:entry
2 93979 spin_lock_clear:entry
15 78624 lj_udata_new:entry
4 78105 lua_pushlightuserdata:entry
2 93999 preempt:entry
4 78066 lua_insert:entry
2 94199 __lwp_park:entry
4 78126 lua_rawset:entry
4 78058 index2adr:entry
4 78614 lj_tab_set:entry
4 78604 hashkey:entry
15 78596 lj_mem_realloc:entry
4 77987 lj_obj_equal:entry
15 78270 lj_alloc_f:entry
15 78265 lj_alloc_malloc:entry
14 93734 set_parking_flag:entry
15 78106 lua_createtable:entry
15 78607 lj_tab_new:entry
14 93981 queue_lock:entry
15 78605 newtab:entry
14 93978 spin_lock_set:entry
15 78602 lj_mem_newgco:entry
15 78270 lj_alloc_f:entry
14 93998 no_preempt:entry
15 78265 lj_alloc_malloc:entry
15 78596 lj_mem_realloc:entry
14 93988 dequeue:entry
15 78270 lj_alloc_f:entry
14 93986 queue_slot:entry
15 78265 lj_alloc_malloc:entry
14 93987 queue_unlink:entry
15 78103 lua_pushcclosure:entry
14 93998 no_preempt:entry
14 93982 queue_unlock:entry
15 78006 lj_func_newC:entry
15 78602 lj_mem_newgco:entry
14 93979 spin_lock_clear:entry
15 78270 lj_alloc_f:entry
14 93999 preempt:entry
15 78265 lj_alloc_malloc:entry
14 94200 __lwp_unpark:entry
15 78125 lua_setfield:entry
15 78058 index2adr:entry
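Comparing two traces like these by eye is error-prone, so here is a small sketch for summarizing them. It assumes each trace was saved to a file with one probe firing per line in the same "CPU ID FUNCTION:NAME" layout as above. The function names count_calls and only_in_first are made up for illustration:

```shell
# Count how often each function's entry probe fired, most frequent first.
# Input: a file with lines of the form "CPU ID FUNCTION:entry".
count_calls() {
  awk 'NF >= 3 { sub(/:entry$/, "", $3); n[$3]++ }
       END { for (f in n) print n[f], f }' "$1" |
    sort -k1,1nr -k2,2
}

# Print functions that appear in the first trace but not in the second,
# in order of first appearance. The second file is read first to build
# the set of known function names.
only_in_first() {
  awk 'NF >= 3 { sub(/:entry$/, "", $3) }
       NR == FNR { if (NF >= 3) seen[$3] = 1; next }
       NF >= 3 && !($3 in seen) && !printed[$3]++ { print $3 }' "$2" "$1"
}
```

For example, `only_in_first threads8.txt threads1.txt` (hypothetical file names) would list calls such as omp_get_thread_num that only show up in the multi-threaded run.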
Also, make sure that the profiles directory is in the same directory as your binaries.
When using mdb (similar to gdb) this is the output I get. I do have the profiles directory in my build folder where I'm running the program from:
[root@build ~/build/Project-OSRM/build]# mdb osrm-extract
::run philadelphia.osm.pbf
[info] Input file: philadelphia.osm.pbf
[info] Profile: profile.lua
[info] Threads: 8
[info] Using script profile.lua
mdb: stop on SIGSEGV
mdb: target stopped at:
lj_obj_equal: movl 0x4(%rdi),%edx
::run philadelphia.osm.pbf --threads=1
[info] Input file: philadelphia.osm.pbf
[info] Profile: profile.lua
[info] Threads: 1
[info] Using script profile.lua
mdb: stop on SIGSEGV
mdb: target stopped at:
lj_obj_equal: movl 0x4(%rdi),%edx
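Both runs stop at the same instruction in lj_obj_equal, so the call stack leading there is probably the most useful next piece of information. A minimal sketch for pulling it out of a core file with mdb, using the ::status, ::stack and ::regs dcmds. The function name core_stack is hypothetical, and it assumes a core file was written when the process crashed:

```shell
# Hypothetical helper: print process status, the stack trace and the
# registers from a core file by feeding dcmds to mdb on standard input.
# $1 is the executable, $2 is the core file.
core_stack() {
  printf '%s\n' '::status' '::stack' '::regs' | mdb "$1" "$2"
}
```

Something like `core_stack ./osrm-extract core` should then show which caller passed the bad pointer. In the interactive session above, typing ::stack at the prompt right after the SIGSEGV stop gives the same trace.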
The sys-proctable gem is not required for running tests. It's used by some Rake tasks, but they're not involved when running cucumber tests.
@emiltin, the tests didn't go so well. The core dumps seem to be unrelated to input.
249 scenarios (247 failed, 2 passed)
904 steps (247 failed, 1 skipped, 656 passed)
3m27.617s
Full output is at: http://thispos.com/test.output
You can run a single test, for example "cucumber -t @smallest". Additional info will be in test/fail.log. But looking at your output, it looks like it's also osrm-extract failing when running the tests.
Agreed. What's my next step for continuing to troubleshoot this? SmartOS is an ideal platform for running OSRM.
Your build output includes warnings of this type:
/usr/local/include/luabind/detail/format_signature.hpp:87:5: warning: ISO C++ 1998 does not support 'long long' [-Wlong-long]
What version of gcc do you use?
Here is my output for gcc -v
Using built-in specs.
COLLECT_GCC=/opt/local/gcc47/bin/gcc
COLLECT_LTO_WRAPPER=/opt/local/gcc47/libexec/gcc/x86_64-sun-solaris2.11/4.7.3/lto-wrapper
Target: x86_64-sun-solaris2.11
Configured with: ../gcc-4.7.3/configure --enable-languages='c obj-c++ objc go fortran c++' --enable-shared --enable-long-long --with-local-prefix=/opt/local --enable-libssp --enable-threads=posix --with-boot-ldflags='-static-libstdc++ -static-libgcc -Wl,-R/opt/local/lib ' --disable-nls --enable-__cxa_atexit --with-gxx-include-dir=/opt/local/gcc47/include/c++/ --without-gnu-ld --with-ld=/usr/bin/ld --with-gnu-as --with-as=/opt/local/bin/gas --prefix=/opt/local/gcc47 --build=x86_64-sun-solaris2.11 --host=x86_64-sun-solaris2.11 --infodir=/opt/local/gcc47/info --mandir=/opt/local/gcc47/man
Thread model: posix
gcc version 4.7.3 (GCC)
Of particular interest, I see the flag "--enable-long-long"
Any updates?
@DennisOSRM I'm going to give this another go today, will post updates. I'll also be trying node-osrm as well. Whichever compiles first wins as they both suit my needs well. Keep up the good work!
Ok cool. Please report here if you have any results.
I didn't get to spend as much time on it as I had hoped.
[root@build ~/Project-OSRM]# mkdir -p build; cd build; cmake ..; make
-- The C compiler identification is GNU 4.7.3
-- The CXX compiler identification is GNU 4.7.3
-- Check for working C compiler: /opt/local/bin/cc
-- Check for working C compiler: /opt/local/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /opt/local/bin/c++
-- Check for working CXX compiler: /opt/local/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Building on a 64 bit system
-- Configuring OSRM in release mode
-- Performing Test HAS_LTO_FLAG
-- Performing Test HAS_LTO_FLAG - Success
-- Performing Test HAS_LTO_PARTITION_FLAG
-- Performing Test HAS_LTO_PARTITION_FLAG - Success
-- Boost version: 1.55.0
-- Found the following Boost libraries:
-- date_time
-- filesystem
-- iostreams
-- program_options
-- regex
-- system
-- thread
-- Looking for include file pthread.h
-- Looking for include file pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - found
-- Found Threads: TRUE
ERROR: Intel TBB NOT found!
-- Looked for Threading Building Blocks in /opt/intel/tbb;/usr/local/include;/usr/include
CMake Error at cmake/FindTBB.cmake:272 (message):
Could NOT find TBB library.
Call Stack (most recent call first):
CMakeLists.txt:191 (find_package)
I ran into an issue with TBB. I downloaded it and tried placing the contents of the include folder in each of the paths that cmake said it looked in. This didn't work. I then tried running make on the library (you have to pass compiler=gcc or it fails entirely). Once I got past that, this happens when running make compiler=gcc:
make[1]: Leaving directory '/root/Project-OSRM/tbb42_20140601oss/build/SunOS_intel64_gcc_cc4.7.3_kernel5.11_release'
make -C "../build/SunOS_intel64_gcc_cc4.7.3_kernel5.11_release" -r -f ../../build/Makefile.test cfg=release
make[1]: Entering directory '/root/Project-OSRM/tbb42_20140601oss/build/SunOS_intel64_gcc_cc4.7.3_kernel5.11_release'
g++ -o test_mutex.o -c -MMD -O2 -DUSE_PTHREAD -m64 -DTEST_USES_TBB=1 -Wall -Wshadow -Wcast-qual -Woverloaded-virtual -Wnon-virtual-dtor -Wextra -I../../src -I../../src/rml/include -I../../include -I. ../../src/test/test_mutex.cpp
In file included from ../../src/test/test_mutex.cpp:562:0:
../../src/test/harness_tsx.h: In function 'bool have_TSX()':
../../src/test/harness_tsx.h:58:9: error: expected unqualified-id before numeric constant
../../src/test/harness_tsx.h:70:29: error: lvalue required in asm statement
../../src/test/harness_tsx.h:70:29: error: invalid lvalue in asm output 0
../../build/common_rules.inc:85: recipe for target 'test_mutex.o' failed
make[1]: *** [test_mutex.o] Error 1
make[1]: Leaving directory '/root/Project-OSRM/tbb42_20140601oss/build/SunOS_intel64_gcc_cc4.7.3_kernel5.11_release'
Makefile:116: recipe for target 'tbb_test_release_no_depends' failed
make: *** [tbb_test_release_no_depends] Error 2
[root@build ~/Project-OSRM/tbb42_20140601oss/src]#
gcc -v output: identical to the listing posted earlier in this thread (gcc version 4.7.3, Target: x86_64-sun-solaris2.11, Thread model: posix).
make compiler=gcc tbb succeeds without issue.
I don't think this has anything to do with it, but I can't be sure: https://github.com/DennisOSRM/Project-OSRM/blob/master/cmake/FindTBB.cmake#L118
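One thing that may be worth checking: the CMake error says the TBB library could not be found, so copying only the headers would not be enough. A small sketch for confirming that both pieces exist before pointing cmake at them. The function name check_tbb is made up, and the file names tbb/tbb.h and libtbb.so* are assumptions about what the TBB build leaves behind on this platform:

```shell
# Hypothetical helper: report whether TBB headers and the shared library
# are present. $1 is an include directory, $2 is a library directory.
# Returns non-zero if either piece is missing.
check_tbb() {
  include_dir=$1
  lib_dir=$2
  status=0
  if [ -f "$include_dir/tbb/tbb.h" ]; then
    echo "headers: found"
  else
    echo "headers: missing"
    status=1
  fi
  # Assumed library name; the glob also matches versioned files.
  if ls "$lib_dir"/libtbb.so* >/dev/null 2>&1; then
    echo "library: found"
  else
    echo "library: missing"
    status=1
  fi
  return $status
}
```

Going by the make log above, the directories to try would be /root/Project-OSRM/tbb42_20140601oss/include and /root/Project-OSRM/tbb42_20140601oss/build/SunOS_intel64_gcc_cc4.7.3_kernel5.11_release. Whether FindTBB.cmake can be steered to those locations is a separate question that this check does not answer.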
Hello,
I've been working on getting Project-OSRM to run on SmartOS. Last week I was able to compile it successfully; however, when I tried to run ./osrm-prepare on a small metro dataset (pbf and xml format), I got a core dump.
The input file was: http://osm-extracted-metros.s3.amazonaws.com/philadelphia.osm.pbf
Examining the core with mdb
Using dtrace to monitor function calls up until the time of the crash produced 4.6MB of output (http://thispos.com/function_calls). This is the tail end just before the crash:
Here is my cmake/make output: