Open nyh opened 6 years ago
I came across this stack trace when trying to run capstan with newest kernel and older cpiod.so (from Jan 2018). It only happened once which makes me think it is some kind of race condition described above.
OSv v0.24-536-gd44fe0f3
application::prepare_argv called!
eth0: 192.168.122.15
application::prepare_argv called!
/tools/cpiod.so: failed looking up symbol _ZNK5boost15program_options22error_with_option_name23substitute_placeholdersERKSs (boost::program_options::error_with_option_name::substitute_placeholders(std::string const&) const)
[backtrace]
0x0000000000344963 <elf::object::symbol(unsigned int)+227>
0x00000000003977d6 <elf::object::arch_relocate_rela(unsigned int, unsigned int, void*, long)+166>
0x00000000003429e4 <elf::object::relocate_rela()+148>
0x00000000003456f7 <elf::object::relocate()+199>
0x0000000000348c7c <elf::program::load_object(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<std::shared_ptr<elf::object>, std::allocator<std::shared_ptr<elf::object> > >&)+1452>
0x00000000003494d0 <elf::program::get_library(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, bool)+336>
0x000000000042383b <osv::application::application(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<0x00000000004240bc <osv::application::run(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>0x000000000042432b <osv::application::run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)+91>
0x000000000021599c <do_main_thread(void*)+1964>
0x00000000004570d5 <???+4550869>
0x00000000003f1ba6 <thread_main_c+38>
0x00000000003935e2 <???+3749346>
0xd300ef2ce4f001ff <???+-454032897>
0x00000000003f159f <???+4134303>
0x4156415741e58947 <???+1105561927>
What is also interesting the symbol _ZNK5boost15program_options22error_with_option_name23substitute_placeholdersERKSs does not exist: nm build/release/loader.elf | grep _ZNK5boost15program_options22error_with_option_name23substitute_placeholdersERKSs
but this one does: 0000000000602190 T _ZNK5boost15program_options22error_with_option_name23substitute_placeholdersERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
If expected symbol does not exist (or I am misreading it and it does) then how come almost always (except one occurrence) cpiod works.
Two potential problems in program::get_library(). I noticed these in a code review, I haven't seen any actual crash or error:
init_library() is traditionally called inside get_library(), where it is protected with a
SCOPE_LOCK(_mutex)
. Since it can now be called from outside get_library(), we should take the lock also in init_library() itself (because our mutex is recursive, it's ok to take the lock in both).get_library() stores the list of loaded objects needing initialization at the top of program's _loaded_objects_stack (the entire list is one item on this stack), and init_library() runs the last item (potentially for several objects loaded together). This is probably wrong - what if we call two get_library() for the same program almost in parallel, and then call two init_library()? We can have the init_library() initialize the wrong objects. For example consider we do this:
One wrong solution is not to have a stack of object lists, instead of in the elf::object (what get_library() returns) its own list. So the caller (app.cc) doesn't run program::init_library() - rather it runs _lib->init_library() (where _lib is the previous return from get_library). But unfortunately, this is not really a correct solution. Both lib1.so and lib2.so may be linked with (DT_NEEDED) the same third object, so it's important that although lib2.so did not need to load this object a second time, when we wait for its init_library(), we must wait for this shared object to have been initialized.
Instead we can expand the list we keep for each object to which objects it needed (not necessarily loaded!) - actually I think we already have this list (_needed)! We also need to remember which object has already been initialized, so we don't initialize again an object object which we already initialized.
We need to make sure that after changing this, the init order remains sane. Note that the existing init order wasn't perfect either (see issue #334), but we don't want it to become even worse.