cloudius-systems / osv

OSv, a new operating system for the cloud.

Combining pre-compiled OSv kernel with pre-compiled executable #821

Open nyh opened 7 years ago

nyh commented 7 years ago

OSv's bundled build system ("scripts/build") encourages building both the OSv kernel and the user's application with the same compiler and libraries (see also issues #743, #687 and #619).

However, you will often want to combine a compiled OSv kernel with a pre-compiled executable built by someone else. This is even more common with better OSv image composition tools such as Capstan and Mikelangelo's improved version (https://github.com/mikelangelo-project/capstan). In that case, the OSv kernel and the executable may have been compiled with different versions of the compiler and/or different versions of libraries.

Currently, mixing compiler and library versions between OSv and the application runs into these kinds of problems:

  1. The application is compiled with one version of libstdc++, but OSv supplies the symbols of a different version.
  2. Same thing for the couple of Boost libraries which the OSv kernel uses (and are also included in the OSv kernel).
  3. The gcc compiler also assumes the presence of the gcc support library, which is likewise compiled into the kernel and used, for example, for C++ exception handling, so we may have problems if gcc's major version doesn't match.

The goal of this issue is to do two things:

  1. Reproduce the above problems, and come up with techniques to overcome them. For example, perhaps we can hide certain symbols in the OSv kernel (see issue #97) that come from the libraries we don't want to export. Or perhaps we don't even need hiding - the application can include in the image a separate copy of that library - the version it wants to use - and our dynamic linker will "do the right thing" (in the application, prefer symbols from the application's included library, not the kernel).
  2. Document what can or can't be done in this area - especially if the attempts in the previous paragraphs cannot solve all problems.
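
The second option in item 1, where the application bundles its own copy of a library and the dynamic linker prefers it over the kernel's copy, comes down to a choice of symbol search order. A minimal sketch of that idea follows, as a toy model only: `Module`, `app_first_order` and `resolve` are hypothetical names invented for illustration and are not OSv's actual ELF loader code.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model of one loaded ELF object: a name plus the symbols it exports.
// Hypothetical type for illustration; this is not OSv's elf::object.
struct Module {
    std::string name;
    std::map<std::string, std::string> exports;  // symbol -> tag of provider
};

// Build the search order for an application: its own bundled libraries
// first and the kernel last, so a bundled libstdc++ shadows the kernel's.
std::vector<const Module*> app_first_order(const std::vector<Module>& bundled,
                                           const Module& kernel) {
    std::vector<const Module*> order;
    for (const Module& m : bundled) {
        order.push_back(&m);
    }
    order.push_back(&kernel);
    return order;
}

// Walk the modules in order and return the tag of the first one that
// exports `symbol`, or an empty string if none does.
std::string resolve(const std::vector<const Module*>& order,
                    const std::string& symbol) {
    for (const Module* m : order) {
        auto it = m->exports.find(symbol);
        if (it != m->exports.end()) {
            return it->second;
        }
    }
    return std::string();
}
```

With this order, a symbol defined both in the bundled library and in the kernel resolves to the bundled copy, while symbols that only the kernel provides (such as `malloc`) still resolve to the kernel.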
miha-plesko commented 7 years ago

One thing to add here is that we should not interfere with one of the most awesome features of OSv:

You can compile your existing Linux application with its normal
build process, and run the resulting Linux executable on OSv.

I'm afraid that if we, e.g., force the user to compile the application in a way that it also contains the symbols, this feature would be broken.

So, what you suggest is that we prepare a separate MPM package with the symbols, one package for each set of them e.g.

com.package.symbols.ubuntu-14.04
com.package.symbols.ubuntu-16.04
...

In the unikernel we then upload the appropriate package depending on what the user's application was compiled for. This overcomes the incompatibility between the user's application and the OSv kernel, which is great.

However, the incompatibility between the user's application and remote packages remains unresolved. I'm not sure, actually, whether this compatibility is really that important. I guess that users will either grab the packages that we've prepared and use them, or compile their application themselves and use no pre-prepared package.

nyh commented 7 years ago

https://groups.google.com/d/msg/osv-dev/zAkAilS446Q/mWsMmzXRCgAJ is yet another example of this problem: a person used a two-year-old version of OSv (from Capstan), tried to run recently compiled software (which assumed a newer C++ ABI than the one included in OSv), and encountered missing symbols in std::__cxx11::basic_string.

As mentioned in that thread by @avikivity one workaround is to recompile the application with -D_GLIBCXX_USE_CXX11_ABI=0.
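
This workaround relies on libstdc++'s dual ABI: the `_GLIBCXX_USE_CXX11_ABI` macro selects whether `std::string` is the newer `std::__cxx11::basic_string` or the older type, and that choice shows up in mangled symbol names. Here is a small sketch, assuming GCC with libstdc++ (the helper names `string_abi` and `mangled_string_is_cxx11` are made up for this example), that reports which ABI a translation unit was built with:

```cpp
#include <cstring>
#include <string>
#include <typeinfo>

// Label for the std::string ABI this translation unit was compiled with.
// libstdc++ headers (GCC 5 and later) define _GLIBCXX_USE_CXX11_ABI;
// compiling with -D_GLIBCXX_USE_CXX11_ABI=0 selects the old ABI.
const char* string_abi() {
#if defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI
    return "cxx11";
#elif defined(__GLIBCXX__)
    return "pre-cxx11";
#else
    return "not-libstdc++";
#endif
}

// Cross-check through the type's mangled name: with the new ABI it
// contains "__cxx11", the same marker seen in unresolved symbols such
// as std::__cxx11::basic_string.
bool mangled_string_is_cxx11() {
    return std::strstr(typeid(std::string).name(), "__cxx11") != nullptr;
}
```

Building the same file once normally and once with `-D_GLIBCXX_USE_CXX11_ABI=0`, then comparing the results (or the output of `nm -C` on the object files), shows which variant of the `std::string` symbols an application will ask OSv to provide.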

A completely different workaround might be to hide the C++ library inside OSv as an internal implementation detail (and only include the part of the library which OSv actually uses), and ask the application to provide its own copy of the C++ library which suits it. This would of course mean that the application cannot use any OSv-specific C++ APIs, but this is normally fine (all of the Linux APIs are C-only and don't need C++).

wkozaczuk commented 6 years ago

I just came across two incompatibility issues I wanted to describe (they may fall into what we described above, or they may be new ones).

  1. A Mikelangelo Capstan user tried to use the newest OSv kernel (loader.img) with MPM packages created in January 2018. He came across this error when trying to run OSv:
    OSv v0.24-534-g54e6c42
    could not load libvdso.so

    [backtrace]
    0x0000000000413268 <osv::application::prepare_argv(elf::program*)+1128>
    0x0000000000414bbd <osv::application::application(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<
    0x00000000004169b8 <osv::application::run_and_join(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, waiter*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std:
    0x000000000040bd7b <osv::run(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, int*, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*)+107>
    0x0000000000425033 <???+4345907>
    0x00000000002148ad <do_main_thread(void*)+5821>
    0x0000000000447725 <???+4486949>
    0x00000000003e5d76 <thread_main_c+38>
    0x0000000000389bf2 <???+3709938>
    0x0150e3bad77aa6ff <???+-679827713>
    0x00000000003e579f <???+4085663>
    0xfb89485354415540 <???+1413567808>

This happened because the newest OSv kernel was trying to load the new, tiny libvdso library, which was NOT part of the older osv-bootstrap.mpm package that normally contains the stuff from usr.manifest.skel.

  2. I myself was trying to test the new OSv kernel with existing Mikelangelo packages (in anticipation of releasing OSv soon, and to see what it would mean for users) and, after solving the issue above, I came across this one:
    
    java.so: Starting JVM app using: io/osv/nonisolated/RunNonIsolatedJvmApp
    /java.so: failed looking up symbol _ZN3elf7program11get_libraryESsSt6vectorISsSaISsEE (elf::program::get_library(std::string, std::vector<std::string, std::allocator<std::string> >))

    [backtrace]
    0x000000000033e603 <elf::object::symbol(unsigned int)+227>
    0x0000000000341229 <elf::object::resolve_pltgot(unsigned int)+137>
    0x0000000000341415 <elf_resolve_pltgot+69>
    0x00000000003888af <???+3705007>
    0x0000000000c6b15f <???+13021535>
    0x00000000003e5d76 <thread_main_c+38>
    0x0000000000389bf2 <???+3709938>
    0xe9201417c18320ff <???+-1048370945>
    0x00000000003e579f <???+4085663>
    0xfb89485354415540 <???+1413567808>

This one happened because osv.openjdk8-zulu-compact1.mpm from Mikelangelo had an older java.so that attempted to use the OSv API function elf::program::get_library(), which has since changed to support Golang.

I think these two issues suggest that every time we release OSv and publish the kernel binary on GitHub, we should also build and publish Capstan packages that contain OSv apps/modules (java.so, httpserver-api.so, etc.). This would require structuring Capstan packages differently: for example, osv.openjdk8-zulu-compact1.mpm should not contain both the Zulu JDK and the run-java artifacts. Instead these should be separate, with java.so, for example, in its own MPM package.

So in general I postulate that the OSv-specific public API (as opposed to glibc and SYSCALL) should not be required to be backwards compatible. Ideally, however, OSv should be backwards compatible (and I think it is) as far as libc and SYSCALL are concerned.

Also, I wonder whether what @miha-plesko stated about the 'build platform' is exactly accurate. Is it really Ubuntu 14 vs. Ubuntu 16 or even Fedora 27? Or, to be more precise, is it about the versions of the GCC compiler and libraries used to build the OSv kernel, its internal apps (java.so, for example), and the user's apps and the libraries they use? Does it really matter which version of GCC and which Linux distribution was used to build Node.js, python or ruby from the apps folder?

If it is the build toolchain that matters, then what makes up the "toolchain"?

I was reading more about shared libraries and found the section **Incompatible Libraries** in http://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html particularly applicable. In general, well-maintained **libraries (for example the Boost ones) should be backwards compatible**. So if we always build against the newest ones, we should be fine if any app uses an older version (this is the case of the 2 Boost libraries that the OSv kernel is linked with and exposes).

Some other related questions that came to my mind:
- Does the GCC compiler version really matter, or is it the C++ standard library version, which varies across GCC versions, that matters? If so, what is an example of it?
- Should the libboost and libstdc++ libraries be hidden from everything except internal OSv apps (like java.so, httpserver, cloud-init, etc.)?
- How are these compatibility issues handled on Linux (what if an app uses an old version of a library that is not on the system)?
- What is the meaning of the number (version?) in library names; is it used/enforced by OSv? For example, elf.cc lists these with a version number:

        "libresolv.so.2",
        "libc.so.6",
        "libm.so.6",
    #ifdef __x86_64__
        "ld-linux-x86-64.so.2",
        "libboost_system.so.1.55.0",
        "libboost_program_options.so.1.55.0",
    #endif /* __x86_64__ */
    #ifdef __aarch64__
        "ld-linux-aarch64.so.1",
        "libboost_system-mt.so.1.55.0",
        "libboost_program_options-mt.so.1.55.0",
    #endif /* __aarch64__ */
        "libpthread.so.0",
        "libdl.so.2",
        "librt.so.1",
        "libstdc++.so.6",
        "libaio.so.1",
        "libxenstore.so.3.0",
        "libcrypt.so.1",

nyh commented 6 years ago

On Sun, May 6, 2018 at 8:47 PM, WALDEMAR KOZACZUK notifications@github.com wrote:

> I just came across two incompatibility issues I wanted to describe (they may fall into what we described above, or they may be new ones).

I'm not sure the bug tracker is the best medium to have such discussions, maybe in the future we should split this issue into several issues.

> 1. A Mikelangelo Capstan user tried to use the newest OSv kernel (loader.img) with MPM packages created in January 2018. He came across this error when trying to run OSv:
>
>     OSv v0.24-534-g54e6c42
>     could not load libvdso.so
>
> This happened because the newest OSv kernel was trying to load the new, tiny libvdso library, which was NOT part of the older osv-bootstrap.mpm package that normally contains the stuff from usr.manifest.skel.

Interesting. It is indeed a bad idea for the kernel to depend on external files. I think in this case we should not have crashed - I opened https://github.com/cloudius-systems/osv/issues/966 about that.

> 2. I myself was trying to test the new OSv kernel with existing Mikelangelo packages (in anticipation of releasing OSv soon, and to see what it would mean for users) and, after solving the issue above, I came across this one:
>
>     java.so: Starting JVM app using: io/osv/nonisolated/RunNonIsolatedJvmApp
>     /java.so: failed looking up symbol _ZN3elf7program11get_libraryESsSt6vectorISsSaISsEE (elf::program::get_library(std::string, std::vector<std::string, std::allocator<std::string> >))

I have seen this error myself, and mis-attributed it to my updated Linux distribution, instead of the updated OSv - see https://github.com/cloudius-systems/osv/issues/963

> This one happened because osv.openjdk8-zulu-compact1.mpm from Mikelangelo had an older java.so that attempted to use the OSv API function elf::program::get_library(), which has since changed to support Golang.

Indeed :-( We probably should not have done that. We could have added a new get_library() overload without changing the existing one, for example. We can still do that if we want - it's not like thousands of people are already using the current version ;-)
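
To illustrate the overload idea: a function's mangled symbol encodes its parameter list, so changing the parameters of `get_library()` removes the symbol that already-compiled binaries such as java.so look up, even if a default argument keeps old source code compiling. The sketch below keeps the original two-argument signature as a separate function that forwards to the new one. It is a simplified stand-in, not OSv's real `elf::program`: the namespace `sketch`, the `bool delay_init` parameter and the string return value are assumptions made for illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

namespace sketch {

class program {
public:
    // New entry point with an extra argument (hypothetical parameter).
    std::string get_library(std::string name,
                            std::vector<std::string> extra_path,
                            bool delay_init);

    // Original signature, kept as a real function (not a default argument)
    // so its mangled symbol stays available to already-compiled callers.
    std::string get_library(std::string name,
                            std::vector<std::string> extra_path);
};

std::string program::get_library(std::string name,
                                 std::vector<std::string> extra_path,
                                 bool delay_init) {
    // Stand-in for real loading: describe what would be loaded.
    return name + " paths=" + std::to_string(extra_path.size()) +
           " delay_init=" + (delay_init ? "1" : "0");
}

std::string program::get_library(std::string name,
                                 std::vector<std::string> extra_path) {
    // Old callers get the old behaviour: no delayed initialization.
    return get_library(std::move(name), std::move(extra_path), false);
}

}  // namespace sketch
```

Both functions end up as distinct symbols in the binary, so an old java.so would still find the two-argument one while newer callers can use the three-argument form.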

> I think these two issues suggest that every time we release OSv and publish the kernel binary on GitHub, we should also build and publish Capstan packages that contain OSv apps/modules (java.so, httpserver-api.so, etc.).

This is definitely the safest approach, but I don't know how practical it is. There is no real need to recompile all the applications on trivial changes to OSv which don't change the ABI and don't change any of the compilers or libraries.

Or, do you mean only the apps which use OSv-specific (non-Linux) ABIs, and not all of the apps?

> So in general I postulate that the OSv-specific public API (as opposed to glibc and SYSCALL) should not be required to be backwards compatible. Ideally, however, OSv should be backwards compatible (and I think it is) as far as libc and SYSCALL are concerned.

I think it probably should be backwards compatible also for OSv-specific public API, it's just a bit less critical.

> Also, I wonder whether what @miha-plesko stated about the 'build platform' is exactly accurate. Is it really Ubuntu 14 vs. Ubuntu 16 or even Fedora 27? Or, to be more precise, is it about the versions of the GCC compiler and libraries used to build the OSv kernel, its internal apps (java.so, for example), and the user's apps and the libraries they use? Does it really matter which version of GCC and which Linux distribution was used to build Node.js, python or ruby from the apps folder?

Well, it's definitely possible (and I believe we've seen it in the past) that some C++ application compiled on Gcc X could not run on OSv compiled on Gcc Y, because OSv includes version Y of libstdc++, but the application was compiled with the header files, C++ ABIs, etc., of Gcc X, which are different. This is why one of the suggestions I made above was to have OSv hide the fact that it contains libstdc++ (for example), and allow the application to bring in its own version of libstdc++, possibly different from the one contained in OSv. It would be a waste of space (and memory), but would solve this potential problem.

> If it is the build toolchain that matters, then what makes up the "toolchain"?
>
> I was reading more about shared libraries and found the section Incompatible Libraries in http://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html particularly applicable. In general, well-maintained libraries (for example the Boost ones) should be backwards compatible. So if we always build against the newest ones, we should be fine if any app uses an older version (this is the case of the 2 Boost libraries that the OSv kernel is linked with and exposes).

What this could mean for us is that old compiled applications would be able to run on newer compiled OSv (so it includes newer versions of the libraries, which are backward compatible).

I'm not sure this actually works across many major versions of Gcc and related libraries - we started OSv with gcc 4.8, and now we're already at 8.1. I don't remember off-hand all the troubles we encountered.

> Some other related questions that came to my mind:
>
> • Does the GCC compiler version really matter, or is it the C++ standard library version, which varies across GCC versions, that matters? If so, what is an example of it?

We have, for example, https://github.com/cloudius-systems/osv/issues/501. I don't remember off-hand what other problems we saw. Some of them are documented earlier in the comments in this issue.

> • Should the libboost and libstdc++ libraries be hidden from everything except internal OSv apps (like java.so, httpserver, cloud-init, etc.)?

As I said above, maybe - but the cost is wasted disk and memory space.

> • How are these compatibility issues handled on Linux (what if an app uses an old version of a library that is not on the system)?

In most modern Linux distros, you upgrade everything together - you get a libstdc++ from Fedora 28 and an application from Fedora 28 that was compiled specifically against that version of libstdc++.

> • What is the meaning of the number (version?) in library names; is it used/enforced by OSv? For example, elf.cc lists these with a version number:
>
>     "libresolv.so.2",
>     "libc.so.6",

It's not used or enforced by OSv. Rather, it is recorded by the linker in the .so - for example on my system

    $ ls -l /usr/lib64/libboost_system.so
    lrwxrwxrwx. 1 root root 25 Feb 28 19:45 /usr/lib64/libboost_system.so -> libboost_system.so.1.64.0

So if I link something with "libboost_system.so", what gets recorded (as DT_NEEDED) isn't libboost_system.so, but rather "libboost_system.so.1.64.0". This is what OSv tries to load, and if it's already on elf.cc's list, the loader knows it doesn't need to do anything. Otherwise, it does try to load the file (and if it's missing, the failure is ignored - with a debug message).
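
The lookup described above can be sketched as a small decision function. This is a simplified model under stated assumptions, not the code in elf.cc: the names `needed_action`, `has_name` and `classify_needed` are hypothetical, and the comparison against the built-in list is assumed to be an exact string match on the DT_NEEDED name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// What the loader does with one DT_NEEDED entry in this model.
enum class needed_action {
    provided_by_kernel,  // name is on the built-in list: nothing to load
    load_file,           // not built in, but the file exists in the image
    ignore_missing       // not built in and no file: skipped, debug message
};

// True if `name` appears verbatim in `names`.
static bool has_name(const std::vector<std::string>& names,
                     const std::string& name) {
    return std::find(names.begin(), names.end(), name) != names.end();
}

// Classify a DT_NEEDED name against the kernel's built-in list and the
// files present in the image (both passed in explicitly for this sketch).
needed_action classify_needed(const std::vector<std::string>& builtin,
                              const std::vector<std::string>& image_files,
                              const std::string& needed) {
    if (has_name(builtin, needed)) {
        return needed_action::provided_by_kernel;
    }
    if (has_name(image_files, needed)) {
        return needed_action::load_file;
    }
    return needed_action::ignore_missing;
}
```

Under that assumption, an application linked against libboost_system.so.1.64.0 would not match a kernel whose list contains libboost_system.so.1.55.0, so the loader would look for the file in the image instead.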

yieldone commented 6 years ago

Hi folks,

If I may chime in, I noticed @wkozaczuk mentioned something about usr.manifest.skel. In capstan-packages, the docker recipe for openjdk8-zulu-full (https://github.com/mikelangelo-project/capstan-packages/blob/master/docker_files/recipes/openjdk8-zulu-full/build.sh) contains the following:

${OSV_DIR}/scripts/build image=openjdk8-zulu-full export=all usrskel=none export_dir=$PACKAGE_RESULT_DIR -j ${CPU_COUNT}

I used this as my basis for preparing my base image. Does "usrskel=none" therefore ignore usr.manifest.skel?

My process is now as follows:

  1. gen osv-bootloader image, upload to S3
  2. gen jdk image, upload to S3
  3. gen osv.bootstrap, upload to S3
  4. refresh all the above images locally (pull)

If I include the usr.manifest.skel when I prepare the JDK image, perhaps I can avoid building osv.bootstrap explicitly?

In any case, the above process resolves all my issues thus far. I really appreciate all the tips!

Cheers,

Rowland

miha-plesko commented 6 years ago

@wkozaczuk it's most probably not an Ubuntu 14.04 vs 16.04 issue, but some underlying library version change between them, as you suggest. It's just that I never dived in but still wanted to make it as understandable to users as possible :) Also, I'm more of a high-level programmer (ruby, python) and would have a hard time actually figuring out the exact reason; gonna have to let you guys do this part 🙌 😇.

@yieldone you can find some documentation on the "usrskel=none" argument in the OSv commit message here. Also, you can examine this commit on capstan-packages here to see how we modified the recipes when the option was introduced.

BTW: I'm not surprised it works for you since you've recompiled everything you need. I'm afraid you'll have to stick with osv.bootstrap because Capstan always requires it (it's hard-coded). That being said, you could technically prepare an empty osv.bootstrap package (just remove all the files prior to zipping) and let your jdk package contain all the files you need. Although having it in a separate package sounds more reasonable to me. Here is a recipe to build it; you'll notice that we actually just build nothing (besides the OSv kernel itself) but export it all.

wkozaczuk commented 6 years ago

For everybody interested: I will be moving the discussion to the mailing list, as it will be easier to ask and answer questions there.