facebookarchive / bistro

Bistro is a flexible distributed scheduler, a high-performance framework supporting multiple paradigms while retaining ease of configuration, management, and monitoring.
https://bistro.io
MIT License

How to optimize Docker image size? #45

Open JanMikes opened 4 years ago

JanMikes commented 4 years ago

Hi, I was finally able to build a Docker image with Bistro, but I am a bit worried about its enormous size: roughly 5.2 GB.

Do you have any tips on how to reduce its size?

The Dockerfile is automatically generated by fbcode_builder.

It basically consists of repeated download + build + install blocks like this one:

### Check out fmtlib/fmt, workdir build ###

USER root
RUN mkdir -p '/home' && chown 'nobody' '/home'
USER 'nobody'
WORKDIR '/home'
RUN git clone  https://github.com/'fmtlib/fmt'
USER root
RUN mkdir -p '/home'/'fmt'/'build' && chown 'nobody' '/home'/'fmt'/'build'
USER 'nobody'
WORKDIR '/home'/'fmt'/'build'
RUN git checkout '6.2.1'

### Build and install fmtlib/fmt ###

RUN CXXFLAGS="$CXXFLAGS -fPIC -isystem "'/home/install'"/include" CFLAGS="$CFLAGS -fPIC -isystem "'/home/install'"/include" cmake -D'CMAKE_INSTALL_PREFIX'='/home/install' -D'BUILD_SHARED_LIBS'='ON' '..'
RUN make -j '4' VERBOSE=1 
RUN make install VERBOSE=1 

I was wondering whether I could somehow remove the caches. Maybe running rm -rf /home/fmt (and the same for every other cloned repository) after each package is installed would help reduce the size.

I do not usually use C++, so I do not know how it really works internally; if I am mistaken and my idea is stupid, please just correct me 😄 Could we take only the final binaries and copy them into a different, clean Docker image?

Another idea was to use an Alpine-based image, or some base image other than Ubuntu (quick googling brought me to https://github.com/madduci/docker-cpp-env).

Could any of this work, or would you suggest something completely different?

I was thinking about an autoscaling mechanism for Bistro workers on AWS spot instances (maybe even as Lambdas), and for that purpose I want the image to be as thin as possible.

snarkmaster commented 4 years ago

I've never tried to optimize for Docker image size, but I can give you some thoughts:

I plan to switch the OSS CI build to static linking at some point, which would help your use case. Unfortunately, it's hard for me to predict how long this will take, because I'm bad at CMake and there are some hurdles.

Internally, statically linked binaries go like this:

The upshot here is that if you're willing to lose useful backtraces, you can probably get these to be pretty small.

JanMikes commented 4 years ago

> You could apt-get remove packages you no longer need after the build.

Good start, though I am not sure which ones those are 😄

> Yes, you can definitely remove the build trees, and the ccache build cache directory.

My C++ knowledge is limited. I could not find a ccache directory by running find /home -type d -name "*cache*" -print. By build trees, do you mean the cloned repositories?

I just ran a very quick test with a Dockerfile like this (a common multi-stage strategy for optimizing builds: a builder image plus the real image, which carries only the executables):

FROM ubuntu:18.04

COPY --from=docker.pkg.github.com/rectorphp/docker-base-bistro-image-builder/bistro:latest /home/bistro/bistro/cmake/Debug/server/bistro_scheduler /bistro/bistro_scheduler
COPY --from=docker.pkg.github.com/rectorphp/docker-base-bistro-image-builder/bistro:latest /home/bistro/bistro/cmake/Debug/worker/bistro_worker /bistro/bistro_worker

But as expected, it fails on a missing dependency; in my case:

./bistro_scheduler: error while loading shared libraries: libfolly.so: cannot open shared object file: No such file or directory

That was just an idea: copy everything that is needed, plus the compiled artifacts, into a new image without any unnecessary stuff.

nobody@9e7a345ca022:/home$ ls
bistro  fbthrift  fizz  fmt  folly  googletest  install  libsodium  mvfst  proxygen  wangle  zstd

I even tried removing everything except the bistro directory, but I got the same error message: libfolly.so: cannot open shared object file: No such file or directory. That brings me to the idea of deleting everything except these:

nobody@f2a57d2acdd6:/home$ find /home -type f -name "*.so" -print
/home/install/lib/libconcurrency.so
/home/install/lib/libcompiler_lib.so
/home/install/lib/libcompiler_ast.so
/home/install/lib/libtransport.so
/home/install/lib/libthriftcpp2.so
/home/install/lib/libcompiler_generators.so
/home/install/lib/libthrift-core.so
/home/install/lib/libmustache_lib.so
/home/install/lib/libthriftprotocol.so
/home/install/lib/libprotocol.so
/home/install/lib/libthriftfrozen2.so
/home/install/lib/libcompiler_generate_templates.so
/home/install/lib/libasync.so
/home/install/lib/librpcmetadata.so
/home/install/lib/libcompiler_base.so
/home/install/lib/libthriftmetadata.so
/home/install/lib/libmvfst_state_qpr_functions.so
/home/install/lib/libmvfst_state_ack_handler.so
/home/install/lib/libmvfst_state_pacing_functions.so
/home/install/lib/libmvfst_state_simple_frame_functions.so
/home/install/lib/libmvfst_exception.so
/home/install/lib/libmvfst_state_stream_functions.so
/home/install/lib/libmvfst_state_functions.so
/home/install/lib/libmvfst_constants.so
/home/install/lib/libmvfst_state_machine.so
/home/install/lib/libfizz_test_support.so
/home/install/lib/libfolly.so
/home/install/lib/libfollybenchmark.so
/home/install/lib/libfolly_test_util.so
/home/fbthrift/thrift/lib/libconcurrency.so
/home/fbthrift/thrift/lib/libcompiler_lib.so
/home/fbthrift/thrift/lib/libcompiler_ast.so
/home/fbthrift/thrift/lib/libtransport.so
/home/fbthrift/thrift/lib/libthriftcpp2.so
/home/fbthrift/thrift/lib/libcompiler_generators.so
/home/fbthrift/thrift/lib/libthrift-core.so
/home/fbthrift/thrift/lib/libmustache_lib.so
/home/fbthrift/thrift/lib/libthriftprotocol.so
/home/fbthrift/thrift/lib/libprotocol.so
/home/fbthrift/thrift/lib/libthriftfrozen2.so
/home/fbthrift/thrift/lib/libcompiler_generate_templates.so
/home/fbthrift/thrift/lib/libasync.so
/home/fbthrift/thrift/lib/librpcmetadata.so
/home/fbthrift/thrift/lib/libcompiler_base.so
/home/fbthrift/thrift/lib/libthriftmetadata.so
/home/mvfst/build/quic/state/libmvfst_state_qpr_functions.so
/home/mvfst/build/quic/state/libmvfst_state_ack_handler.so
/home/mvfst/build/quic/state/libmvfst_state_pacing_functions.so
/home/mvfst/build/quic/state/libmvfst_state_simple_frame_functions.so
/home/mvfst/build/quic/state/libmvfst_state_stream_functions.so
/home/mvfst/build/quic/state/libmvfst_state_functions.so
/home/mvfst/build/quic/state/libmvfst_state_machine.so
/home/mvfst/build/quic/libmvfst_exception.so
/home/mvfst/build/quic/libmvfst_constants.so
/home/fizz/fizz/build/lib/libfizz_test_support.so
/home/folly/_build/libfolly.so
/home/folly/_build/folly/logging/example/liblogging_example_lib.so
/home/folly/_build/folly/libfollybenchmark.so
/home/folly/_build/libfolly_test_util.so

I am very experienced with Docker and capable of optimizing the build, but unfortunately my C++ knowledge is slowing me down 😄

JanMikes commented 4 years ago

After removing files via

find ./fbthrift/ -type f ! -name '*.so*' -delete
find ./fizz/ -type f ! -name '*.so*' -delete
find ./fmt/ -type f ! -name '*.so*' -delete
find ./folly/ -type f ! -name '*.so*' -delete
find ./googletest/ -type f ! -name '*.so*' -delete
find ./install/ -type f ! -name '*.so*' -delete
find ./libsodium/ -type f ! -name '*.so*' -delete
find ./mvfst/ -type f ! -name '*.so*' -delete
find ./proxygen/ -type f ! -name '*.so*' -delete
find ./wangle/ -type f ! -name '*.so*' -delete
find ./zstd/ -type f ! -name '*.so*' -delete

It still works. Now I need to know what can be deleted from the distro (probably sources as well, logs, etc.; working out what should be kept is probably easier than working out what should be deleted).
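The per-directory find commands above can be folded into one helper. This is only a sketch of the same deletion, not part of fbcode_builder: the function name prune_non_shared_libs is made up for illustration, and the directory names in the usage comment are the ones from the ls output earlier in this thread.

```shell
#!/bin/sh
# Delete every regular file under the given directories except shared
# libraries (*.so and versioned *.so.N), mirroring the find commands above.
# Symlinks are left alone because -type f does not match them.
prune_non_shared_libs() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            find "$dir" -type f ! -name '*.so*' -delete
        else
            echo "skipping missing directory: $dir" >&2
        fi
    done
}

# Example, run from /home inside the builder image:
# prune_non_shared_libs fbthrift fizz fmt folly googletest install \
#     libsodium mvfst proxygen wangle zstd
```

Skipping missing directories with a warning, instead of letting find fail, makes a mistyped directory name visible without aborting the rest of the cleanup.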

The next step will be other system things and dependencies; I am not sure which ones yet.

What difference would it make to run ./cmake/run-cmake.sh with Release instead of Debug?

JanMikes commented 4 years ago

FYI, here are the results so far, after removing everything except .so files from the non-Bistro directories:

Before:

nobody@66b52e09ab45:/home$ du -hs .
4.1G    .

After:

nobody@66b52e09ab45:/home$ du -hs .
2.6G    .

Not bad for just a start.
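To put a number on the two du readings above, a small helper can compute the relative saving. The function name percent_saved is made up for illustration; the inputs are the 4.1G and 2.6G figures reported above.

```shell
# Percentage of space saved, given sizes before and after in the same unit.
percent_saved() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

percent_saved 4.1 2.6   # sizes in GB from the du output above; prints 36.6
```

So dropping everything except the shared libraries removed about 1.5 GB, a bit more than a third of /home.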

snarkmaster commented 4 years ago

> no /ccache

If there's no /ccache, it sounds like you're not running with that enabled. You would see the output of this in the logs of the program that prepares your Dockerfile:

            logging.info('Docker ccache not enabled')

This is fine; /ccache is most helpful for incremental development (change and rebuild).

> what can be deleted

All build artifacts get installed in /home/install by default:

https://github.com/facebook/bistro/blob/4add83f0004325f4d7092dbe3c25eb2acc559733/build/fbcode_builder/make_docker_context.py#L62

So you should not need any of the build trees at all, just /home/install and a barebones OS.
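One way to try the "just /home/install and a barebones OS" idea is a second, slim image that copies from the existing builder image. The helper below is a rough, untested sketch that writes such a Dockerfile. The function name write_runtime_dockerfile and the output file name are made up for illustration, the binary paths are the Debug build paths quoted earlier in this thread, and the LD_LIBRARY_PATH line is an assumption about how the loader finds the libraries rather than something the build is known to require.

```shell
# Write a minimal runtime Dockerfile that copies only the install prefix and
# the two Bistro binaries out of an existing builder image.
# $1: builder image reference, $2: output path for the Dockerfile.
write_runtime_dockerfile() {
    builder_image=$1
    output=$2
    cat > "$output" <<EOF
FROM ubuntu:18.04

# Shared libraries and other installed artifacts from the builder image.
COPY --from=${builder_image} /home/install /home/install

# Bistro binaries; paths are the Debug build paths quoted earlier in the thread.
COPY --from=${builder_image} /home/bistro/bistro/cmake/Debug/server/bistro_scheduler /bistro/bistro_scheduler
COPY --from=${builder_image} /home/bistro/bistro/cmake/Debug/worker/bistro_worker /bistro/bistro_worker

# Assumption: the loader may need help finding libfolly.so and friends.
ENV LD_LIBRARY_PATH=/home/install/lib
EOF
}
```

Calling write_runtime_dockerfile with the builder image name and, say, Dockerfile.runtime produces a file that can be passed to docker build -f. Copying into a fresh image matters because files deleted in a later layer of the same Dockerfile still occupy space in the earlier layers. System libraries installed via apt in the builder are not covered by this sketch and would still have to be installed in the runtime image.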

For the OS, you could do things like apt-get remove gcc. The full set of deps we install on top of the base Ubuntu image is here:

https://github.com/facebook/bistro/blob/4add83f0004325f4d7092dbe3c25eb2acc559733/build/fbcode_builder/fbcode_builder.py#L182

An alternative approach is to find the smallest base OS of the same Ubuntu release that you can get, copy over /home/install, and then to install the missing dependencies (you'll get missing .so errors for each one).
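Rather than hitting the missing .so errors one run at a time, ldd can list all unresolved libraries of a binary at once. The filter below is a small sketch; the function name missing_libs is made up for illustration, and it relies only on the "=> not found" marker that ldd prints for unresolved entries.

```shell
# Print the names of shared libraries that the dynamic loader cannot resolve.
# Reads `ldd` output on stdin; unresolved entries look like
# "libfolly.so => not found".
missing_libs() {
    awk '/=> not found/ { print $1 }' | sort -u
}

# Usage inside the slim image (binary path as used earlier in the thread):
# ldd /bistro/bistro_scheduler | missing_libs
```

Each reported name then has to be mapped to an Ubuntu package by hand. In the builder image, where the library is already present, dpkg -S with the library's full path should tell you which package owns it, provided it came from a package and not from /home/install.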

Either way, it'd be a bit of trial and error. I've never had the time to separate the runtime dependencies from the build-time dependencies for the OSS build.

If you find time to upstream your work, that would be lovely. If not, maybe at least share a gist of your process on this issue?