cloud-bulldozer / benchmark-wrapper

Python Library to run benchmarks
https://benchmark-wrapper.readthedocs.io
Apache License 2.0

snafu base image #326

Open bengland2 opened 3 years ago

bengland2 commented 3 years ago

I was having problems rebuilding the smallfile image because pip kept failing to download a random Python package. I got really annoyed and started thinking of ways to prevent this from happening, and came up with the idea of a snafu base image that contains just the stuff that run_snafu.py needs, which is quite a lot actually. The individual benchmarks then build off this base, so they don't have nearly as heavy a lift. For example:

[bengland@localhost benchmark-wrapper]$ cat snafu/smallfile_wrapper/generic-snafu-Dockerfile
FROM registry.access.redhat.com/ubi8:latest

RUN dnf install -y --nodocs git python3-pip
RUN dnf install -y --nodocs procps-ng iproute net-tools ethtool nmap iputils
RUN ln -s /usr/bin/python3 /usr/bin/python
COPY . /opt/snafu/
RUN pip3 install -e /opt/snafu/

and then the smallfile image becomes:

[bengland@localhost benchmark-wrapper]$ cat snafu/smallfile_wrapper/Dockerfile
FROM quay.io/bengland/snafu:latest

RUN git clone https://github.com/distributed-system-analysis/smallfile /opt/smallfile
RUN ln -sv /opt/smallfile/smallfile_cli.py /usr/local/bin/
RUN ln -sv /opt/smallfile/smallfile_rsptimes_stats.py /usr/local/bin/

and it rebuilds really fast. Would anyone else like to see this implemented? I could post a PR for this with smallfile and then we can incrementally extend it to other benchmarks if there is interest. Possible benchmarks that could use this would include:

[benchmark-operator]$ grep -r run_snafu . | awk -F: '{ print $1 }' | awk -F/ '/roles/{print $3}' | sort -u
cyclictest
fio_distributed
flent
fs-drift
hammerdb
image_pull
log_generator
oslat
pgbench
scale_openshift
smallfile
stressng
testpmd
uperf
vegeta
ycsb

learnitall commented 3 years ago

Awesome, thank you for putting this together. I've been wanting to work on a base image for a long time. Here is the one that I came up with, and I would love your thoughts on it: https://gist.github.com/learnitall/9a84c4e035765d5d450c6d01644af654.

Do you have the logs for the failed smallfile package though? It would be great to see those, as I'm curious what the error was. It may be related to https://github.com/cloud-bulldozer/benchmark-wrapper/pull/323. Thanks!

bengland2 commented 3 years ago

@learnitall here is the pastebin that illustrates the problem I was having: http://pastebin.test.redhat.com/986862

Actually, this suggestion does not really resolve the problem I was having. It just means I don't have to deal with it all the time, only when the run_snafu base image changes. Any suggestions on why the original error is occurring? I thought files.pythonhosted.org would be more robust.

bengland2 commented 3 years ago

@learnitall Happy to use your snafu base image as long as I can make it work with the benchmarks that I use, will try it out. I'm guessing you looked at more benchmarks than I did.

questions:

Why isn't python3-pip RPM installed? Where do you get pip from?

why python 3.6 specifically?

why this? pip install wheel setuptools.

why this at end? rm -fr /opt/snafu

thx -ben

bengland2 commented 3 years ago

tried running Ryan's base image above and it didn't succeed, why not?

http://pastebin.test.redhat.com/987114

[MIRROR] perl-podlators-4.11-1.el8.noarch.rpm: Curl error (56): Failure when receiving data from the peer for https://cdn-ubi.redhat.com/content/public/ubi/dist/ubi8/8/x86_64/baseos/os/Packages/p/perl-podlators-4.11-1.el8.noarch.rpm [OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104]
[MIRROR] groff-base-1.22.3-18.el8.x86_64.rpm: Curl error (56): Failure when receiving data from the peer for https://cdn-ubi.redhat.com/content/public/ubi/dist/ubi8/8/x86_64/baseos/os/Packages/g/groff-base-1.22.3-18.el8.x86_64.rpm [OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104]
[FAILED] groff-base-1.22.3-18.el8.x86_64.rpm: No more mirrors to try - All mirrors were already tried without success

Is this package download so fragile that an occasional network error can bring it to its knees? This isn't pip failing, it's cdn-ubi.redhat.com! Seems like there should be some sort of option that says "wait a few seconds and retry".
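
One way to get a wait-and-retry behavior is a small shell wrapper around the failing command. This is only a sketch for illustration and not something that exists in benchmark-wrapper: `retry` is a hypothetical helper name, and the attempt count and delay are arbitrary values to tune.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of snafu): run a command up to N times,
# sleeping between failed attempts.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
    local max_attempts=$1 delay=$2 attempt=1
    shift 2
    # Keep re-running the command until it exits with status 0.
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "retry: '$*' failed after $attempt attempts" >&2
            return 1
        fi
        echo "retry: attempt $attempt failed, sleeping ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

It could then wrap a flaky step, for example `retry 5 10 dnf install -y --nodocs git python3-pip`. dnf also has its own `retries` and `timeout` settings in dnf.conf, which may be worth checking before adding a wrapper.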

learnitall commented 3 years ago

@bengland2 I did some testing and I'm still not sure why the network errors were occurring, but I revisited the base image I posted earlier today and made some modifications based on your questions and work. It depends on a PR currently under review (#323), so I created a new branch in my fork from that PR, where I've posted the base image I came up with and modified the smallfile wrapper image to use it (it's located here: https://github.com/learnitall/benchmark-wrapper/tree/feature-add-base-image). When you get a chance, can you try this out for me?

git clone https://github.com/learnitall/benchmark-wrapper snafu-base-image
cd snafu-base-image
git checkout feature-add-base-image
podman build . -t snafu:latest
podman build . -t smallfile:latest -f snafu/smallfile_wrapper/Dockerfile

Here are the answers to your questions above, feel free to let me know if you have any follow-up Qs:

Why isn't python3-pip RPM installed? Where do you get pip from?

I originally went with the idea of using pyenv to build Python from source rather than using an RPM, which would give us finer-grained control over the version of Python that we use in our images. I scrapped this idea though and just went with the RPMs, as the build time was crazy long.

why python 3.6 specifically?

No idea honestly, just went for it because it's the minimum version of Python that snafu can use. I upgraded to 3.8 in the base image mentioned above.

why this? pip install wheel setuptools. why this at end? rm -fr /opt/snafu

When we use pip install -e . we are asking pip to install snafu in editable mode and keep the source code from the git repository that was copied into the image. When we install wheel, we get access to the bdist_wheel command within the setup.py file, allowing us to build a minimal distribution of snafu, which is then installed. This allows us to scrap the data we copied into the image, which includes a lot of unnecessary stuff like git history, docs, other Dockerfiles, etc. This helps trim down the image size, improving pull time.
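
The build-a-wheel-then-delete-the-source sequence described above can be sketched as a shell function. This is an assumption about how the steps fit together, not the exact commands from the gist: `install_from_wheel` is a hypothetical name, and the source directory is passed in rather than hard-coded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install a Python project from a built wheel instead of
# in editable mode, then remove the copied source tree to keep the image small.
# Usage: install_from_wheel <source_dir>
install_from_wheel() {
    local src=$1
    # The wheel package provides the bdist_wheel command for setup.py.
    pip3 install wheel setuptools || return 1
    # Build the wheel inside the source tree; it lands in <source_dir>/dist/.
    (cd "$src" && python3 setup.py bdist_wheel) || return 1
    # Install the built distribution rather than the source checkout.
    pip3 install "$src"/dist/*.whl || return 1
    # The source tree (git history, docs, other Dockerfiles) is no longer needed.
    rm -rf "$src"
}
```

In a Dockerfile this would correspond to something like `install_from_wheel /opt/snafu` run after the COPY step. With `pip install -e` the final `rm` would break the install, because editable mode keeps pointing at the source directory.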

Thanks!

bengland2 commented 3 years ago

@learnitall I'm trying to build your base image for snafu and it keeps blowing up with curl errors. I googled, and the curl in the container image is really out of date. It gives me errors like:

[MIRROR] gcc-8.4.1-1.el8.x86_64.rpm: Curl error (56): Failure when receiving data from the peer for https://cdn-ubi.redhat.com/content/public/ubi/dist/ubi8/8/x86_64/appstream/os/Packages/g/gcc-8.4.1-1.el8.x86_64.rpm [OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104]

and then the container build aborts, but I think this is a lack of robustness in the curl version being used:

[bengland@localhost benchmark-wrapper]$ podman run -it c06e3fa0fce7
[root@897c7ef92b00 /]# curl --version
curl 7.61.1 (x86_64-redhat-linux-gnu) libcurl/7.61.1 OpenSSL/1.1.1g zlib/1.2.11 brotli/1.0.6 libidn2/2.2.0 libpsl/0.20.2 (+libidn2/2.2.0) libssh/0.9.4/openssl/zlib nghttp2/1.33.0
Release-Date: 2018-09-05
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz brotli TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL Metalink 

but current version on Fedora 34 is:

[bengland@localhost benchmark-wrapper]$ curl --version
curl 7.76.1 (x86_64-redhat-linux-gnu) libcurl/7.76.1 OpenSSL/1.1.1k-fips zlib/1.2.11 brotli/1.0.9 libidn2/2.3.2 libpsl/0.21.1 (+libidn2/2.3.0) libssh/0.9.5/openssl/zlib nghttp2/1.43.0
Release-Date: 2021-04-14
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: alt-svc AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets

and this version appears to have some fixes in it that would make it more robust in the face of download problems. Any suggestions? I tried fedora:34 instead of ubi8 as the base image to isolate the problem. When I take the base image Dockerfile above and just change the image from ubi8 to fedora:34, it no longer has download problems. It does complain about the peer resetting the connection during download, but it just keeps going and succeeds the first time. With ubi8, it failed every time on a different package. Ideas?

bengland2 commented 3 years ago

never mind, I moved from -2G wireless network to -5G wireless network in my house and now it builds, will let you know if it works for smallfile.

bengland2 commented 3 years ago

@learnitall I based smallfile on your image and it worked. can you include git in the base image? This would save some time for things like smallfile that aren't part of an RPM. Other than that, I'm good with it. Would be good to test fio with it also. So my dockerfile prototype currently looks like:

FROM quay.io/bengland/snafu:ubi8

RUN dnf install -y git
COPY . /opt/snafu
RUN git clone https://github.com/distributed-system-analysis/smallfile /opt/smallfile
RUN ln -sv /opt/smallfile/smallfile_cli.py /usr/local/bin/
RUN ln -sv /opt/smallfile/smallfile_rsptimes_stats.py /usr/local/bin/

learnitall commented 3 years ago

Ah ok awesome. Glad to hear that switching your wireless helped out with the downloads, I was getting worried there. I edited the base image to include git and perform the soft linking of python3 to python. I modified the smallfile Dockerfile within the branch I shared with you yesterday, can you check it out? It matches your prototype almost exactly, just want to double check it works for you. I'll work on the fio Dockerfile later today.

bengland2 commented 3 years ago

will do, been busy, it's on my list.

bengland2 commented 3 years ago

@learnitall back to it now, I tried your image out and it's great, let me know when it is part of benchmark-operator so I can start converting storage benchmarks to use it. Or should I add it?

It turns out smallfile is even easier, since none of the remaining RPMs were even necessary to run it. They were more for debugging and I can leave those out of the image. So all that's left is git clone, basically. The image build ran in 10 seconds and the image push to quay ran in 5 seconds!!!

For fio, I ran into a problem doing this: I had to install some additional RPMs, and it was complaining that subscription manager was not registered, but the centos8 repo kicked in. It took a little longer but was still under 1 min, and the push to quay.io was under 1 min. Again, wonderful. So I see no reason why this wouldn't work. My smallfile_wrapper/Dockerfile looked like:

FROM quay.io/bengland/snafu:latest

ADD https://api.github.com/repos/distributed-system-analysis/smallfile/git/refs/heads/master /tmp/bustcache
RUN git clone https://github.com/distributed-system-analysis/smallfile /opt/smallfile
RUN ln -sv /opt/smallfile/smallfile_cli.py /usr/local/bin/
RUN ln -sv /opt/smallfile/smallfile_rsptimes_stats.py /usr/local/bin/

and my fio Dockerfile looked like:

FROM quay.io/bengland/snafu:latest
COPY snafu/image_resources/centos8.repo /etc/yum.repos.d/centos8.repo
RUN dnf install --nodocs -y --enablerepo=centos8 make gcc libaio zlib-devel libaio-devel
RUN dnf clean all
RUN curl -L https://github.com/axboe/fio/archive/fio-3.27.tar.gz | tar xzf -
RUN pushd fio-fio-3.27 && ./configure --disable-native && make -j2
RUN ln -sv /fio-fio-3.27/fio /usr/local/bin/
COPY . /opt/snafu

BTW if you take out --depth 1 from git clone, then you can easily fetch a branch for debugging, so I prefer no --depth 1.
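
For context on the --depth 1 trade-off: a shallow clone is single-branch by default, so other branches are not available until the fetch refspec is widened. A rough sketch of what fetching a debug branch would involve in that case follows. `checkout_debug_branch` is a hypothetical helper name, and the remote is assumed to be called origin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make an extra branch available in an existing clone
# (including one made with --depth 1) and switch to it.
# Usage: checkout_debug_branch <repo_dir> <branch>
checkout_debug_branch() {
    local repo=$1 branch=$2
    # A --depth 1 clone only tracks the cloned branch, so add the wanted one.
    git -C "$repo" remote set-branches --add origin "$branch" || return 1
    # Fetch the branch so that origin/<branch> exists locally.
    git -C "$repo" fetch origin "$branch" || return 1
    # Create a local branch tracking origin/<branch> and switch to it.
    git -C "$repo" checkout "$branch"
}
```

With a full clone, the first two steps are unnecessary because every remote branch is already tracked, so a plain `git checkout <branch>` is enough.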

bengland2 commented 3 years ago

hopefully the base image is resolved by benchmark-wrapper PR #319 and we can rapidly convert benchmarks to use that, if I understand Ryan correctly.

bengland2 commented 3 years ago

I found a way to speed up fs_drift_wrapper/Dockerfile that results in a very fast image rebuild. It appears that podman caches image layers based on the order of steps in the Dockerfile. By putting the RPM install and pip install steps first, we always get this stuff cached, so the only things it actually has to redo are cloning fs-drift and copying the benchmark-wrapper tree into the image, which runs in a few seconds. The Dockerfile I'm using is:

FROM registry.access.redhat.com/ubi8:latest

RUN dnf install -y --nodocs git python3-pip
RUN ln -s /usr/bin/python3 /usr/bin/python
COPY setup.cfg setup.py version.txt /opt/snafu/
RUN pip3 install -e /opt/snafu/
RUN git clone https://github.com/parallel-fs-utils/fs-drift /opt/fs-drift
RUN ln -sv /opt/fs-drift/fs-drift.py /usr/local/bin/
RUN ln -sv /opt/fs-drift/rsptime_stats.py /usr/local/bin/
COPY . /opt/snafu/

learnitall commented 2 years ago

Working on integration into our build system (have a couple of tweaks to make to get this to work) and then will start migrating Dockerfiles into using the base image.

bengland2 commented 2 years ago

@learnitall update?

learnitall commented 2 years ago

Hey Ben, I've been distracted with some other work that has come up since Joe left. I have experimented with creating a base image using the ONBUILD syntax and I think it's the way to go. The current image that I have requires a specific distro to be used by each wrapper image. However, by using ONBUILD and packaging snafu correctly, I can create a base image compatible with any FROM image a wrapper needs to use. See here for an example base image, and here for example usage.

Still working on this issue, it's just been a low priority for me here with some other tasks taking my attention.