geohot / qira

QEMU Interactive Runtime Analyser
MIT License
3.95k stars 470 forks source link

Docker stages and test integration #209

Closed jk- closed 4 years ago

jk- commented 5 years ago

This streamlines the docker build and test process.

Overview:

Additional Test comments: I can update travis to use docker to run tests, but it would rebuild the base image on every test. Only way to resolve this is to create a docker hub public repo for qira so the base image can be cached. Let me know since it currently does that anyway with the install.sh script and we can figure out the remote image repository after.

There are now two stages to building docker, the builder stage and the qira stage.

./build.sh to run through all stages and build out both docker images. add base or qira as an argument to build a specific stage and image. base is the builder stage.

I upgraded the base docker image to python3 and removed a few packages that were no longer required. I had to keep python-dev because qiradb.cpp was trying to use Python.h from python 2. If we can figure out a fix we can remove the python-dev package.

Here was the error:

Screen Shot 2019-03-24 at 10 51 19 AM

I also had to use python3 when running the static2 tests because it could not find the capstone module when importing static2 (installed via pip3 and not pip2).

To test all you have to do is run: ./test.sh and it will rebuild only the qira image (takes a few seconds). This will install pip requirements and execute the test runners with the latest local changes.

I believe that's all. I can update the README if required.

joekohlsdorf commented 5 years ago

There is no reason for a two stage build, you should just mount the code instead.

Dockerfiles don't need to be moved, you can do docker build -f docker/Dockerfile .

build.sh and test.sh should really be replaced by a Makefile

jk- commented 5 years ago

There are huge use-cases for multi-stage builds and primary reasons why they are heavily utilized. Especially if qira ever has plans to have some cloud based integration. There are no drawbacks to multi-stage builds besides people not understanding it.

  1. Building out the base image with all the dependencies allows for a secondary stage to be built almost instantaneously. By eventually utilizing an image repository you can cache base builds (dependencies). I didn't go the extra step with my docker stages because I didn't want to convolute iterations. It will solve two eventual problems, speed of test execution and the ability to scale a service.
  2. Mounting the code is done through either docker-compose or a script that creates the containers. That's the point of the VOLUME line. The mounting isn't done on build. You can simply mount your code with -v=$(pwd):/qira if required (a docker compose or with the docker run -v). Because of the multistage process, the difference in time savings is simply the difference between copying your code to a container and not. But I do agree, it should be apart of the development process.
  3. I moved the docker files out of docker/* because I didn't want to confuse the build context (that's the dot). Knowing the initial context is a lot easier, especially since the dockerfile was now expecting COPY commands.
  4. build and test replaced by makefile.. i totally agree, I was going to do it, but didn't want to make another process and wanted my PR small.

Eventually you could create a container that acts like a binary and mounts your binary to the container and you wouldn't need any install scripts at all. The drawback is the container would only be able to execute binaries compatible with it. I thought this was the plan all along with using Docker. If that's not the plan, its much better to just get rid of docker all together.

joekohlsdorf commented 5 years ago

It is not necessary to have Docker Compose to mount the code, this is a standard Docker functionality. Therefore doing a multistage build for the local environment is not necessary. I wasn't saying multistage builds are bad but in this case it is just unnecessary. Mounting saves even more time by not building at all. It also doesn't litter your local Docker image store by rebuilding whenever you want to run your code. And in case you want to push the image to a registry your first stage already has a copy of the code in it. So no multistage necessary for that either.

jk- commented 5 years ago

It is not necessary to have Docker Compose to mount the code, this is a standard Docker functionality.

That's what I said. You can use docker-compose or you can use it with docker run -v to bind a mount volume. You don't do that in the build process.

Therefore doing a multistage build for the local environment is not necessary.

I'm not sure I understand. The purpose of a docker container is to facilitate, among other things, the encapsulation of dependencies. If you are simply mounting your entire repository and executing dependency installations on your own machine, you don't need docker, at all. The multi-stage part is encapsulating dependencies through the base builder image and using that as a cached starting point for application iterations.

Mounting saves even more time by not building at all.

If you want to mount for dev you can do it on the non-dependency image, that's why the VOLUME /qira is supplied. We can change that and make it super duper fast.

And in case you want to push the image to a registry your first stage already has a copy of the code in it. So no multistage necessary for that either.

The implementation for a multi-stage process in a production environment has a builder image (dependency) cached on an ECR. It facilitates extremely fast deployments by not having to recompile and install dependencies (you can see how long Mr. Hotz was waiting for tests to pass on his recent commits), and during the test phase, either through code build or code pipeline, you can forward pass an eight digit head of the app commit to create a tag, that is also what you push to the ECR with your code in it to prevent recompiling of dependencies.

Fast and really fast CI/CD.

geohot commented 5 years ago

I like the direction of this. I think there's a lot more that can go in the .dockerignore (is that copy fast?), and really the Travis CI test, install.sh, and docker should all be unified. And can the Dockerfile stay in docker/?

jk- commented 5 years ago

The copy is fast, especially if we fine tune what is ignored. For development, you can mount your local qira onto the container.

We should consider creating a public image for the build image. Docker hub is free for 1 repository and I believe Amazon ECR has a free tier. This allows the image to be cached locally (after downloading) so the build, test and development side doesn't have to wait for building out the tracers every run.

We can keep all the docker related stuff in docker/ and unify the build process on Travis CI. I can update the branch to do accomplish all of these tasks.

jk- commented 5 years ago

@geohot

Do you want to create a public image repo on dockerhub so we can store the base image? This will facilitate caching for both "production" (whatever that becomes) and testing. If its out of scope for this ticket or you are not interested, let me know.

Also, is the goal to have the container act as a binary for qira? We could set it up so qira executes the docker container and mounts the binary. You can access the web interface through exposing ports on the container. I'm not sure how people are using the docker container now, maybe the same way, but we could be a little more user friendly in setting it up.

I guess the smallest task to do is to start from master, improve some of the container build processes and unify the Travis and install processes.

alehatsman commented 5 years ago

What a story Mark?

Agree with @joekohlsdorf

You can archive speed and cache without multi-stage. Docker's layers concept of caching will be enough. Just do things in the right order so that docker can cache steps.

In your example @jk- , you install pip and apt deps two times.

If this app was compiled to binary only, including all static assets, then it would perfectly make sense to build it first, then copy the binary to the scratch image and use that light image everywhere.

https://github.com/geohot/qira/pull/216

jk- commented 5 years ago

I'm trying to address two things that I believe are being overlooked. Testing speed, and a cloud based version of Qira. Both of these processes will require an intermediary build process, unless you want to wait for 7 minutes to run a container.

The reason pip is installed twice, is once for a build container (to facilitate caching of a build image) and another image that facilitates development (as you will be required to update pip dependencies). We already talked about the ability to mount onto the development container to prevent that.

For example, your PR #216 ran for 7 min and 14 sec.

Screen Shot 2019-04-17 at 7 04 47 AM

A multi-stage build, when properly using a container register, will be able to cache build dependencies (qira/build:latest).

The current code in this PR to facilitate that, is wrong, primarily because the base image is not a remote register endpoint and the fact that the origin branch has changed so much.

alehatsman commented 5 years ago

@jk-

The run time that you are referring to is run time of raw execution of the installation process.

Ok. My point is, while intermediate layer caching is disabled in Travis, if it is done right in your Dockerfile, you can do docker pull qira:latest, that will download all intermediate layers of your image to local travis's docker process. And then run a build of docker image.

Layers with dependencies will be there and docker will reuse them. Apt, pip etc. If you have changed requirements.txt it will detect it and install it again.