OpenDroneMap / ODM

A command line toolkit to generate maps, point clouds, 3D models and DEMs from drone, balloon or kite images. 📷
https://opendronemap.org
GNU Affero General Public License v3.0
4.86k stars 1.1k forks source link

Dockerfile Feedback #781

Closed sberryman closed 6 years ago

sberryman commented 6 years ago

FYI ONLY!

I noticed a recent commit to the Dockerfile and was quite interested. Then I started digging in to see what the Dockerfile was actually doing.

There are some glaring issues with the below section based on docker caching of layers. When reviewing the best practices for docker images, They state Always combine RUN apt-get update with apt-get install in the same RUN statement. The reason is that any apt-get install on subsequent layers are using the cached apt-get update from previous layers. Meaning if you re-run docker build on a machine that has already built ODM and don't make any changes to the layers prior to apt-get update you'll be using old cached lists and installing (potentially) old versions.

https://github.com/OpenDroneMap/OpenDroneMap/blob/96be8c43aa165e5ce727dd6262b88bea106b5c88/Dockerfile#L6-L12

Then there is the issue of cleanup of APT. Running cleanup on a separate layer doesn't remove the intermediate layers which have the cached data. So you either need to remove the cache lists in the same RUN command as you are doing apt-get update and apt-get install or just remove it from the dockerfile as it is useless. https://github.com/OpenDroneMap/OpenDroneMap/blob/master/Dockerfile#L58-L59

dakotabenjamin commented 6 years ago

This used to be the case, I'm not sure why it changed.

sberryman commented 6 years ago

Sorry, I'm not sure what you mean by "this used to be the case". Are you talking about the removal of APT cache files? I'm pretty sure you only used to be able to get rid of intermediate layers by flattening the image, not very many people do that though.

Update: technically docker calls it -squash so hopefully I didn't confuse you with 'flatten'

sberryman commented 6 years ago

Any idea why libvtk6-dev header files are getting installed on line 56? They are already included in the massive dependency install phase on line 15.

sberryman commented 6 years ago

Want me to take a stab at rewriting it? How condensed do you want the final image? We can run almost everything in a single layer or I can try and split it out a bit. It would help to know what you tend to update. Does the build time of the image matter?

I could install the majority of deps near the top and then install build-essential, cmake, perform build and clean up files on another layer.

This would leave you with apt dependencies in the 1st layer, pip dependencies in 2nd, env & copy on a few more layers and finally build and cleanup on the final layer

sberryman commented 6 years ago

I played around with the Dockerfile a bit, here is docker history for a local build of ODM (96be8c4)

IMAGE               CREATED                  CREATED BY                                      SIZE                COMMENT
f4e1f13d0075        Less than a second ago   /bin/sh -c #(nop)  ENTRYPOINT ["python" "/co…   0B                  
69e075ebbf62        Less than a second ago   /bin/sh -c rm -rf /code/SuperBuild/download …   0B                  
8712d7541ebc        Less than a second ago   /bin/sh -c apt-get clean && rm -rf /var/lib/…   0B                  
64bb9330591e        Less than a second ago   /bin/sh -c apt-get install -y libvtk6-dev       299MB               
d6fe276c3611        Less than a second ago   /bin/sh -c apt-get -y remove libgl1-mesa-dri…   3.23MB              
be27339e4c81        Less than a second ago   /bin/sh -c cd SuperBuild && mkdir build && c…   2.46GB              
8695805db5a7        28 minutes ago           /bin/sh -c #(nop) COPY file:d2dd9312ac40f85c…   6B                  
fd3d44ede059        28 minutes ago           /bin/sh -c #(nop) COPY file:7c13e4e876f9f1e8…   2.31kB              
478225373e32        28 minutes ago           /bin/sh -c #(nop) COPY file:de9972b7e00aab74…   3.91kB              
77e3f134b1bd        28 minutes ago           /bin/sh -c #(nop) COPY dir:0096577dd9215d190…   18.5kB              
4a99d26ffff9        28 minutes ago           /bin/sh -c #(nop) COPY dir:d612083a96627f68d…   100kB               
8ffc941bc51f        28 minutes ago           /bin/sh -c #(nop) COPY file:48c2781c0c3cef57…   292B                
494c9c556426        28 minutes ago           /bin/sh -c #(nop) COPY file:91a496b3597da96a…   1.57kB              
fc361d40a3e0        28 minutes ago           /bin/sh -c #(nop) COPY dir:68c81018f4833e263…   453kB               
38efbc40404a        28 minutes ago           /bin/sh -c #(nop) COPY dir:e0e90f8904a9ed21a…   74.8kB              
91c2051df250        28 minutes ago           /bin/sh -c #(nop) COPY dir:779e102780c9e2f15…   272kB               
a66b4bf113bf        28 minutes ago           /bin/sh -c #(nop) COPY file:0ee2b2f83c2c5f46…   4.91kB              
fe3fc6309d03        28 minutes ago           /bin/sh -c #(nop) COPY file:449f3516ab95493f…   547B                
cb47c53f7025        28 minutes ago           /bin/sh -c #(nop) COPY file:2475dfdfd7a6111c…   654B                
a3d14dbfe747        28 minutes ago           /bin/sh -c #(nop) WORKDIR /code                 0B                  
5035510c7737        28 minutes ago           /bin/sh -c mkdir /code                          0B                  
9858aeeed408        28 minutes ago           /bin/sh -c #(nop)  ENV LD_LIBRARY_PATH=:/cod…   0B                  
53a7d71dc731        28 minutes ago           /bin/sh -c #(nop)  ENV PYTHONPATH=:/code/Sup…   0B                  
c8ec01695612        28 minutes ago           /bin/sh -c #(nop)  ENV PYTHONPATH=:/code/Sup…   0B                  
2c91acaea5aa        28 minutes ago           /bin/sh -c pip install -U PyYAML exifread gp…   353MB               
5a7e8746e9ae        30 minutes ago           /bin/sh -c pip install setuptools               2.85MB              
96b6e0695ffb        30 minutes ago           /bin/sh -c pip install --upgrade pip            8.22MB              
40b6d629abeb        30 minutes ago           /bin/sh -c apt-get remove libdc1394-22-dev      0B                  
91f8de57c68f        30 minutes ago           /bin/sh -c apt-get install --no-install-reco…   1.67GB              
6fd16cdd33fe        36 minutes ago           /bin/sh -c apt-get update -y                    686kB               
d4d0361edc78        36 minutes ago           /bin/sh -c add-apt-repository -y ppa:george-…   6.27kB              
076942099b74        36 minutes ago           /bin/sh -c add-apt-repository -y ppa:ubuntug…   15.9kB              
f6ae1c8f8633        36 minutes ago           /bin/sh -c apt-get install software-properti…   0B                  
63771a799817        36 minutes ago           /bin/sh -c apt-get update -y                    43.1MB              
d0b382d3989c        5 days ago               /bin/sh -c #(nop)  ENV DEBIAN_FRONTEND=nonin…   0B                  
166cfc3f6974        6 weeks ago              /bin/sh -c #(nop) CMD ["/sbin/my_init"]         0B                  
<missing>           6 weeks ago              /bin/sh -c #(nop) ENV DEBIAN_FRONTEND=telety…   0B                  
<missing>           6 weeks ago              /bin/sh -c /bd_build/prepare.sh &&  /bd_buil…   97.3MB              
<missing>           6 weeks ago              /bin/sh -c #(nop) COPY dir:e7a5eda59d878c69c…   39.4kB              
<missing>           6 weeks ago              /bin/sh -c #(nop) MAINTAINER Phusion <info@p…   0B                  
<missing>           6 weeks ago              /bin/sh -c #(nop)  CMD ["/bin/bash"]            0B                  
<missing>           6 weeks ago              /bin/sh -c mkdir -p /run/systemd && echo 'do…   7B                  
<missing>           6 weeks ago              /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$…   2.76kB              
<missing>           6 weeks ago              /bin/sh -c rm -rf /var/lib/apt/lists/*          0B                  
<missing>           6 weeks ago              /bin/sh -c set -xe   && echo '#!/bin/sh' > /…   745B                
<missing>           6 weeks ago              /bin/sh -c #(nop) ADD file:a3344b835ea6fdc56…   112MB               

My changes to the Dockerfile WIP

IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
c2d1bb8dc4dc        3 minutes ago       /bin/sh -c #(nop)  ENTRYPOINT ["python" "/co…   0B                  
0d35e01de91a        3 minutes ago       /bin/sh -c apt-get update &&     apt-get ins…   627MB               
1e1467300432        25 minutes ago      /bin/sh -c #(nop) COPY file:d2dd9312ac40f85c…   6B                  
ea975646156d        25 minutes ago      /bin/sh -c #(nop) COPY file:7c13e4e876f9f1e8…   2.31kB              
729cfdfc1133        25 minutes ago      /bin/sh -c #(nop) COPY file:de9972b7e00aab74…   3.91kB              
2fe519a788ee        25 minutes ago      /bin/sh -c #(nop) COPY dir:0096577dd9215d190…   18.5kB              
2e0d942637ba        25 minutes ago      /bin/sh -c #(nop) COPY dir:d612083a96627f68d…   100kB               
b0eff52163b5        25 minutes ago      /bin/sh -c #(nop) COPY file:48c2781c0c3cef57…   292B                
614c90583c89        25 minutes ago      /bin/sh -c #(nop) COPY file:91a496b3597da96a…   1.57kB              
bcdb76b0a87b        25 minutes ago      /bin/sh -c #(nop) COPY dir:68c81018f4833e263…   453kB               
b875347ab866        25 minutes ago      /bin/sh -c #(nop) COPY dir:e0e90f8904a9ed21a…   74.8kB              
2efbadd7bd63        25 minutes ago      /bin/sh -c #(nop) COPY dir:779e102780c9e2f15…   272kB               
fee9b98779e3        25 minutes ago      /bin/sh -c #(nop) COPY file:0ee2b2f83c2c5f46…   4.91kB              
028ee1152551        25 minutes ago      /bin/sh -c #(nop) COPY file:449f3516ab95493f…   547B                
35efeee52f01        25 minutes ago      /bin/sh -c #(nop) COPY file:2475dfdfd7a6111c…   654B                
2270283cdc8f        25 minutes ago      /bin/sh -c #(nop) WORKDIR /code                 0B                  
b0f841a6cba8        25 minutes ago      /bin/sh -c #(nop)  ENV PYTHONPATH=:/code/Sup…   0B                  
d8e212be8c7e        25 minutes ago      /bin/sh -c add-apt-repository -y ppa:ubuntug…   1.82GB              
d0b382d3989c        5 days ago          /bin/sh -c #(nop)  ENV DEBIAN_FRONTEND=nonin…   0B                  
166cfc3f6974        6 weeks ago         /bin/sh -c #(nop) CMD ["/sbin/my_init"]         0B                  
<missing>           6 weeks ago         /bin/sh -c #(nop) ENV DEBIAN_FRONTEND=telety…   0B                  
<missing>           6 weeks ago         /bin/sh -c /bd_build/prepare.sh &&  /bd_buil…   97.3MB              
<missing>           6 weeks ago         /bin/sh -c #(nop) COPY dir:e7a5eda59d878c69c…   39.4kB              
<missing>           6 weeks ago         /bin/sh -c #(nop) MAINTAINER Phusion <info@p…   0B                  
<missing>           6 weeks ago         /bin/sh -c #(nop)  CMD ["/bin/bash"]            0B                  
<missing>           6 weeks ago         /bin/sh -c mkdir -p /run/systemd && echo 'do…   7B                  
<missing>           6 weeks ago         /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$…   2.76kB              
<missing>           6 weeks ago         /bin/sh -c rm -rf /var/lib/apt/lists/*          0B                  
<missing>           6 weeks ago         /bin/sh -c set -xe   && echo '#!/bin/sh' > /…   745B                
<missing>           6 weeks ago         /bin/sh -c #(nop) ADD file:a3344b835ea6fdc56…   112MB               

The 1.82 GB of dependencies (APT & pip) is quite large as well as the SuperBuild at 627 MB. I cleaned up a bunch of redundant dependencies (listed more than once on various lines) and moved cleanup of APT and temp files to the RUN commands that create them. I'm assuming there is quite a bit more that I can clean up but I just don't know the project well enough.

Have you guys thought about breaking ODM into a bunch of smaller projects and/or containers? Make each container specific to its process. For instance, OpenSFM should have its own container and use queues, pub/sub or a custom http api to communicate. You are already using docker so having people use compose to build and start a collection of containers is not asking much of the end user. It would also help make the project more (horizontally) scaleable while making it MUCH easier to reduce dependencies down to bare minimum. It is also a bit unnerving that pip dependencies are not version specific and you are not tagging docker builds with a version.

Update: I can open PR if you wan't

smathermather commented 6 years ago

@sberryman -- pull requests always welcome!

sberryman commented 6 years ago

@smathermather I'm not sure it made a difference on download size, I know I at least cleaned up redundancies.

I'm running a job on the docker image after all the changes to ensure it still works. Do you guys have a pre-defined test(s) you use to verify the results of changes to the Dockerfile or underlying dependencies?

smathermather commented 6 years ago

Our tests for ODM itself (as opposed to node-OpenDroneMap or WebODM) are embarrassingly lacking, but usually Dakota or Piero run pull requests through the paces manually.

PeterSprague commented 6 years ago

I have also completely rewritten the dockerfile to conform closer to best practices. Pulled all of Superbuild component builds to seperate builds using latest software so can debug components that were failing and killed Superbuild. Uses debain stretch as base image. Still needs tuning to shrink the image size, but builds without error and produces the orthos I need.

What would be the advantage of pulling all the components apart into their own images beyond updates? See it as complicating the situation.

Can submit a PR of various code bits including dockerfile, but it is strictly for Docker use aka no native install, and nukes the current way the project does things with Superbuild. Requires tuning various other hard coded paths in some of the code.

Here is the gist: https://gist.github.com/PeterSprague/a831e5109eb62fe8a6ab7eef92d2214b

sberryman commented 6 years ago

Very interesting @PeterSprague, I don't know the project well enough yet to split it up but it seems like you have at least broken the build out. Are you building each component/sub-project of superbuild as a separate layer? Did you even bother building dependencies you didn't use like ORB_SLAM2? (I guess that could be a bad assumption you didn't use SLAM.)

Based on my experience converting projects at Golf Channel to Docker, we hit a lot of walls and confusion until we broke down the project into several different isolated containers. This let us release and version control each container without having to wait for the entire team to assemble a working build. Scaling became much easier, images got smaller and the overall velocity of development increased dramatically.

The only problem with separate containers is relying on an orchestration framework such as compose, swarm, kubernetes, etc. I can't imagine the number of github issues that would be created by people struggling to get it running.

PeterSprague commented 6 years ago

What was the precursor to this was the desire to use Cuda or OpenCL to speed things up. To make that happen, I need an ODM base image that uses newer/newest OpenCV.

Each piece of software builds on a number of layers, and I have not made any real attempt to compact the image. I diverge from the latest ODM-master in a number of places because I was getting numerous errors and finally decided to nuke the SuperBuild because I was having a hard time debugging issues.

Debian Stretch was chosen over Ubuntu 16.04 because of issues around older versions of software like gdal, and I need access to an arm64 base image with a common set of software for my use case

Thinking about it, being able to scale out some of the components could be a real advantage in a higher volume production situation. Food for thought down the line.

PeterSprague commented 6 years ago

"It is also a bit unnerving that pip dependencies are not version specific"

I was getting "import cv2" --> numpy errors because of this. Also apt-get install various python dependencies mostly python-gdal, and then pip install them in latest odm-master added to the mess.

Be glad to PR my versions of things, but being new to team software development don't want to step on toes with my fork of the ODM code base.

sberryman commented 6 years ago

Did you make any progress on the Cuda or OpenCL front? Did you switch to using FAST or ORB_GPU for feature detection and matching?

What other areas are you attempting to use Cuda or CL to speed up the pipeline?

Also with you on submitting a PR for the dockerfile, this was my way to start a discussion. I was hoping to figure out what @dakotabenjamin, @smathermather and @pierotofy had in mind for the roadmap. It would be interesting to know how many people use ODM with/without docker. Personally I don't know why anyone would NOT use docker but I hopped on that bandwagon a while ago.

PeterSprague commented 6 years ago

Issue on the PR is it is more than just a dockerfile, had to tweak other parts of code some so it worked together. Would diverge from master.

Maybe the project could/should look at a Docker branch and a native branch, though that could get messy fast. Leave to others with more experience to give some guidance.

I concur on using Docker except use iocage for BSD environment.

PeterSprague commented 6 years ago

RE: Cuda Intend on pursuing over the next few months. Further complicated by having to port the project to arm64 for my use case.

Original idea was to focus on OpenSfM --> OpenCV, hence moving to OpenCV-v3.4 because it has OpenCL seamlessly included, and more readily supports Cuda.

dakotabenjamin commented 6 years ago

Having s separate docker and native branch sounds like more maintenance than it's worth, to me. I'd like to see a more thorough explanation of why that would benefit the project more than adjusting the configuration on the master branch to support both.

PeterSprague commented 6 years ago

Only pushed it up as a possibility because I have found the current master structure with SuperBuild and choice of base image not working well for my use with Docker. I also had to tune some of the ODM code to revise hard-coded file/program locations.

As explained above, I needed/wanted to move to the latest software, and found continuing to utilize the SuperBuild format was causing hard to track bugs when building a new image. Docker builds need to be moved to a more micro-service architecture vs monolithic as now. This will require tools like multistage builds, Docker compose link, etc.

Not sure how or if this work should be rolled back into the ODM master, or I can continue with an ODM fork to address ODM Docker usage including Cuda. If its rolled into the ODM master, wouldn't that be a pivot for the project as it stands?

dakotabenjamin commented 6 years ago

I'd be interested in seeing how we could improve the install process as a whole. It should be a simpler, more robust process, but I'm not exactly sure how that would look.

PeterSprague commented 6 years ago

Really comes back to use cases right?

My primary interest is edge based field use, with server back-end. That leads me to Docker and a mix of amd64 and arm64v8 capabilities. Because of this I choose to use Debian as my ODM base image over Ubuntu-16.04. I found that numerous packages were missing or different between the amd64 and arm64v8 repos in Ubuntu

I'm thinking that at a minimum, to push all the dependencies and supporting software into separate image(s), then strip down the ODM code to its minimum.

I think I understand why the SuperBuild approach was used, but think it is redundant in a docker container. Nuked it for my usage after fighting with impossible to find source build bugs

PeterSprague commented 6 years ago

Will move this more open-ended conversation over to the forum

dakotabenjamin commented 6 years ago

This issue was moved to http://community.opendronemap.org/t/dockerfile-feedback/369