Severe Docker 1.12.1 performance regression with DB2 images (~10x slower)

malduarte commented 8 years ago

Expected behavior

Performance equal or better than 1.12.0

Actual behavior

Docker 1.12.1 is much slower on pretty much all DB2 activity. I'm not sure whether it is DB2-specific, but DB2 is the only place I'm experiencing it. For example, creating an empty database takes ~24 seconds on 1.12.0 but about 10 times longer on 1.12.1, around 253 seconds.

docker stats seems to indicate lower cpu usage and lower I/O activity on 1.12.1

Information

Diagnostic ID: 276C02C2-7AEC-4A1B-9F2C-9E6DAD3241CB
Create a Dockerfile with the following content

FROM ibmcom/db2express-c:latest

ENV DB2INST1_PASSWORD=password
ENV LICENSE=accept
USER root
RUN /bin/bash -c 'su - db2inst1 -c "db2start && db2 create database test"'

Steps to reproduce the behavior

Install 1.12.0.
Pull the official DB2 image: docker pull ibmcom/db2express-c
Make sure you're running 1.12.0: docker --version should output Docker version 1.12.0, build 8eab29e
Build an image from the Dockerfile above under the time command: time docker build -t test . (consider running docker stats while it runs)
Record the output.
Remove the newly created image: docker rmi <imageid>
Install 1.12.1.
Make sure you're running 1.12.1: docker --version should output Docker version 1.12.1, build 6f9534c
Build the image again from the same Dockerfile under the time command: time docker build -t test . (consider running docker stats while it runs)
Compare the times.

12:05 $ docker --version
Docker version 1.12.0, build 8eab29e
✔ ~/docker_bench
12:05 $ time docker build -t test .
Sending build context to Docker daemon 66.56 kB
Step 1 : FROM ibmcom/db2express-c:latest
 ---> 7aa154d9b73c
Step 2 : ENV DB2INST1_PASSWORD password
 ---> Using cache
 ---> 8abd69a10768
Step 3 : ENV LICENSE accept
 ---> Using cache
 ---> 271672f5dc75
Step 4 : USER root
 ---> Using cache
 ---> 3f3ab0f72686
Step 5 : RUN /bin/bash -c 'su - db2inst1 -c "db2start && db2 create database test"'
 ---> Running in bd1c14701b68
libnuma: Warning: /sys not mounted or invalid. Assuming one node: No such file or directory
SQL1063N  DB2START processing was successful.
DB20000I  The CREATE DATABASE command completed successfully.
 ---> 6bb79e3c1592
Removing intermediate container bd1c14701b68
Successfully built 6bb79e3c1592

real    0m23.646s
user    0m0.011s
sys     0m0.014s

12:10 $ docker --version
Docker version 1.12.1, build 6f9534c
✔ ~/docker_bench
12:11 $ time docker build -t test .
Sending build context to Docker daemon 66.56 kB
Step 1 : FROM ibmcom/db2express-c:latest
 ---> 7aa154d9b73c
Step 2 : ENV DB2INST1_PASSWORD password
 ---> Using cache
 ---> 8abd69a10768
Step 3 : ENV LICENSE accept
 ---> Using cache
 ---> 271672f5dc75
Step 4 : USER root
 ---> Using cache
 ---> 3f3ab0f72686
Step 5 : RUN /bin/bash -c 'su - db2inst1 -c "db2start && db2 create database test"'
 ---> Running in ebbf3069fb53
libnuma: Warning: /sys not mounted or invalid. Assuming one node: No such file or directory
SQL1063N  DB2START processing was successful.
DB20000I  The CREATE DATABASE command completed successfully.
 ---> 032da4506157
Removing intermediate container ebbf3069fb53
Successfully built 032da4506157

real    4m23.451s
user    0m0.010s
sys     0m0.015s
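
To put the two "real" timings from these logs on a common scale, here is a minimal shell sketch. The helper names to_seconds and slowdown are hypothetical and exist only for this illustration; they are not part of Docker or of the original report.

```shell
#!/bin/sh
# Convert a `time`-style duration such as "4m23.451s" into seconds.
to_seconds() {
  printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of the slow run to the fast run, to one decimal place.
slowdown() {
  fast=$(to_seconds "$1")
  slow=$(to_seconds "$2")
  awk -v f="$fast" -v s="$slow" 'BEGIN { printf "%.1f\n", s / f }'
}

# The two "real" values from the build logs (1.12.0 first, 1.12.1 second).
slowdown 0m23.646s 4m23.451s   # prints 11.1
```

For these two runs the ratio comes out at roughly 11x, in line with the "~10x" in the title.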

dave-tucker commented 8 years ago

ping @dsheets

justincormack commented 8 years ago

I am going to test if this is an issue just on Linux, as there is not much visible Docker for Mac specific stuff here (eg no volumes or network)

malduarte commented 8 years ago

@justincormack I didn't try Linux, but I did try with docker 1.12.1 on docker-machine (virtualbox driver). Works fine there for this particular test, virtually the same time as docker 1.12.0 for mac

justincormack commented 8 years ago

@malduarte yes it may be Docker for Mac specific, but it is not obvious why, it could be for example related to the Linux runtime or something...

justincormack commented 8 years ago

Sounds like @djs55 has an idea of what may have caused this

justincormack commented 8 years ago

So it appears this may be because in previous versions it was not fully flushing blocks on fsync (and nor was Docker toolbox), so when we fixed this to add the full flush through to the OSX filesystem it slowed down. This would probably only be visible in database workloads where fsync is being used a lot.

JanBednarik commented 8 years ago

I have a similar issue with PostgreSQL. It's approximately 10x slower after upgrading Docker for Mac to 1.12.1.

malduarte commented 8 years ago

@justincormack : That makes sense. I understand the current sync behaviour should be the default (even with the slow performance). Correctness should trump speed.

That being said, is there some kind of configuration option to disable fsync? It would be a boon for ephemeral container use cases. Example: a container is spun up for a CI build and then thrown away.

djs55 commented 8 years ago

Thanks very much for the easy to follow repro instructions. I ran 2 experiments:

unmodified 1.12.1: time reports 5 minutes
remove the call to fsync: time reports 30 seconds

It seems the VM calls the virtio-blk implementation of flush in hyperkit approximately 25000 times. Each of these is currently implemented by an fcntl(F_FULLFSYNC) to avoid writes being partially written or re-ordered over a power loss (as recommended by the Apple docs). Unfortunately each fcntl(F_FULLFSYNC) seems to take about 10ms, which accounts for the slowdown.
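
As a rough check of that arithmetic, a minimal shell sketch follows. The function name flush_overhead_seconds is hypothetical, and the inputs are the approximate figures quoted in this comment, not fresh measurements.

```shell
#!/bin/sh
# Estimated time spent in flushes, in whole seconds.
flush_overhead_seconds() {
  flushes="$1"        # number of flush calls (about 25000 in this comment)
  ms_per_flush="$2"   # cost of each flush in milliseconds (about 10)
  echo $(( flushes * ms_per_flush / 1000 ))
}

flush_overhead_seconds 25000 10   # prints 250
```

About 250 seconds of flushing is close to the gap between the two experiments (5 minutes versus 30 seconds).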

I agree that there are important use-cases where data persistence is less important than throughput, especially on developer setups or CI builds where containers are ephemeral. I'll investigate the possibility of a configuration option.

malduarte commented 8 years ago

ooops, closed accidentally. Sorry folks

timc13 commented 8 years ago

+1

djs55 commented 8 years ago

I've added a configuration option to the master branch and it should be released in the next beta (27) later this week. It's not wired up to the UI yet. I'll post instructions explaining how to change the setting when the beta is ready.

Multiply commented 8 years ago

@djs55 Is it the flush change mentioned in the changelog for 1.12.2-rc1-beta27 (build: 12496)? If so, I'd love to try it out.

djs55 commented 8 years ago

Thanks for the reminder -- here's how to activate the flush change on beta 27:

My "About Docker" shows:

Version 1.12.2-rc1-beta27 (build: 12496)
179c18cae7

I typed into a terminal:

$ cd ~/Library/Containers/com.docker.docker/Data/database/
$ git reset --hard
HEAD is now at cafabd0 Docker started 1475137831
$ cat com.docker.driver.amd64-linux/disk/full-sync-on-flush 
true
$ echo false > com.docker.driver.amd64-linux/disk/full-sync-on-flush 
$ git add com.docker.driver.amd64-linux/disk/full-sync-on-flush 
$ git commit -s -m "Disable flushing"
[master dc32fcc] Disable flushing
 1 file changed, 1 insertion(+), 1 deletion(-)

This will cause the VM to reboot which takes about 30s or so. Afterwards flush should be fast (but the data won't have hit the physical disk, which means it could be lost if there was a sudden power loss)

Let me know if this works for you.

(edited to fix missing disk typo)

mping commented 8 years ago

@djs55 probably a typo in echo false > com.docker.driver.amd64-linux//full-sync-on-flush, right? It's missing the disk part, I guess

malduarte commented 8 years ago

@djs55 Awesome! Works just fine! There's a small typo in your instructions.

$ echo false > com.docker.driver.amd64-linux//full-sync-on-flush should be $ echo false > com.docker.driver.amd64-linux/disk/full-sync-on-flush

JanBednarik commented 8 years ago

It works, thank you!

djs55 commented 8 years ago

Glad to hear it works, sorry for the typo! (I really thought I had cut and pasted that properly, oh well)

We're considering adding some kind of "experimental" or "advanced" configuration options to the UI along the lines of about:config in firefox which could make this kind of thing easier in future.

jbrinley commented 8 years ago

The file at com.docker.driver.amd64-linux/disk/full-sync-on-flush did not exist for me, so I had to create it.

$ cd ~/Library/Containers/com.docker.docker/Data/database/
$ git reset --hard
HEAD is now at 313b5d5 Docker started 1475165023
$ cat com.docker.driver.amd64-linux/disk/full-sync-on-flush
cat: com.docker.driver.amd64-linux/disk/full-sync-on-flush: No such file or directory
$ echo false > com.docker.driver.amd64-linux/disk/full-sync-on-flush
-bash: com.docker.driver.amd64-linux/disk/full-sync-on-flush: No such file or directory
$ mkdir -p com.docker.driver.amd64-linux/disk
$ echo false > com.docker.driver.amd64-linux/disk/full-sync-on-flush
$ git add com.docker.driver.amd64-linux/disk/full-sync-on-flush
$ git commit -s -m "Disable flushing"
[master a35bdea] Disable flushing
 1 file changed, 1 insertion(+)
 create mode 100644 com.docker.driver.amd64-linux/disk/full-sync-on-flush

Seems to be working great now. Thanks!
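
The two variants in this thread (setting file already present, setting file missing) can be folded into one sketch. This is an illustration under assumptions, not an official tool: set_full_sync_on_flush is a hypothetical helper name, the default path is the one quoted in this thread for these beta builds, and the commit message is arbitrary.

```shell
#!/bin/sh
# Sketch only: set_full_sync_on_flush is a hypothetical helper, not a Docker command.
# Usage: set_full_sync_on_flush true|false [database-dir]
set_full_sync_on_flush() {
  value="$1"
  db_dir="${2:-$HOME/Library/Containers/com.docker.docker/Data/database}"
  key="com.docker.driver.amd64-linux/disk/full-sync-on-flush"
  (
    cd "$db_dir" || exit 1
    git reset --hard || exit 1
    mkdir -p "$(dirname "$key")" || exit 1   # the disk directory may not exist yet
    echo "$value" > "$key"
    git add "$key" || exit 1
    # Commit only when the staged value differs from what is already recorded.
    if git diff --cached --quiet; then
      echo "full-sync-on-flush is already $value"
    else
      git commit -s -m "Set full-sync-on-flush to $value"
    fi
  )
}
```

Calling set_full_sync_on_flush false would disable the full flush, and set_full_sync_on_flush true would restore the safer default. The body runs in a subshell so the caller's working directory is left unchanged.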

mingp commented 8 years ago

We had an issue where we were seeing a ~10x performance regression on specific write-heavy disk IO workloads. In particular, our backend automated testing setup (Python, Django, etc.) would reset (MySQL) database state in between individual test cases by truncating all tables in the test database. After installing a recent 1.12.1 release, we saw the delay of this operation spike up from < 5 seconds to ~ 1 minute.

After installing the latest 1.12.1 beta and applying the above suggested workaround, the delay is back down to where it was originally. Seems to be working great again.

Thank you all for your help.

argent-smith commented 8 years ago

Tried this workaround: it didn’t help :(

argent-smith commented 8 years ago

Can we somehow use NFS mounts in this docker «machine»?

djs55 commented 8 years ago

@argent-smith could you describe what you tried (was it the same repro as described on this issue) and upload diagnostics via "Whale menu" -> "Diagnose & Feedback" -> "Diagnose & Upload" and quote the ID?

sunarun commented 8 years ago

I'm facing this problem while creating DB2 image on RHEL Server release 7.2 (Maipo) and docker version 1.12.1. If there is a fix for it, please share it with me.

docker info.txt

thehappycoder commented 8 years ago

Will downgrading to 1.12.0 work?

klausbadelt commented 8 years ago

Fix by @djs55 worked for me on several devboxes.

thehappycoder commented 8 years ago

Importing 3.3 GB sql file to mysql is now much faster with the custom workaround on docker beta.

But the webapp running in docker is still slower than on Linux boxes of my colleagues. Maybe I should try docker-machine and see the difference.

felixge commented 8 years ago

I'd like to also confirm that the workaround above works. Before applying the workaround, I saw ~70 TPS on pgbench -i && pgbench -c 10 -T 5 vs ~2000 TPS when running postgres 9.6 natively. After applying the workaround, I'm seeing ~1500 TPS in docker now. More importantly, my application's test suite is no longer 10x slower in docker vs native.
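
To make those TPS figures easier to compare, here is a small shell sketch that expresses each Docker result as a percentage of the native figure. The helper name percent_of is hypothetical, and the inputs are the approximate values quoted in this comment.

```shell
#!/bin/sh
# Express one throughput figure as a percentage of another.
percent_of() {
  awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f\n", part * 100 / whole }'
}

percent_of 70 2000     # before the workaround: prints 3.5
percent_of 1500 2000   # after the workaround: prints 75.0
```

In other words, Docker went from roughly 3.5% to roughly 75% of native throughput on this benchmark.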

Hopefully it will be possible to get similar fsync performance/behavior in docker as one gets natively in the future, which would be preferable to this workaround.

bobbypriam commented 8 years ago

Workaround works for me too. However, I still don't find it clear why the original behavior (full-sync-on-flush = true) is persisted. What are the use-cases in which we want to have it enabled? @djs55

mping commented 8 years ago

@bobbypriambodo I'm guessing it's a matter of correctness. The use case is that if your container terminates abnormally, you normally don't want to lose data, so fsync should be honored by default.

malduarte commented 8 years ago

@bobbypriambodo : Your whole docker setup will likely get corrupted if you do a hard reboot of your machine with flush disabled for example. It has happened to me :) You definitely don't want that by default.

klausbadelt commented 8 years ago

I updated to Version 1.12.2-beta28 (12906) Channel: Beta 71c4a001c2

The above workaround seems to have persisted after the update:

$ cd ~/Library/Containers/com.docker.docker/Data/database/
$ cat com.docker.driver.amd64-linux/disk/full-sync-on-flush
false

Because the fix includes a git commit - did the update actually "perform completely"?
Seems very slow again

pcads commented 8 years ago

I had to change the gulp.watch interval in all of our Node containers from the default to 1000ms. CPU usage went down from 300% to 30%.

On the other hand, same containers run fine on Ubuntu with default gulp.watch interval.

ghinks commented 8 years ago

I too have a similar issue with slow response when using postgres containers.

mikeball commented 8 years ago

I'm having the same issues, how can we get the beta channel?

samoht commented 8 years ago

@mikeball: see https://docs.docker.com/docker-for-mac/

mikeball commented 8 years ago

Ok, latest beta and fixes from @djs55 made no difference in my case.

The scenario is building an uberjar with the .m2 directory as a shared volume and placing the generated jar into a different shared volume, meaning it both reads and writes. This normally takes ~2 seconds, but on Docker for Mac it takes well over 8 minutes at 100% CPU usage. In my case it's 167 times slower than normal.

felixge commented 7 years ago

FWIW this issue still requires the workaround from above as of Version 1.12.3-beta29.3 (13640).

I also had to reapply the workaround, presumably because beta 29 encourages doing a hard reset which I did.

sylus commented 7 years ago

I am also on Version 1.12.3-beta29.3 and I think it did help a bit, but overall the performance is orders of magnitude slower than a native docker-machine + virtualbox workflow. For instance, a Drupal site with many modules and high CPU and I/O load takes about 1 minute 30 seconds to install on docker-machine + virtualbox. With the fix from this issue, the install took about 8 minutes. Without the full-sync-on-flush commit, I gave up waiting after around 15 minutes. Happy we are making progress on this issue :)

lox commented 7 years ago

With the flush workaround and beta30, things are looking considerably faster in our mysql-5.6 based tests.

mashawan commented 7 years ago

We applied the workaround, which brings our application build time down from 20 minutes to 3 minutes. However, developers are reporting that their entire Docker installation gets corrupted if their host machine loses power.

mashawan commented 7 years ago

Is the docker team planning to address the root issue?

warerwang commented 7 years ago

I'm having the same issues. How can I download the docker for mac 1.12.0? If you have, can you send this to my email. Thank you. warerwang@gmail.com

kortina commented 7 years ago

It seems like whenever I upgrade Docker for Mac to a new Beta, the fsync patch gets blown away.

I wrote a little script to run whenever I upgrade and need to re-apply:

https://gist.github.com/kortina/67ad6e40e40d5199c3507cdad0c9a12c
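
For anyone who only wants to know whether the setting survived an upgrade, a minimal check could look like the sketch below. It is not the contents of the gist; flush_workaround_active is a hypothetical helper name, and the default path is the one quoted earlier in this thread.

```shell
#!/bin/sh
# Sketch only: flush_workaround_active is a hypothetical helper, unrelated to the gist.
# Succeeds when the setting file exists and contains "false".
flush_workaround_active() {
  db_dir="${1:-$HOME/Library/Containers/com.docker.docker/Data/database}"
  key="$db_dir/com.docker.driver.amd64-linux/disk/full-sync-on-flush"
  [ -f "$key" ] && [ "$(cat "$key")" = "false" ]
}

if flush_workaround_active; then
  echo "workaround still in place"
else
  echo "workaround missing: re-apply it"
fi
```

The check only reads the file, so it is safe to run after every upgrade before deciding whether to commit the change again.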

emaiax commented 7 years ago

Switched to stable and the problem persists. :sob:

Version 1.12.3 (13776)
Channel: Stable
583d1b8ffe

jesperronn commented 7 years ago

Thanks a lot @kortina. Really easy to apply your script like this:

URL="https://gist.githubusercontent.com/kortina/67ad6e40e40d5199c3507cdad0c9a12c/raw/docker-for-mac-fsync-perf-patch.sh"

curl -o- $URL | bash

rodrigoaguilera commented 7 years ago

I don't have to run the script anymore after upgrading docker. Are you experiencing this?

krknopp commented 7 years ago

We're still seeing slow performance with mysql on 1.13.0. The above fix doesn't seem to have an effect.

barat commented 7 years ago

@krknopp - are you sure that you're affected by exactly this issue? Maybe you mount a volume from the host, so you're affected by #77 as well?

krknopp commented 7 years ago

@barat I missed that one. I think you're right. I'll check that one out now. Thank you.

docker / for-mac