Closed malduarte closed 6 years ago
ping @dsheets
I am going to test if this is an issue just on Linux, as there is not much visible Docker for Mac specific stuff here (eg no volumes or network)
@justincormack I didn't try Linux, but I did try with docker 1.12.1 on docker-machine (virtualbox driver). Works fine there for this particular test, virtually the same time as docker 1.12.0 for mac
@malduarte yes it may be Docker for Mac specific, but it is not obvious why, it could be for example related to the Linux runtime or something...
Sounds like @djs55 has an idea of what may have caused this
So it appears this may be because in previous versions it was not fully flushing blocks on fsync (and nor was Docker toolbox), so when we fixed this to add the full flush through to the OSX filesystem it slowed down. This would probably only be visible in database workloads where fsync is being used a lot.
I have a similar issue with PostgreSQL. It's approximately 10x slower after upgrading Docker for Mac to 1.12.1.
@justincormack : That makes sense. I understand the current sync behaviour should be the default (even with the slow performance). Correctness should trump speed.
That being said, is there some kind of configuration option to disable fsync? It would be a boon for ephemeral container use cases. Example: a container is spun up for a CI build and then thrown away.
Thanks very much for the easy to follow repro instructions. I ran 2 experiments:
- unmodified: time reports 5 minutes
- with fsync disabled: time reports 30 seconds
It seems the VM calls the virtio-blk implementation of flush in hyperkit approximately 25000 times. Each of these is currently implemented by an fcntl(F_FULLFSYNC) to avoid writes being partially written or re-ordered over a power loss (as recommended by the Apple docs). Unfortunately each fcntl(F_FULLFSYNC) seems to take about 10ms, which accounts for the slowdown.
I agree that there are important use-cases where data persistence is less important than throughput, especially on developer setups or CI builds where containers are ephemeral. I'll investigate the possibility of a configuration option.
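As a rough sanity check of the figures quoted above (about 25000 flushes at roughly 10 ms each), the time spent in flushes can be estimated with shell arithmetic. The variable names are illustrative only, and both inputs are the approximate values from this comment, not fresh measurements.

```shell
# Back-of-the-envelope estimate using the approximate figures from this thread.
flushes=25000      # approximate number of virtio-blk flush calls
ms_per_flush=10    # approximate cost of one F_FULLFSYNC flush, in milliseconds
total_s=$(( flushes * ms_per_flush / 1000 ))
echo "${total_s}"
```

That comes to about 250 seconds, a little over 4 minutes, which is in the same range as the gap between the 5 minute and 30 second runs.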
ooops, closed accidentally. Sorry folks
+1
I've added a configuration option to the master branch and it should be released in the next beta (27) later this week. It's not wired up to the UI yet. I'll post instructions explaining how to change the setting when the beta is ready.
@djs55 Is it the flush change mentioned in the changelog for 1.12.2-rc1-beta27 (build: 12496)? If so, I'd love to try it out.
Thanks for the reminder -- here's how to activate the flush change on beta 27:
My "About Docker" shows:
Version 1.12.2-rc1-beta27 (build: 12496)
179c18cae7
I typed into a terminal:
$ cd ~/Library/Containers/com.docker.docker/Data/database/
$ git reset --hard
HEAD is now at cafabd0 Docker started 1475137831
$ cat com.docker.driver.amd64-linux/disk/full-sync-on-flush
true
$ echo false > com.docker.driver.amd64-linux/disk/full-sync-on-flush
$ git add com.docker.driver.amd64-linux/disk/full-sync-on-flush
$ git commit -s -m "Disable flushing"
[master dc32fcc] Disable flushing
1 file changed, 1 insertion(+), 1 deletion(-)
This will cause the VM to reboot, which takes about 30s or so. Afterwards flush should be fast (but the data won't have hit the physical disk, which means it could be lost if there was a sudden power loss).
Let me know if this works for you.
(edited to fix missing disk typo)
@djs55 probably a typo in echo false > com.docker.driver.amd64-linux//full-sync-on-flush right? It's missing the disk part I guess
@djs55 Awesome! Works just fine! There's a small typo in your instructions.
$ echo false > com.docker.driver.amd64-linux//full-sync-on-flush
should be
$ echo false > com.docker.driver.amd64-linux/disk/full-sync-on-flush
It works, thank you!
Glad to hear it works, sorry for the typo! (I really thought I had cut and pasted that properly, oh well)
We're considering adding some kind of "experimental" or "advanced" configuration options to the UI along the lines of about:config in Firefox, which could make this kind of thing easier in future.
The file at com.docker.driver.amd64-linux/disk/full-sync-on-flush did not exist for me, so I had to create it.
$ cd ~/Library/Containers/com.docker.docker/Data/database/
$ git reset --hard
HEAD is now at 313b5d5 Docker started 1475165023
$ cat com.docker.driver.amd64-linux/disk/full-sync-on-flush
cat: com.docker.driver.amd64-linux/disk/full-sync-on-flush: No such file or directory
$ echo false > com.docker.driver.amd64-linux/disk/full-sync-on-flush
-bash: com.docker.driver.amd64-linux/disk/full-sync-on-flush: No such file or directory
$ mkdir -p com.docker.driver.amd64-linux/disk
$ echo false > com.docker.driver.amd64-linux/disk/full-sync-on-flush
$ git add com.docker.driver.amd64-linux/disk/full-sync-on-flush
$ git commit -s -m "Disable flushing"
[master a35bdea] Disable flushing
1 file changed, 1 insertion(+)
create mode 100644 com.docker.driver.amd64-linux/disk/full-sync-on-flush
Seems to be working great now. Thanks!
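The manual steps in the comments above (including the mkdir -p needed when the file does not exist yet) can be collected into a small script. This is only a sketch: the function names write_flush_flag and apply_flush_setting are hypothetical, and the path and commit flow are copied from the commands posted in this thread rather than from any official documentation.

```shell
# Relative path of the setting inside the Docker for Mac database checkout,
# as used in the commands posted in this thread.
FLAG="com.docker.driver.amd64-linux/disk/full-sync-on-flush"

# Write "true" or "false" to the flag file under the given database directory,
# creating the parent directory if it is missing.
write_flush_flag() {
  local db_dir="$1" value="$2"
  case "$value" in
    true|false) ;;
    *) echo "value must be true or false" >&2; return 1 ;;
  esac
  mkdir -p "$db_dir/$(dirname "$FLAG")" || return 1
  echo "$value" > "$db_dir/$FLAG"
}

# Repeat the manual steps: reset the checkout, write the flag, commit it.
# Runs in a subshell so the caller's working directory is left unchanged.
apply_flush_setting() {
  local db_dir="$1" value="$2"
  (
    cd "$db_dir" &&
    git reset --hard &&
    write_flush_flag . "$value" &&
    git add "$FLAG" &&
    git commit -s -m "Set full-sync-on-flush to $value"
  )
}
```

For example, `apply_flush_setting ~/Library/Containers/com.docker.docker/Data/database false` would repeat the steps above. As noted earlier, committing causes the VM to reboot, and with the flag set to false data may be lost on a sudden power loss. If the file already holds the requested value, git commit reports that there is nothing to commit and the function returns non-zero.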
We had an issue where we were seeing a ~10x performance regression on specific write-heavy disk IO workloads. In particular, our backend automated testing setup (Python, Django, etc.) would reset (MySQL) database state in between individual test cases by truncating all tables in the test database. After installing a recent 1.12.1 release, we saw the delay of this operation spike up from < 5 seconds to ~ 1 minute.
After installing the latest 1.12.1 beta and applying the above suggested workaround, the delay is back down to where it was originally. Seems to be working great again.
Thank you all for your help.
Tried this workaround: it didn’t help :(
Can we somehow use NFS mounts in this docker «machine»?
Yours truly, Pavel.
@argent-smith could you describe what you tried (was it the same repro as described on this issue) and upload diagnostics via "Whale menu" -> "Diagnose & Feedback" -> "Diagnose & Upload" and quote the ID?
I'm facing this problem while creating DB2 image on RHEL Server release 7.2 (Maipo) and docker version 1.12.1. If there is a fix for it, please share it with me.
Will downgrading to 1.12.0 work?
Fix by @djs55 worked for me on several devboxes.
Importing 3.3 GB sql file to mysql is now much faster with the custom workaround on docker beta.
But the webapp running in docker is still slower than on Linux boxes of my colleagues. Maybe I should try docker-machine and see the difference.
I'd like to also confirm that the workaround above works. Before applying the workaround, I saw ~70 TPS on pgbench -i && pgbench -c 10 -T 5 vs ~2000 TPS when running postgres 9.6 natively. After applying the workaround, I'm seeing ~1500 TPS in docker now. More importantly, my application's test suite is no longer 10x slower in docker vs native.
Hopefully it will be possible to get similar fsync performance/behavior in docker as one gets natively in the future, which would be preferable to this workaround.
Workaround works for me too. However, I still don't find it clear why the original behavior (full-sync-on-flush = true) is persisted. What are the use-cases in which we want to have it enabled? @djs55
@bobbypriambodo I'm guessing it's a matter of correctness. The use case is that if your container terminates abnormally, you normally don't want to lose data, so fsync should be honored by default.
@bobbypriambodo : Your whole docker setup will likely get corrupted if you do a hard reboot of your machine with flush disabled for example. It has happened to me :) You definitely don't want that by default.
I updated to Version 1.12.2-beta28 (12906) Channel: Beta 71c4a001c2
The above workaround seems to have persisted after the update:
$ cd ~/Library/Containers/com.docker.docker/Data/database/
$ cat com.docker.driver.amd64-linux/disk/full-sync-on-flush
false
git commit - did the update actually "perform completely"?
I have to change all of our Node containers' gulp.watch interval from the default to 1000ms. CPU usage went down from 300% to 30%.
On the other hand, same containers run fine on Ubuntu with default gulp.watch interval.
I too have a similar issue with slow response when using postgres containers.
I'm having the same issues, how can we get the beta channel?
@mikeball: see https://docs.docker.com/docker-for-mac/
Ok, latest beta and fixes from @djs55 made no difference in my case.
The scenario is building an uberjar with the .m2 directory as a shared volume, and placing the generated jar into a different shared volume, meaning it both reads and writes. Time taken is normally ~2 seconds, but Docker for Mac takes well over 8 minutes with 100% CPU usage. In my case it's 167 times slower than normal.
FWIW this issue still requires the workaround from above as of Version 1.12.3-beta29.3 (13640).
I also had to reapply the workaround, presumably because beta 29 encourages doing a hard reset which I did.
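Since several comments report that the setting can be lost after an upgrade or a reset, it may be handy to check the current value before re-applying anything. A minimal sketch, assuming the same database path as in the instructions above; the function name read_flush_flag is made up for illustration, and "unset" is just this sketch's label for a missing file.

```shell
# Relative path of the setting, as used in the commands posted in this thread.
FLAG="com.docker.driver.amd64-linux/disk/full-sync-on-flush"

# Print the current value of the flag, or "unset" if the file does not exist.
read_flush_flag() {
  local db_dir="$1"
  if [ -f "$db_dir/$FLAG" ]; then
    cat "$db_dir/$FLAG"
  else
    echo "unset"
  fi
}
```

Calling `read_flush_flag ~/Library/Containers/com.docker.docker/Data/database` after an upgrade shows whether the workaround needs to be applied again.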
I am also on Version 1.12.3-beta29.3 and I think it did help a bit, but overall the performance is orders of magnitude slower than a native docker-machine + virtualbox workflow. For instance, a drupal site with many modules and high cpu I/O takes about 1 min 30 seconds to install on docker-machine + virtualbox. With the fix from this issue the install took about 8 minutes. Without the full-sync-on-flush commit I gave up waiting after around 15 minutes. Happy we are making progress on this issue :)
With the flush workaround and beta30, things are looking considerably faster in our mysql-5.6 based tests.
We applied the workaround, which brings our application build time down from 20 minutes to 3 minutes. However, developers are reporting that their entire docker installation gets corrupted if their host machine loses power.
Is the docker team planning to address the root issue?
I'm having the same issues. How can I download the docker for mac 1.12.0? If you have, can you send this to my email. Thank you. warerwang@gmail.com
It seems like whenever I upgrade Docker for Mac to a new Beta, the fsync patch gets blown away.
I wrote a little script to run whenever I upgrade and need to re-apply:
https://gist.github.com/kortina/67ad6e40e40d5199c3507cdad0c9a12c
Switched to stable and the problem persists. :sob:
Version 1.12.3 (13776)
Channel: Stable
583d1b8ffe
Thanks a lot @kortina. Really easy to apply your script like this:
URL="https://gist.githubusercontent.com/kortina/67ad6e40e40d5199c3507cdad0c9a12c/raw/docker-for-mac-fsync-perf-patch.sh"
curl -o- "$URL" | bash
I don't have to run the script anymore after upgrading docker. Are you experiencing this?
We're still seeing slow performance with mysql on 1.13.0. The above fix doesn't seem to have an effect.
@krknopp - are you sure that you're affected by exactly this issue? Maybe you mount a volume from the host, so you're affected by #77 as well?
@barat I missed that one. I think you're right. I'll check that one out now. Thank you.
Expected behavior
Performance equal or better than 1.12.0
Actual behavior
Docker 1.12.1 is much slower on pretty much all DB2 activity. Not sure if it is DB2 only, but I'm only experiencing it on DB2. For example empty database creation takes ~24 seconds on 1.12.0 but on 1.12.1 it takes 10 times more, around ~253 seconds
docker stats seems to indicate lower cpu usage and lower I/O activity on 1.12.1
Information
Steps to reproduce the behavior
1. docker pull ibmcom/db2express-c
2. docker --version should output Docker version 1.12.0, build 8eab29e
3. time docker build -t test . (consider running docker stats while it is running)
4. docker rmi <imageid>
5. docker --version should output Docker version 1.12.1, build 6f9534c
6. time docker build -t test . (consider running docker stats while it is running)
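To compare the two versions, the wall-clock time of each build can be recorded with a small helper instead of reading it off the time output by hand. This is a hedged sketch: elapsed_seconds is a hypothetical function, not part of Docker, and it only has one-second resolution.

```shell
# Run the given command and print how many whole seconds it took.
# The command's own stdout is redirected to stderr so that stdout
# carries only the number of seconds.
elapsed_seconds() {
  local start end
  start=$(date +%s)
  "$@" >&2
  end=$(date +%s)
  echo $(( end - start ))
}
```

For instance, `elapsed_seconds docker build -t test .` could be run once on each Docker version and the two numbers compared.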