zeigerpuppy opened 6 years ago
P.S. Docker has been all sorts of fun to get going on this setup. I had to downgrade from v18.03 to 17.12.0 as containers were not stopping properly. Please ignore the error in the `docker info` output (`Zpool: error while getting pool information strconv.ParseUint: parsing "": invalid syntax`). It arises because we're using a ZFS dataset within a pool rather than a dedicated pool. I don't think this has implications for the Docker ZFS implementation apart from failing on `zpool` info commands.
Also, we chose to use Debian Stretch without systemd on the server, so some commands that call systemd specifically may fail. The install went well with a small tweak for the cc-proxy deb installer, and I don't think this has any implications for I/O.
Hi @zeigerpuppy, good question. 9p `msize` has been discussed before, as has the cache mode to some extent.
Have a look at https://github.com/clearcontainers/hyperstart/pull/25 for the discussion around a PR to set `msize`. I think that got stuck because nobody had time to run exhaustive tests across different block-size transfers etc. to get data on whether it improved all situations, what the memory-footprint overhead might be, and so on.
And then I raised a very related item earlier this week for kata-containers: https://github.com/kata-containers/runtime/issues/201
Right now we don't have a way in either Clear or Kata Containers to adjust/tweak/add those settings to the mounts without rebuilding either the agent (Clear) or the runtime (Kata). Yes, it would be good to have at least a developer-mode option in the toml config file to allow such things to be tweaked.
Both/either of those Issues will show you where and how you could add the extra options if you wanted to do a build and experiment.
Also, IIRC, enabling caching on 9p is something that needs careful consideration. The original design of 9p basically said "do not cache", but I think we have experimented with this before, and as long as the constrained situation is understood, I think we could enable some form of caching. @chao-p @rarindam for more thoughts and input. @bergwolf and @gnawux for visibility, relevance to Kata, and any input.
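For context, the cache behaviour is selected on the guest side at mount time via the v9fs mount options; a minimal sketch (the share tag `sharename` and the mount point are placeholders):

```shell
# Guest-side 9p mount; the cache= option picks the client cache policy:
#   cache=none  - no client caching (the conservative, original-9p behaviour)
#   cache=loose - aggressive client caching; only safe when the host side
#                 of the share is not modified concurrently
mount -t 9p -o trans=virtio,version=9p2000.L,cache=none sharename /mnt/share
```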
@grahamwhaley Yeah, we need a config option in our toml file for the 9p `msize`; that way it will at least be convenient to try out different `msize` values before we settle on an optimal default, without having to rebuild the runtime. I'll raise a PR for that.
Good to hear that it'll get some consideration; options in the toml file would be great. Please let me know if I can help with testing. Also, in the meantime, is there any way to manually tweak these options in a built Clear Container?
@zeigerpuppy take a look at @amshinde PR here: https://github.com/kata-containers/runtime/pull/207
@zeigerpuppy https://github.com/kata-containers/runtime/pull/207 is now merged. You can now try kata-runtime with the ability to configure the 9p `msize` for a container. It would be great if you could help out with the testing. @grahamwhaley Can you provide details about the various parameters we need to take into consideration for testing this?
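For anyone following along, with that PR the setting lives in the runtime's `configuration.toml`; a sketch of the relevant fragment (the value shown is just an example, not a recommended default):

```toml
[hypervisor.qemu]
# 9p packet payload size in bytes; larger values can improve throughput
# for big sequential transfers at the cost of extra memory per mount.
msize_9p = 8192
```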
@amshinde, thanks for the details. I am a little behind in bug chasing, so it may be a little while until I can do a build. In the meantime, I found an interesting way to restore performance....
Previously I was using cc-runtime with the following file stack (all on Debian Stretch without systemd):

```
ZFS
 -> docker
 -> cc-runtime using file mapping
```

Unfortunately, Docker's implementation of ZFS is pretty basic, and it seems like they've just adapted the overlay driver. This is a real shame, as ZFS is a natural fit when zvols are used. The main problem I found with this stack was poor performance, but MongoDB containers also failed to work at all, I presume because MongoDB couldn't properly memory-map the filesystem. Performance, as stated above, was only about 130MB/s.
The new stack is:

```
ZFS
 -> sparse ZVOL
 -> thin-provisioned LVM
 -> docker devicemapper
 -> cc-runtime with virtio-blk driver
```

This setup looks much better: there is now proper block usage and it's sparsely provisioned throughout. I can snapshot directly on the ZVOL or at the LVM level. MongoDB works again, and I/O performance is more like 1.3GB/s.
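For reference, a rough sketch of how such a stack can be assembled; all pool, volume, and size names here are hypothetical, and the exact devicemapper options depend on your Docker version:

```shell
# Sparse (thin-provisioned) ZVOL on an existing pool
zfs create -s -V 500G zpool1/docker-vol

# Thin-provisioned LVM on top of the zvol
pvcreate /dev/zvol/zpool1/docker-vol
vgcreate docker-vg /dev/zvol/zpool1/docker-vol
lvcreate --type thin-pool -l 95%FREE -n thinpool docker-vg

# Point Docker's devicemapper driver at the thin pool,
# e.g. in /etc/docker/daemon.json:
#   {
#     "storage-driver": "devicemapper",
#     "storage-opts": ["dm.thinpooldev=/dev/mapper/docker--vg-thinpool"]
#   }
```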
Now the strange bit: I mapped an external volume with the docker config:

```
docker run -it --mount type=bind,source=/zpool1/vmdata/test,target=/test --name iozone threadx/iozone
```

Now, I presume this is still using a 9p mapping, but performance is great (approx 1GB/s read/write).
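One way to confirm whether such a bind mount is actually going over 9p is to inspect the mount table from inside the container (the `/test` path matches the `target=` above):

```shell
# Inside the container: a 9p mount shows type 9p with options
# like trans=virtio,msize=...
grep /test /proc/mounts
```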
So, for the moment, I plan to stick with this config. However, I will try to give the kata runtime a go once I've migrated a whole lot of VMs....
P.S. if you're using LVM in Debian Stretch, watch out for this bug, which prevents re-attaching of LVM volumes at boot by default.
Description of problem
Tuning the 9pfs block size (`msize`) and cache modes allows a significant performance boost (10x) in KVM. Is there a process for setting these in cc-runtime?
Example
In KVM, I have found two options that significantly increase IO (see below).

This is on a server with a `raidz2` ZFS array of 8x Micron 9100 1.2TB NVMe SSDs. There's plenty of headroom: raw IO on this array is around 3GB/sec/process, plateauing at about 30GB/sec for 20 processes in iozone3. With KVM guests using the Plan 9 filesystem, it looks possible to get about 1GB/sec per CPU, but we're getting only about 130MB/sec with Clear Containers and bind-mounted storage.
Host: as the filesystem (ZFS) is consistent on the host by design, it's safe to use the `passthrough` mode.

Client: in the mount options, adjusting the `msize` (packet payload in bytes) and disabling the client cache has a huge effect on I/O.

Actual result
With my standard KVM clients on the same host, I get about 1GB/sec/process (measured with iozone3). With the cc-runtime-backed Docker storage (bind mounts) I get only 130MB/sec.
Any help in setting these options in the cc-runtime options would be great, as they are critical to good performance.
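As an illustration of the host- and client-side tunings described above (the paths, the `hostshare` mount tag, and the `msize` value are examples only, not recommendations):

```shell
# Host (QEMU): export the directory with the passthrough security model
qemu-system-x86_64 ... \
  -virtfs local,path=/zpool1/vmdata,mount_tag=hostshare,security_model=passthrough

# Guest: mount with a larger msize and the client cache disabled
mount -t 9p -o trans=virtio,version=9p2000.L,msize=262144,cache=none hostshare /mnt
```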
Settings output
Runtime config files
Runtime default config files
Runtime config file contents
Output of `cat "/etc/clear-containers/configuration.toml"`:
Output of `cat "/usr/share/defaults/clear-containers/configuration.toml"`:
Agent version:
Logfiles
Runtime logs
/usr/bin/cc-collect-data.sh: line 242: journalctl: command not found No recent runtime problems found in system journal.
Proxy logs
/usr/bin/cc-collect-data.sh: line 242: journalctl: command not found No recent proxy problems found in system journal.
Shim logs
/usr/bin/cc-collect-data.sh: line 242: journalctl: command not found No recent shim problems found in system journal.
Container manager details
Have `docker`
Docker
Output of `docker version`:
Output of `docker info`:
Output of `systemctl show docker`:
No `kubectl`
Packages
Have `dpkg`
Output of `dpkg -l|egrep "(cc-oci-runtime|cc-proxy|cc-runtime|cc-shim|kata-proxy|kata-runtime|kata-shim|clear-containers-image|linux-container|qemu-lite|qemu-system-x86)"`:
Have `rpm`
Output of `rpm -qa|egrep "(cc-oci-runtime|cc-proxy|cc-runtime|cc-shim|kata-proxy|kata-runtime|kata-shim|clear-containers-image|linux-container|qemu-lite|qemu-system-x86)"`: