dotmesh-io / dotmesh

dotmesh (dm) is like git for your data volumes (databases, files etc) in Docker and Kubernetes
https://dotmesh.com
Apache License 2.0

[1d] push throughput poor, improve it by 5-10x #705

Open lukemarsden opened 5 years ago

lukemarsden commented 5 years ago

In testing, a dataset that took 4 minutes to wget from GCS to a GCP instance at ~200MB/sec then took 40 minutes to push from the GCP runner to the GCP hub (staging).

To be fair, there was some unzipping involved, which probably doubled the dataset size, but it should still be possible to push between dotmesh instances at disk/network speed.
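For a rough sense of scale, the figures above can be turned into an effective push throughput. This is only a back-of-the-envelope sketch: it assumes the download ran at a steady 200 MB/sec for the full 4 minutes and that unzipping exactly doubled the data, neither of which was measured. The function name is made up for illustration.

```python
# Back-of-the-envelope estimate from the numbers quoted in this issue.
# Assumptions (not measured): steady 200 MB/s download, exactly 2x growth
# from unzipping.

def effective_mb_per_sec(size_mb: float, minutes: float) -> float:
    """Average throughput in MB/s for moving size_mb in the given time."""
    return size_mb / (minutes * 60)

downloaded_mb = 200 * 4 * 60      # 4 minutes at 200 MB/s = 48,000 MB
unzipped_mb = downloaded_mb * 2   # assumed 2x after unzipping = 96,000 MB

push_rate = effective_mb_per_sec(unzipped_mb, 40)
print(f"staging push: ~{push_rate:.0f} MB/s")  # prints "staging push: ~40 MB/s"
```

Under those assumptions the push ran at roughly a fifth of the observed download rate, which is in line with the 5-10x target in the title.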

Try adding mbuffer between zfs send and zfs recv to smooth out the stream and increase throughput.
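As an illustration of the idea, here is a minimal sketch that builds a generic buffered `zfs send | mbuffer | ssh ... 'mbuffer | zfs recv'` pipeline as a shell string. It is not how dotmesh moves data between nodes; the helper name, the use of ssh as transport, the host, and the dataset names are all hypothetical, and the buffer sizes (`-s 128k -m 1G`) are just common starting values that would need tuning.

```python
import shlex

def buffered_send_command(snapshot: str, target: str, remote: str,
                          block_size: str = "128k",
                          buffer_mem: str = "1G") -> str:
    """Build a shell pipeline that buffers a zfs stream on both ends.

    Hypothetical helper: ssh is assumed as the transport purely for
    illustration. mbuffer flags used: -q (quiet), -s (block size),
    -m (total buffer memory).
    """
    mbuf = f"mbuffer -q -s {shlex.quote(block_size)} -m {shlex.quote(buffer_mem)}"
    # The receiving side buffers again before handing data to zfs recv.
    remote_cmd = f"{mbuf} | zfs recv {shlex.quote(target)}"
    return (f"zfs send {shlex.quote(snapshot)} | {mbuf} | "
            f"ssh {shlex.quote(remote)} {shlex.quote(remote_cmd)}")

# Example with made-up names; this only prints the command, it does not run it.
print(buffered_send_command("pool/data@snap1", "pool/data", "hub.example.com"))
```

The point of buffering on both sides is that zfs send and zfs recv each produce and consume data in bursts, so an in-memory buffer lets the network stay busy while either end stalls.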

lukemarsden commented 5 years ago

Update: the same test took 13 minutes on prod, perhaps because the prod VM has a better spec. Still, I suspect there's room for improvement by adding mbuffer. See https://everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/