gardener / etcd-backup-restore

Collection of components to backup and restore the etcd of a Kubernetes cluster.
Apache License 2.0

Memory optimizations for larger value sizes of etcd in backup-sidecar. #152

Open PadmaB opened 5 years ago

PadmaB commented 5 years ago

For large key-value sizes in etcd, with the watch on etcd yielding responses at a high rate, we see that the backup sidecar's memory usage shoots up drastically but linearly.

(attached image: pprof memory profile, profile015)

georgekuruvillak commented 5 years ago

The memory consumption surge is associated with the watch implementation in the client library. As can be seen here, when the ingress rate of events exceeds the egress rate from the watch response channel, the buffers holding the events tend to bloat. There is no throttling between etcd and the client. Even if throttling were done, we would not be able to depend on it: the revision of etcd would keep moving ahead, and the throttled watch events would lag far behind. We would have to implement a different approach to watches if we stumble upon this issue in a real environment.

@amshuman-kr wdyt?

amshuman-kr commented 5 years ago

Does that mean kube-apiserver might also suffer memory spike in case of large updates?

Potentially, yes. They would be using the same client library.

amshuman-kr commented 5 years ago

I can think of only three options.

  1. Document this and do nothing for now.
  2. Contribute throttling to etcd client.
  3. Try a polling-based approach (with parallel download of chunks of revisions) as opposed to the current watch-based approach.

I vote for option 1.
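Option 3 above could be sketched roughly as follows. The chunking and reassembly logic is the point of interest here; `fetchRange` is a hypothetical stand-in for an etcd range request bounded to a revision window, not an actual call from this repository.

```go
package main

import (
	"fmt"
	"sync"
)

type chunk struct {
	from, to int64
	events   []int64
}

// fetchRange is a placeholder: a real implementation would issue an etcd
// range request pinned to this revision window and return its events.
func fetchRange(from, to int64) []int64 {
	var evs []int64
	for r := from; r <= to; r++ {
		evs = append(evs, r)
	}
	return evs
}

// downloadChunks splits [start, end] into fixed-size revision windows,
// fetches the windows concurrently, and reassembles the results in
// revision order, bounding memory to the chunks in flight.
func downloadChunks(start, end, size int64) []int64 {
	var chunks []chunk
	for from := start; from <= end; from += size {
		to := from + size - 1
		if to > end {
			to = end
		}
		chunks = append(chunks, chunk{from: from, to: to})
	}
	var wg sync.WaitGroup
	for i := range chunks {
		wg.Add(1)
		go func(c *chunk) {
			defer wg.Done()
			c.events = fetchRange(c.from, c.to)
		}(&chunks[i])
	}
	wg.Wait()
	var all []int64
	for _, c := range chunks {
		all = append(all, c.events...)
	}
	return all
}

func main() {
	evs := downloadChunks(1, 10, 4)
	fmt.Println(len(evs), evs[0], evs[len(evs)-1]) // → 10 1 10
}
```

Unlike a watch, polling lets the client control the pace and parallelism of downloads, at the cost of extra round trips and the need to track the last applied revision between polls.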

PadmaB commented 5 years ago

@georgekuruvillak Thanks for the investigation! I agree with @amshuman-kr. Since this issue has, so far, appeared only in the performance tests (due to the load) and not in any of the active landscapes, we could put it on hold for now and pick it up if we face the issue on an active landscape, or once we are done with other high-priority tasks.

swapnilgm commented 5 years ago

Sorry, merging the proposal closed this issue automatically. Reopening, since the issue still persists and needs to be resolved in the future.