ceph / ceph-medic

find common issues in ceph clusters
MIT License

Connect to openshift pods concurrently #115

Status: Open. djgalloway opened this issue 5 years ago.

djgalloway commented 5 years ago

oc exec is slow: it takes about half a second just to return the output of whoami.

This means a ceph-medic check of a cluster running in OpenShift can take at least 20 minutes.

If ceph-medic connected to all pods at the same time, it would save a lot of time.
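At roughly 0.5 s per call, 20 minutes corresponds to about 2,400 sequential oc exec calls, so running the per-pod work in parallel should cut wall-clock time to roughly that of the slowest pod. Below is a minimal sketch of that idea, assuming each remote action is an independent `oc exec POD -- CMD` subprocess. The names `oc_exec` and `run_on_all_pods` are hypothetical and are not part of ceph-medic or remoto.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def oc_exec(pod, command):
    """Hypothetical runner: one `oc exec` subprocess per (pod, command)."""
    result = subprocess.run(
        ["oc", "exec", pod, "--"] + list(command),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode, result.stdout


def run_on_all_pods(pods, command, runner=oc_exec, max_workers=16):
    """Run `command` on every pod concurrently; return {pod: runner result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so all pods are contacted at the same time.
        futures = {pod: pool.submit(runner, pod, command) for pod in pods}
        # Collect results; .result() re-raises any exception from a worker.
        return {pod: fut.result() for pod, fut in futures.items()}
```

Threads are sufficient here because each worker spends its time blocked on a subprocess rather than running Python code; `max_workers` caps how many oc processes run at once.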

alfredodeza commented 5 years ago

I think this is a two-pronged issue. First, the current approach opens a new connection for every remote action (system command or remote function), and there would be a benefit in keeping one long-lived connection, using something like https://github.com/kubernetes-client/python/blob/master/examples/exec.py
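A minimal sketch of the long-lived connection idea, assuming a POSIX sh. It starts one shell process and sends every command through its stdin, using a unique marker line to find where each command's output ends, so connection setup is paid once instead of once per action. The class name `PersistentShell` and the pod argv `oc exec -i POD -- sh` are hypothetical; the linked kubernetes-client example does interactive exec over a websocket stream rather than a local pipe.

```python
import subprocess
import uuid


class PersistentShell:
    """Run many commands over one long-lived shell process.

    Defaults to a local ["sh"]; for a pod the argv could hypothetically be
    ["oc", "exec", "-i", POD, "--", "sh"]. Commands must not read stdin,
    or they would swallow the marker line written after them.
    """

    def __init__(self, argv=("sh",)):
        self.proc = subprocess.Popen(
            list(argv),
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into stdout for simplicity
            text=True,
            bufsize=1,  # line-buffered writes to the shell
        )
        # Unique marker so normal command output cannot be mistaken for it.
        self.marker = "__END_%s__" % uuid.uuid4().hex

    def run(self, command):
        """Send one command; return (exit_status, list_of_output_lines)."""
        self.proc.stdin.write(command + "\n")
        # Echo the marker followed by the exit status of the command above.
        self.proc.stdin.write("echo %s $?\n" % self.marker)
        self.proc.stdin.flush()
        lines = []
        for line in self.proc.stdout:
            line = line.rstrip("\n")
            idx = line.find(self.marker)
            if idx != -1:
                # Output without a trailing newline shares a line with the marker.
                if idx > 0:
                    lines.append(line[:idx])
                status = int(line[idx + len(self.marker):].split()[0])
                return status, lines
            lines.append(line)
        raise RuntimeError("shell exited before the command finished")

    def close(self):
        self.proc.stdin.close()  # EOF makes sh exit
        self.proc.wait()
        self.proc.stdout.close()
```

For example, `PersistentShell().run("whoami")` returns the exit status and the output lines while the same shell stays open for the next command.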

Second, we could try to make concurrent connections. Priority should go to the long-lasting connection rather than to connecting to multiple pods, because it is probably easier (the code changes will have to go into remoto).

The latter is going to be very complicated.

zmc commented 5 years ago

I've started working on this, or at least laying the groundwork. I'm starting by adapting remoto's KubernetesConnection to use the kubernetes-client Python API instead of the oc command.

Because execnet is so baked into remoto, it's a little complex, but initial testing shows a sizeable speedup: running whoami 50 times goes from ~53s down to ~17s, without any concurrency.
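The reported totals translate into per-call figures with simple arithmetic: 53 s / 50 ≈ 1.06 s per call before and 17 s / 50 = 0.34 s per call after, roughly a 3.1x speedup. The helper `per_call_seconds` below is a hypothetical sketch of how such a measurement could be repeated for any callable; it is not the harness that produced the numbers above.

```python
import time


def per_call_seconds(fn, repeats=50):
    """Call fn() `repeats` times sequentially; return mean seconds per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def speedup(old_total, new_total):
    """How many times faster the new total is than the old one."""
    return old_total / new_total


# Totals reported above for 50 sequential whoami calls.
OLD_TOTAL, NEW_TOTAL, CALLS = 53.0, 17.0, 50
old_per_call = OLD_TOTAL / CALLS  # about 1.06 s per call with the oc command
new_per_call = NEW_TOTAL / CALLS  # about 0.34 s per call with kubernetes-client
```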

Edit: Calls that use stdin often hang because of https://github.com/kubernetes-client/python-base/issues/106