Extreme resource draining issue after deploying kubeflow

amr-elsehemy commented 4 years ago

I'm following the README to deploy the Kubeflow Juju bundle on top of MicroK8s. I'm using an Ubuntu 18.04 virtual machine in VirtualBox with 16 GB of RAM and 4 cores allocated. Everything works fine at first, and I'm able to access Kubeflow and deploy pipelines. But the machine gets slower and slower over time, until it starts to hang and almost freezes after about an hour of usage. Looking at resource usage on Ubuntu, I see all 4 allocated cores at 95% to 100% utilization.

Restarting the VM doesn't help either: once it boots, resource usage climbs back to the levels above even without running any experiment.

Is there any explanation for this?

tvansteenburgh commented 4 years ago

Hi @amr-elsehemy, it would help to know which processes are using the resources. Could you get the output of these commands please?

ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu | head
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head
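
A minimal sketch of the same two queries wrapped in one helper, assuming the procps version of ps that ships with Ubuntu. The function name top_by and its row-count argument are hypothetical, added only for illustration. Keep in mind that the %CPU column from ps is CPU time divided by the time the process has been running, so it is an average over the process lifetime rather than an instantaneous reading.

```shell
# Print the ps header plus the top N processes sorted by one column.
# $1: sort key as used above (%cpu or %mem)
# $2: number of processes to show (defaults to 9, which is what a
#     plain `head` shows after the header line)
top_by() {
  ps -eo pid,ppid,cmd,%mem,%cpu --sort="-$1" | head -n "$(( ${2:-9} + 1 ))"
}

top_by %cpu
top_by %mem
```

Calling top_by %cpu 3 would limit the listing to the header and three rows, which is easier to paste into a comment.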

amr-elsehemy commented 4 years ago

Thanks a lot @tvansteenburgh, here are the outputs.

CPU, idle (no work is being done on the VM and no ML experiments are running):

  PID  PPID CMD                         %MEM %CPU
 7316  6993 /usr/lib/gvfs/gvfs-udisks2-  0.0 53.1
 3895     1 /snap/microk8s/1378/kube-ap  3.5 21.8
 3912     1 /snap/microk8s/1378/kubelet  0.8 13.1
    1     0 /sbin/init splash            0.0 12.2
 6993     1 /lib/systemd/systemd --user  0.0 12.2
 3572     1 /snap/microk8s/1378/bin/con  1.2 11.7
 2170     1 /lib/systemd/systemd --user  0.0 10.8
 3403  7061 python3 /var/lib/juju/agent  0.0  8.0
13533 13153 /var/lib/juju/tools/jujud m  1.3  8.0

CPU while a simple pipeline is running:

  PID  PPID CMD                         %MEM %CPU
 7316  6993 /usr/lib/gvfs/gvfs-udisks2-  0.0 68.8
18049 20514 python3 /var/lib/juju/agent  0.2 51.0
 5293  7241 gnome-system-monitor         0.2 12.5
 3912     1 /snap/microk8s/1378/kubelet  0.8  9.2
 3572     1 /snap/microk8s/1378/bin/con  1.2  7.5
 3895     1 /snap/microk8s/1378/kube-ap  3.5  7.4
    1     0 /sbin/init splash            0.0  4.6
 6993     1 /lib/systemd/systemd --user  0.0  4.2
 2170     1 /lib/systemd/systemd --user  0.0  4.0

===========================================

Memory:

  PID  PPID CMD                         %MEM %CPU
 3895     1 /snap/microk8s/1378/kube-ap  3.5 21.7
 7241  7030 /usr/bin/gnome-shell         3.3  4.9
 3255     1 /snap/microk8s/1378/etcd --  1.3  7.7
13533 13153 /var/lib/juju/tools/jujud m  1.3  8.0
 2268  2244 /usr/bin/gnome-shell         1.2  5.8
 3572     1 /snap/microk8s/1378/bin/con  1.2 11.6
 7013  7011 /usr/lib/xorg/Xorg vt2 -dis  0.9  0.6
19261  7030 /usr/bin/gnome-software --g  0.9  1.8
19819 19518 java -jar modeldb-1.0-SNAPS  0.8  5.0

This is a screenshot from the system monitor at one moment. It is not always like this; sometimes utilization drops a bit, but it is usually close to what the screenshot shows.

knkski commented 4 years ago

@amr-elsehemy, can you try running this command to see if it reduces CPU usage?

systemctl stop --user gvfs-udisks2-volume-monitor

If that works, see here for more information about what's going on:

https://github.com/ubuntu/microk8s/issues/500
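
To judge whether stopping the service made a difference, one option is to look at the monitor process before and after. This is only a sketch: the helper name show_procs is hypothetical, the pattern gvfs-udisks2 is taken from the truncated command shown in the ps output earlier in the thread, and pgrep is assumed to be installed (it ships with procps on Ubuntu).

```shell
# Show PID, CPU share and command line of every process whose command
# line matches the given pattern. Prints nothing if no process matches.
show_procs() {
  # pgrep -f matches against the full command line; -d, joins the PIDs
  # with commas, which is the list format that ps -p accepts.
  pids=$(pgrep -d, -f "$1") || return 0
  ps -o pid,%cpu,cmd -p "$pids"
}

show_procs gvfs-udisks2
```

Running show_procs gvfs-udisks2 once before the systemctl stop command and once after it should make it clear whether the process has gone away or has been started again.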

amr-elsehemy commented 4 years ago

Thank you @knkski for your support. That command helps a lot: it kills the draining process and everything gets back to normal. However, the process starts again automatically after a little while, and I have to rerun the command every now and then. Is there a more permanent way to disable it?

knkski commented 4 years ago

@amr-elsehemy: Yeah, you can run systemctl mask --user gvfs-udisks2-volume-monitor. Note that this will have some weird side effects if you're running it on your regular computer rather than inside a VM, such as USB drives not being automatically mounted and shown in the file manager.
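
The two commands from this thread can be combined, together with a way to undo them, roughly as follows. This is a sketch rather than an official procedure: the function names disable_monitor and enable_monitor are hypothetical, and the block only defines them so that nothing changes until one of them is called.

```shell
unit=gvfs-udisks2-volume-monitor

# Stop the running monitor, then mask the unit so that it cannot be
# started again until it is unmasked.
disable_monitor() {
  systemctl stop --user "$unit"
  systemctl mask --user "$unit"
}

# Reverse the mask and start the monitor again, for example on a desktop
# machine where automatic mounting of USB drives is wanted.
enable_monitor() {
  systemctl unmask --user "$unit"
  systemctl start --user "$unit"
}
```

Run disable_monitor inside the VM as the logged-in desktop user (not with sudo, since this is a per-user unit). The current state can be checked afterwards with systemctl is-enabled --user gvfs-udisks2-volume-monitor.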

amr-elsehemy commented 4 years ago

Thanks @knkski, that worked.

canonical / bundle-kubeflow

Extreme resource draining issue after deploying kubeflow #198