MetaCell / cloud-harness

Other
14 stars 5 forks source link

NFS auto restart when provisioning fails #684

Open filippomc opened 1 year ago

filippomc commented 1 year ago

Sometimes the NFS gets stuck and it's not able to provision more volumes and restarting the pod fixes the issue. The reason is not clear and would need further investigation, but it would be nice to automatically restart the pod when this happens. Either a liveness probe or just crashing are viable solutions

zsinnema commented 1 year ago

@filippomc in case this happens again please inform me before restarting the nfs server. I would like to see in what state it is also can you indicate how to reproduce this? I am not able to reproduce it

filippomc commented 1 year ago

The only way I know to reproduce is adding a lot of workspaces on https://v2.opensourcebrain.org/