Hyperpilotio / be-controller-ui

Internal UI for be controller
Apache License 2.0

Unstable behavior. More testing needed. #3

Open kozyraki opened 7 years ago

kozyraki commented 7 years ago

In general, the UI is quite unstable. Every now and then, some stats disappear forever. For example, after a while the CPU utilization statistics disappear. If I try to restart the UI (kubectl delete -> create), then I don't get the app-level QoS statistics. I'm not sure what the problem is, but it would be nice to do some more stability testing. For example, test bringing down and restarting the UI to see if it consistently works.

adrianliaw commented 7 years ago

@kozyraki Thanks for pointing these out. I've been working on fixing this issue throughout last week, and it seems to be working for me now. Please deploy the latest adrianliaw/be-controller-ui image and see if it works consistently, and please let me know if any issues come up with the UI.

I've tested it by running the cluster for 2 hours and viewing the statistics in the browser with the UI. I've also tested bringing the be-controller-ui deployment down and recreating it. Right now all the charts show as expected and never disappear (as long as there's data coming in).
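The down-and-recreate test could be repeated in a loop. This is only a minimal sketch: the manifest path passed in is hypothetical, the function names are made up for illustration, and it assumes `kubectl` is on the PATH and pointed at the right cluster. Checking that the charts actually render would still have to be done by hand in the browser.

```python
import subprocess


def restart_commands(manifest):
    """Return the kubectl delete/create pair for one restart of the UI."""
    return [
        ["kubectl", "delete", "-f", manifest],
        ["kubectl", "create", "-f", manifest],
    ]


def restart_cycle(manifest, cycles=3, run=subprocess.check_call):
    """Delete and recreate the resources in `manifest` `cycles` times.

    `run` is injectable so the loop can be dry-run without a cluster.
    Returns the list of commands that were executed, in order.
    """
    executed = []
    for _ in range(cycles):
        for cmd in restart_commands(manifest):
            run(cmd)  # check_call raises CalledProcessError on a non-zero exit
            executed.append(cmd)
    return executed
```

Passing `run=print` prints the commands instead of executing them, which is a cheap way to check the sequence first.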

However, over the 2-hour run, the CPU utilisation percentage graph eventually went empty. That's because of a failure in snap: I'm not sure why, but most of the pods inside the hyperpilot namespace crash after the cluster has been running for a while, including locust, influx, load controllers, grafana and demo-ui. This can be fixed by restarting the snap pods.

Also, I'm not quite sure why, but sometimes the CPU utilisation rates for each container calculated within Influx come out as -Infinity (only for spark pods, from what I saw). I'm currently just replacing those infinities with 0.
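For illustration, the workaround of replacing infinities with 0 might look roughly like the sketch below. The UI's actual code is not shown in this thread, so the function name `sanitize_rates` and the choice to treat NaN and missing values the same way are assumptions.

```python
import math


def sanitize_rates(rates, fill=0.0):
    """Replace non-finite or missing CPU rates with `fill`.

    Covers inf, -inf and NaN (via math.isfinite) as well as None.
    """
    cleaned = []
    for value in rates:
        if value is None or not math.isfinite(value):
            cleaned.append(fill)
        else:
            cleaned.append(value)
    return cleaned
```

Note that filling with 0 hides the bad samples in the chart. Dropping those points instead would leave a visible gap, which may make the underlying Influx issue easier to spot.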