Closed devoprock closed 2 years ago
1) I am able to see yhat, yhat_lower, and yhat_upper prediction values in Prometheus, but somehow the yhat, yhat_lower, and yhat_upper predictions in Prometheus/Grafana do not look correct compared to the original "container_cpu:container_cpu_usage_seconds_total:rate:sum" metric. I have attached a Grafana dashboard screenshot here. Green bar: container_cpu:container_cpu_usage_seconds_total:rate:sum; sky-blue bar: yhat; red bar: yhat_upper; orange bar: yhat_lower; bottom bar: anomaly detector. Can you please review this and let me know what I am doing wrong and how I can fix it?
Hey I don't really get what isn't working. To me it looks like everything is working as intended.
2) How can I see future forecast data on the Prometheus/Grafana dashboard?
Currently we don't have any way to do this.
3) How can I use daily, weekly, and holiday seasonality settings if required, and what prediction does app.py make by default?
You can configure the model in the model.py file.
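As an illustration of what "configuring the model" can mean for question 3, here is a minimal sketch that collects Prophet's real seasonality and holiday constructor arguments (`daily_seasonality`, `weekly_seasonality`, `yearly_seasonality`, `holidays`, `interval_width`). The helper names `build_prophet_kwargs` and `make_model` are hypothetical and not part of the project, and it is an assumption that model.py builds its model from a `Prophet(...)` call like this. As far as we can tell from the Prophet source, its default `'auto'` setting only turns on weekly seasonality once there are at least two weeks of history. With only a few days of data, weekly patterns are therefore not modelled at all.

```python
"""Hypothetical sketch of Prophet seasonality/holiday settings for model.py.

Assumption: model.py constructs a prophet.Prophet object. The helper names
below are illustrative only and do not exist in the project.
"""
from datetime import date


def build_prophet_kwargs(daily=True, weekly=True, yearly=False,
                         holiday_dates=None, interval_width=0.8):
    """Collect keyword arguments for prophet.Prophet(...)."""
    kwargs = {
        "daily_seasonality": daily,
        "weekly_seasonality": weekly,
        # A few days or weeks of data cannot reveal a yearly cycle.
        "yearly_seasonality": yearly,
        # Width of the yhat_lower..yhat_upper band (Prophet's default is 0.8).
        "interval_width": interval_width,
    }
    if holiday_dates:
        # Prophet wants a DataFrame with 'holiday' and 'ds' columns; plain
        # dicts are kept here so this helper works without pandas.
        kwargs["holidays"] = [
            {"holiday": name, "ds": day.isoformat()} for name, day in holiday_dates
        ]
    return kwargs


def make_model(**kwargs):
    """Build a Prophet model (requires the prophet and pandas packages)."""
    from prophet import Prophet  # older releases: from fbprophet import Prophet
    import pandas as pd

    if "holidays" in kwargs:
        holidays = pd.DataFrame(kwargs["holidays"])
        holidays["ds"] = pd.to_datetime(holidays["ds"])
        kwargs["holidays"] = holidays
    return Prophet(**kwargs)
```

Usage might look like `make_model(**build_prophet_kwargs(holiday_dates=[("new_year", date(2021, 1, 1))]))`. Keeping the arguments in a plain dict makes it easy to log or compare settings between retraining runs.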
@4n4nd,
I don't see any issue with the setup. I can see the yhat, yhat_lower, and yhat_upper prediction values in Prometheus, but somehow there is a big difference between the original values and yhat, yhat_lower, and yhat_upper.
See the screenshot attached above. Green bar: original container_cpu:container_cpu_usage_seconds_total:rate:sum; sky-blue bar: yhat; red bar: yhat_upper; orange bar: yhat_lower; bottom bar: anomaly detector.
That seems to be doing well.
Or you can try giving the model more data:
export FLT_ROLLING_TRAINING_WINDOW_SIZE=30d
And I don't think you need to retrain your model every minute:
export FLT_RETRAINING_INTERVAL_MINUTES=15
This will look at the past 30 days of data and retrain the model every 15 minutes.
Hope this is helpful. Otherwise, you can try tweaking model.py, which is based on Prophet (https://facebook.github.io/prophet/).
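To make the two settings concrete, here is a small sketch that turns a window string such as `30d` into a duration and counts how often the model would be refit per day. The helper names `parse_window` and `retrains_per_day` are hypothetical, and the project's real parsing of `FLT_ROLLING_TRAINING_WINDOW_SIZE` may accept more formats than this.

```python
"""Sketch of how the window size and retraining interval interact.

Assumption: the project's actual parsing code may differ; these helpers
are illustrative only.
"""
import re
from datetime import timedelta

# Single-letter unit suffixes mapped to timedelta keyword names.
_UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_window(text):
    """Turn strings such as '30d' or '12h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhdw])", text.strip())
    if not match:
        raise ValueError(f"unsupported window: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def retrains_per_day(interval_minutes):
    """How many times per day the model would be refit."""
    return (24 * 60) // interval_minutes
```

With `FLT_RETRAINING_INTERVAL_MINUTES=1`, `retrains_per_day(1)` gives 1440 full refits a day over the whole window. At 15 minutes it is 96, which is usually plenty for a CPU metric that changes over hours and days.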
Sure @4n4nd, I will try the values below. Just so you know, we only have the last 4 days of data in Prometheus, and as the screenshot below shows, I see a lot of difference between the original values and the yhat values.
export FLT_ROLLING_TRAINING_WINDOW_SIZE=30d
export FLT_RETRAINING_INTERVAL_MINUTES=15
Green bar: original container_cpu:container_cpu_usage_seconds_total:rate:sum; sky-blue bar: yhat; red bar: yhat_upper; orange bar: yhat_lower; bottom bar: anomaly detector.
Let the anomaly detector collect a few days of data and then see if the predictions improve.
Just so you know, we only have the last 4 days
It should keep accumulating the data until it has 30 days of data.
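The point above is that a 30d window is only an upper bound. Here is a minimal sketch of the data the model actually trains on, assuming the window simply starts at whichever is later: "now minus the window" or the oldest sample Prometheus still holds. `effective_training_window` is a hypothetical helper.

```python
"""Sketch: the configured window caps history; it cannot create it."""
from datetime import datetime, timedelta, timezone


def effective_training_window(window, oldest_sample, now):
    """Return (start, length) of the history the model can actually use.

    If Prometheus holds less history than the configured rolling window,
    the model only sees what exists, so the window grows day by day
    until it reaches the configured size.
    """
    start = max(now - window, oldest_sample)
    return start, now - start
```

With 4 days of retained data and `FLT_ROLLING_TRAINING_WINDOW_SIZE=30d`, the model trains on 4 days today and on 30 days only after a month of accumulation.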
@nandhyala By default, the model being trained is the Prophet model. You can also try training the Fourier model by importing model_fourier.py in your app.py and comparing the two. As @4n4nd mentioned, the small amount of training data you have might be affecting the model's performance.
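One way to compare the two models with a number rather than by eye is to line up each model's yhat values with the original metric over the same timestamps and compute the mean absolute error (MAE). This is a minimal sketch. It assumes you have already exported aligned lists of values (for example from Prometheus). The function names are hypothetical.

```python
"""Sketch: score Prophet vs Fourier forecasts against the original metric."""


def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty series of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pick_better_model(actual, forecasts):
    """forecasts maps a model name to yhat values aligned with `actual`."""
    scores = {name: mean_absolute_error(actual, yhat)
              for name, yhat in forecasts.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

For example, `pick_better_model(cpu_values, {"prophet": prophet_yhat, "fourier": fourier_yhat})` returns the name with the lower error plus both scores. Compare on a period the model did not train on, or the score will flatter whichever model overfits.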
@4n4nd @hemajv, I really appreciate your help! I have now been running the app.py script for the last 4 days (still in progress), and I don't see much improvement in how closely yhat, yhat_upper, and yhat_lower track the original values.
Can you please confirm whether it is not possible to train and predict with only 1 week of data using Prophet? And how do I import model_fourier.py in app.py?
@nandhyala I can see that a few anomalies were detected. Since the anomaly is a 0 or 1 value, try increasing the scale for the 'anomaly' line (i.e. the orange line) on the graph so that you can visualize it better. You can train on 1 week of data and predict, but the model may perform better with more training data.
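The anomaly series is 0 or 1, while the CPU values are much larger, which is why it looks flat on a shared axis. This sketch shows a plausible rule for the flag: a point counts as anomalous when the observed value falls outside the predicted band. It is an assumption that app.py uses exactly this rule, so check the source. The same flags also give a quick calibration check. Prophet's default `interval_width` is 0.8, so with a well-calibrated model roughly 80% of points should fall inside the band.

```python
"""Sketch of a 0/1 anomaly flag derived from the prediction band.

Assumption: the detector's real rule may differ; check app.py.
"""


def anomaly_flag(value, yhat_lower, yhat_upper):
    """Return 1 if the observed value is outside [yhat_lower, yhat_upper]."""
    return int(value < yhat_lower or value > yhat_upper)


def band_coverage(values, lowers, uppers):
    """Fraction of observations that fall inside the predicted band."""
    flags = [anomaly_flag(v, lo, hi) for v, lo, hi in zip(values, lowers, uppers)]
    return 1 - sum(flags) / len(flags)
```

If coverage over a day is far below the interval width, the band is too narrow for this metric. That is a more useful signal than comparing the lines visually.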
To train the Fourier model, change line 13 of app.py (https://github.com/AICoE/prometheus-anomaly-detector/blob/master/app.py#L13) to: import model_fourier as model
Thanks @hemajv, I will check with 2 weeks of data and come back if I see the same issues.
In parallel, I will try model_fourier as described at the link above and let you know.
@hemajv,
I have tried model_fourier with 2 days of data, but I don't see much improvement with the Fourier model. Below are the original and Fourier metric graphs in Grafana.
@nandhyala This is one of the drawbacks of Fourier: it is more of a statistical extrapolation of the values, whereas the Prophet model takes seasonality and trend in your data into account. One model may perform better than the other depending on the nature of the time series metric. I would still recommend collecting more than 2 weeks of data and re-training the Prophet/Fourier models.
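To show what "statistical extrapolation" means here, below is a minimal pure-Python sketch of the general Fourier-extrapolation technique. It removes a least-squares line, keeps the strongest DFT harmonics of the residual, and extends those sinusoids and the line past the end of the data. This is not the project's model_fourier.py, whose details may differ. The sketch shows why the method can only repeat cycles that fit inside the training window. With 2 days of data, a weekly pattern cannot appear in the forecast.

```python
"""Sketch of Fourier extrapolation (not the project's model_fourier.py)."""
import cmath
import math


def fourier_extrapolate(values, n_predict, n_harmonics=10, detrend=True):
    """Forecast n_predict points by extending the strongest harmonics."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    t = list(range(n))
    # Optionally remove a least-squares line so the DFT sees only cycles.
    if detrend:
        t_mean, v_mean = sum(t) / n, sum(values) / n
        slope = sum((ti - t_mean) * (vi - v_mean) for ti, vi in zip(t, values)) / sum(
            (ti - t_mean) ** 2 for ti in t
        )
        intercept = v_mean - slope * t_mean
    else:
        slope, intercept = 0.0, 0.0
    resid = [v - (intercept + slope * ti) for v, ti in zip(values, t)]

    # Naive O(n^2) DFT of the residual for bins 0..n//2.
    coeffs = [
        sum(r * cmath.exp(-2j * math.pi * k * ti / n) for r, ti in zip(resid, t))
        for k in range(n // 2 + 1)
    ]
    mean = coeffs[0].real / n
    # Keep the strongest harmonics, skipping the mean (k=0) and Nyquist bins.
    candidates = range(1, (n - 1) // 2 + 1)
    top = sorted(candidates, key=lambda k: abs(coeffs[k]), reverse=True)[:n_harmonics]

    forecast = []
    for ti in range(n, n + n_predict):
        x = mean
        for k in top:
            # Each conjugate pair of bins contributes one real cosine.
            x += 2 / n * abs(coeffs[k]) * math.cos(
                2 * math.pi * k * ti / n + cmath.phase(coeffs[k])
            )
        forecast.append(x + intercept + slope * ti)
    return forecast
```

The forecast is periodic with the length of the training window, plus a straight line. Prophet, by contrast, fits explicit daily/weekly seasonal terms and a piecewise trend. That is why the two can behave very differently on the same short history.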
Thanks @hemajv, I will check the Prophet/Fourier models again once I have 2 weeks of data.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
/close
@sesheta: Closing this issue.
@4n4nd,
I am planning to use the yhat value for Kubernetes pod autoscaling (at peak time it will scale up to 1000+ pods), feeding the yhat predictions of the original Prometheus metric (container_cpu:container_cpu_usage_seconds_total:rate) into the HPA.
I have set the variables below on the Prophet application node and started app.py.
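For the autoscaling plan, it helps to see what a predicted value does to the replica count. Kubernetes documents the HPA rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no scaling when the ratio is within a tolerance of 1.0 (0.1 by default). This sketch applies that rule with yhat standing in for the current metric value. It ignores stabilization windows and scaling policies, which the real controller also applies. Exposing yhat to the HPA would additionally need a metrics adapter (such as prometheus-adapter) serving it through the custom or external metrics API, which is outside this sketch.

```python
"""Sketch of the documented HPA replica calculation, fed with a forecast."""
import math


def desired_replicas(current_replicas, current_value, target_value,
                     tolerance=0.1, min_replicas=1, max_replicas=None):
    """Replicas the HPA rule would ask for; current_value could be yhat."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: the controller leaves the count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    desired = max(desired, min_replicas)
    if max_replicas is not None:
        desired = min(desired, max_replicas)
    return desired
```

For example, with 10 pods, a target of 100 and a yhat of 200, the rule asks for 20 pods. Using yhat_upper instead of yhat would scale out more aggressively ahead of peaks.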
export FLT_PROM_URL=http://xxxxxxx.amazonaws.com
export FLT_RETRAINING_INTERVAL_MINUTES=1
export FLT_ROLLING_TRAINING_WINDOW_SIZE=3d
export FLT_METRICS_LIST="container_cpu:container_cpu_usage_seconds_total:rate:sum"
container_cpu:container_cpu_usage_seconds_total:rate:sum = sum (rate (container_cpu_usage_seconds_total{container_name="smart-savant-cpu"}[5m]))
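To compare the original series with the detector's output numerically, both can be pulled through the Prometheus HTTP API instant-query endpoint (`GET /api/v1/query`). This sketch only builds the URL and parses the documented JSON response shape. The helper names are hypothetical, and the exact metric names the detector exports for yhat are not confirmed here, so look them up in Prometheus first.

```python
"""Sketch: fetch and parse a Prometheus instant query for comparison."""
import json
from urllib.parse import urlencode


def instant_query_url(prom_url, expr):
    """Build a Prometheus HTTP API instant-query URL."""
    return prom_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": expr})


def parse_vector(body):
    """Return [(labels, value)] from an instant-query JSON response body."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    return [
        (series["metric"], float(series["value"][1]))
        for series in payload["data"]["result"]
    ]
```

Usage might look like `urllib.request.urlopen(instant_query_url(prom, 'sum (rate (container_cpu_usage_seconds_total{container_name="smart-savant-cpu"}[5m]))')).read()` passed to `parse_vector`, repeated for the yhat series at the same moment.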
I have noticed the issues below with the steps above:
1) I am able to see yhat, yhat_lower, and yhat_upper prediction values in Prometheus, but somehow the yhat, yhat_lower, and yhat_upper predictions in Prometheus/Grafana do not look correct compared to the original "container_cpu:container_cpu_usage_seconds_total:rate:sum" metric.
I have attached a Grafana dashboard screenshot here. Green bar: container_cpu:container_cpu_usage_seconds_total:rate:sum; sky-blue bar: yhat; red bar: yhat_upper; orange bar: yhat_lower; bottom bar: anomaly detector.
Can you please review this and let me know what I am doing wrong and how I can fix it?
2) How can I see future forecast data on the Prometheus/Grafana dashboard?
3) How can I use daily, weekly, and holiday seasonality settings if required, and what prediction does app.py make by default?
Can you please help me with the above so I can set up Kubernetes HPA autoscaling on our infrastructure?
Thanks 🙏