AICoE / prometheus-anomaly-detector

A newer, updated version of the Prometheus anomaly detector (legacy version: https://github.com/AICoE/prometheus-anomaly-detector-legacy)
GNU General Public License v3.0
597 stars, 151 forks

Issue using docker container #139

Closed ndragon798 closed 2 years ago

ndragon798 commented 4 years ago

I just tried spinning this up using the provided Docker image, and it crashes immediately on startup.

_apt:x:100:65534::/nonexistent:/usr/sbin/nologin
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-oj82vzw6 because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2020-11-16 21:44:38,433:INFO:configuration: Metric data rolling training window size: 14 days, 23:59:59.974354
2020-11-16 21:44:38,433:INFO:configuration: Model retraining interval: 15 minutes
2020-11-16 21:44:38,526:ERROR:fbprophet.plot: Importing plotly failed. Interactive plots will not work.
2020-11-16 21:44:38,528:DEBUG:urllib3.connectionpool: Starting new HTTP connection (1): demo.robustperception.io:9090
2020-11-16 21:44:39,016:DEBUG:urllib3.connectionpool: http://demo.robustperception.io:9090 "GET /api/v1/query?query=%27up%27 HTTP/1.1" 200 104
Traceback (most recent call last):
  File "app.py", line 37, in <module>
    model.MetricPredictor(
  File "/model.py", line 23, in __init__
    self.metric = Metric(metric, rolling_data_window_size)
  File "/opt/conda/envs/prophet-env/lib/python3.8/site-packages/prometheus_api_client/metric.py", line 63, in __init__
    self.metric_name = metric["metric"]["__name__"]
TypeError: 'float' object is not subscriptable

My compose file:

version: '3'

services:

  anomaly-detector:
    image: quay.io/aicoe/prometheus-anomaly-detector:latest
    ports: 
      - 9095:8080
    environment:
    - FLT_PROM_URL=http://demo.robustperception.io:9090
    - FLT_RETRAINING_INTERVAL_MINUTES=15
    - FLT_METRICS_LIST='up'
    - APP_FILE=app.py
    - FLT_DATA_START_TIME=3d
    - FLT_DEBUG_MODE=True
    - FLT_ROLLING_TRAINING_WINDOW_SIZE=15d

I've tried this on two different hosts, one Windows and one Linux. Any suggestions?

ndragon798 commented 3 years ago

I saw this got looked at. If you need any more information, let me know.

4n4nd commented 3 years ago

Thanks! I just tested this and was able to reproduce the issue.

4n4nd commented 3 years ago

@ndragon798 I don't know what's going on here. I tried running the same container image using docker and podman (not docker-compose):

docker run --env FLT_PROM_URL=http://demo.robustperception.io:9090 \
           --env FLT_METRICS_LIST='up' \
           --env FLT_RETRAINING_INTERVAL_MINUTES=15 \
           --env APP_FILE=app.py \
           --env FLT_ROLLING_TRAINING_WINDOW_SIZE=15d \
           --env FLT_DEBUG_MODE=True \
           -p 8080:8080 \
           quay.io/aicoe/prometheus-anomaly-detector:latest

and it worked fine, but for some reason it does not work with docker-compose :confused:
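
One possible explanation, offered as an assumption rather than something confirmed in this thread: a POSIX shell strips the single quotes from `'up'` before docker ever sees the argument, whereas a compose `environment` list entry of the form `KEY=VALUE` keeps everything after the first `=` as the value, quotes included. The failing log above shows the query being sent as `%27up%27`, which is `'up'` with literal quotes. The Python sketch below only imitates the two tokenization paths (using `shlex` as a stand-in for the shell); it does not call docker or docker-compose.

```python
import shlex
from urllib.parse import quote

# How a POSIX shell would tokenize the `docker run` argument:
# the surrounding single quotes are consumed by the shell.
shell_arg = shlex.split("--env FLT_METRICS_LIST='up'")[1]
shell_value = shell_arg.split("=", 1)[1]

# A compose list entry is a plain KEY=VALUE string, so the quotes
# stay in the value (assumed behaviour, modelled here with a plain split).
compose_entry = "FLT_METRICS_LIST='up'"
compose_value = compose_entry.split("=", 1)[1]

print(repr(shell_value))     # value as seen by the container via `docker run`
print(repr(compose_value))   # value as seen by the container via compose
print(quote(compose_value))  # URL-encoded form, comparable to the logged query
```

If this is the cause, writing the entry as `- FLT_METRICS_LIST=up` in the compose file would be worth trying. PromQL would then presumably stop treating the value as a quoted string literal, which would be consistent with the `TypeError` on a float, but that part is untested here.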

aroundthecode commented 3 years ago

Same issue starting within Kubernetes:

Matplotlib created a temporary config/cache directory at /tmp/matplotlib-jabk9l0g because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2020-12-07 16:25:30,123:INFO:configuration: Metric data rolling training window size: 14 days, 23:59:59.925515
2020-12-07 16:25:30,123:INFO:configuration: Model retraining interval: 15 minutes
2020-12-07 16:25:30,514:ERROR:fbprophet.plot: Importing plotly failed. Interactive plots will not work.
Traceback (most recent call last):
  File "app.py", line 37, in <module>
    model.MetricPredictor(
  File "/model.py", line 23, in __init__
    self.metric = Metric(metric, rolling_data_window_size)
  File "/opt/conda/envs/prophet-env/lib/python3.8/site-packages/prometheus_api_client/metric.py", line 63, in __init__
    self.metric_name = metric["metric"]["__name__"]
TypeError: 'float' object is not subscriptable

Image: quay.io/aicoe/prometheus-anomaly-detector:latest

Environment variables:
 FLT_PROM_URL: http://prometheus-server.kube-system:80
 FLT_RETRAINING_INTERVAL_MINUTES: 15
 FLT_METRICS_LIST: 'up,istio_request_duration_seconds_sum,istio_request_duration_seconds_count'
 APP_FILE: app.py
 FLT_DATA_START_TIME: 3d
 FLT_ROLLING_TRAINING_WINDOW_SIZE: 15d

aroundthecode commented 3 years ago

Solved on my side: I was providing several metrics at once with the wrong separator.

 FLT_METRICS_LIST: 'up;istio_request_duration_seconds_sum;istio_request_duration_seconds_count'

(I had used "," instead of ";")

Once fixed, the pod started properly.
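
To illustrate the separator issue, here is a minimal sketch of how a `;`-separated metrics list could be parsed. `parse_metrics_list` is a hypothetical helper, not the detector's actual code; the only detail taken from this thread is that `;` is the separator that worked.

```python
from typing import List


def parse_metrics_list(raw: str) -> List[str]:
    """Split a ';'-separated string into individual metric queries.

    Hypothetical helper for illustration only.
    """
    return [part.strip() for part in raw.split(";") if part.strip()]


good = parse_metrics_list(
    "up;istio_request_duration_seconds_sum;istio_request_duration_seconds_count"
)
bad = parse_metrics_list(
    "up,istio_request_duration_seconds_sum,istio_request_duration_seconds_count"
)

print(good)  # three separate queries
print(bad)   # a single entry that still contains the commas
```

A semicolon is a plausible choice of separator because commas already appear inside PromQL label matchers such as `up{job="node",instance="a"}`, so splitting on `,` would break those selectors apart.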

kiefersmith commented 3 years ago

I've also run into this.

sesheta commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

sesheta commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

sesheta commented 2 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

sesheta commented 2 years ago

@sesheta: Closing this issue.
