cortexproject / cortex-helm-chart

Helm chart for Cortex
Apache License 2.0

Upgrade image from v1.13.0 to v1.14.1 using chart v2.1.0 fails: error loading config from /etc/cortex/cortex.yaml: Error parsing config file: yaml: unmarshal errors #454

Closed davidg-datascene closed 1 year ago

davidg-datascene commented 1 year ago

Upgrading the Cortex helm chart from v1.7.0 to v2.1.0, which includes Cortex docker image v1.14.1, fails. Using chart v2.1.0 with Cortex image v1.13.0/1/2 works as expected.

error loading config from /etc/cortex/cortex.yaml: Error parsing config file: yaml: unmarshal errors:                                                                                  
line 4: field storage not found in type alertmanager.MultitenantAlertmanagerConfig
line 72: field storage not found in type ruler.Config
line 88: field index_queries_cache_config not found in type storage.Config

To Reproduce

helm upgrade -n cortex cortex -f my-cortex-values.yaml cortex-helm/cortex --version v2.1.0 

my-cortex-values.yaml

USER-SUPPLIED VALUES:
alertmanager:
  nodeSelector:
    agentpool: utils
  persistentVolume:
    size: 10Gi
    storageClass: azuredisk-standard-lrs
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/utils
    operator: Equal
    value: "true"
compactor:
  nodeSelector:
    agentpool: utils
  persistentVolume:
    size: 10Gi
    storageClass: azuredisk-standard-lrs
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/utils
    operator: Equal
    value: "true"
config:
  alertmanager:
    storage:
      azure:
        account_key: xxxxxxxxx
        account_name: xxxxxx
        container_name: cortexalerts
      type: azure
  auth_enabled: true
  blocks_storage:
    azure:
      account_key: xxxxxxx
      account_name: xxxxx
      container_name: cortexmetrics
    backend: azure
  ruler:
    storage:
      azure:
        account_key: xxxxx
        account_name: xxxxx
        container_name: cortexrules
      type: azure
  storage:
    engine: blocks
distributor:
  nodeSelector:
    agentpool: utils
  persistentVolume:
    size: 10Gi
    storageClass: azuredisk-standard-lrs
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/utils
    operator: Equal
    value: "true"
ingester:
  nodeSelector:
    agentpool: utils
  persistentVolume:
    size: 10Gi
    storageClass: azuredisk-standard-lrs
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/utils
    operator: Equal
    value: "true"
nginx:
  nodeSelector:
    agentpool: utils
  persistentVolume:
    size: 10Gi
    storageClass: azuredisk-standard-lrs
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/utils
    operator: Equal
    value: "true"
querier:
  nodeSelector:
    agentpool: utils
  persistentVolume:
    size: 10Gi
    storageClass: azuredisk-standard-lrs
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/utils
    operator: Equal
    value: "true"
query_frontend:
  nodeSelector:
    agentpool: utils
  persistentVolume:
    size: 10Gi
    storageClass: azuredisk-standard-lrs
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/utils
    operator: Equal
    value: "true"
ruler:
  nodeSelector:
    agentpool: utils
  persistentVolume:
    size: 10Gi
    storageClass: azuredisk-standard-lrs
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/utils
    operator: Equal
    value: "true"
store_gateway:
  nodeSelector:
    agentpool: utils
  persistentVolume:
    size: 10Gi
    storageClass: azuredisk-standard-lrs
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/utils
    operator: Equal
    value: "true"

/etc/cortex/cortex.yaml

alertmanager:
  enable_api: false
  external_url: /api/prom/alertmanager
  storage:
    azure:
      account_key: xxxx
      container_name: cortexalerts
    type: azure
api:
  prometheus_http_prefix: /prometheus
  response_compression_enabled: true
auth_enabled: true
blocks_storage:
  azure:
    account_key: xxxx
    account_name: xxxx
    container_name: cortexmetrics
  backend: azure
  bucket_store:
    bucket_index:
      enabled: true
    sync_dir: /data/tsdb-sync
  tsdb:
    dir: /data/tsdb
distributor:
  pool:
    health_check_ingesters: true
  shard_by_all_labels: true
frontend:
  log_queries_longer_than: 10s
ingester:
  lifecycler:
    final_sleep: 30s
    join_after: 10s
    num_tokens: 512
    observe_period: 10s
    ring:
      kvstore:
        store: memberlist
      replication_factor: 3
ingester_client:
  grpc_client_config:
    max_recv_msg_size: 10485760
    max_send_msg_size: 10485760
limits:
  enforce_metric_name: true
  max_query_lookback: 0s
  reject_old_samples: true
  reject_old_samples_max_age: 168h
memberlist:
  bind_port: 7946
  join_members:
  - 'cortex-memberlist'
querier:
  active_query_tracker_dir: /data/active-query-tracker
  store_gateway_addresses: |-
    dns+cortex-store-gateway-headless:9095
query_range:
  align_queries_with_step: true
  cache_results: true
  results_cache:
    cache:
      memcached:
        expiration: 1h
      memcached_client:
        timeout: 1s
  split_queries_by_interval: 24h
ruler:
  enable_alertmanager_discovery: false
  enable_api: true
  storage:
    azure:
      account_key: xxxxxx
      account_name: xxxxx
      container_name: cortexrules
    type: azure
runtime_config:
  file: /etc/cortex-runtime-config/runtime_config.yaml
server:
  grpc_listen_port: 9095
  grpc_server_max_concurrent_streams: 10000
  grpc_server_max_recv_msg_size: 10485760
  grpc_server_max_send_msg_size: 10485760
  http_listen_port: 8080
storage:
  engine: blocks
  index_queries_cache_config:
    memcached:
      expiration: 1h
    memcached_client:
      timeout: 1s
store_gateway:
  sharding_enabled: false

Expected behavior

The upgrade to Cortex image v1.14.1 succeeds.

Environment:

- Azure AKS version v1.23.15
- Cortex helm chart v1.7.0 --> v2.1.0
- Docker image v1.13.0 --> v1.14.1

nschad commented 1 year ago

> Upgrading the Cortex helm chart from v1.7.0 to v2.1.0, which includes Cortex docker image v1.14.1, fails. Using chart v2.1.0 with Cortex image v1.13.0/1/2 works as expected.
>
> error loading config from /etc/cortex/cortex.yaml: Error parsing config file: yaml: unmarshal errors:
> line 4: field storage not found in type alertmanager.MultitenantAlertmanagerConfig
> line 72: field storage not found in type ruler.Config
> line 88: field index_queries_cache_config not found in type storage.Config

Yeah, your config is wrong. The error is coming from Cortex, not the helm chart. Make sure your config matches https://cortexmetrics.io/docs/configuration/configuration-file/
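For illustration, here is a minimal sketch of how the storage sections of the values file might look for Cortex v1.14.x. It assumes the legacy `alertmanager.storage` and `ruler.storage` blocks are replaced by the top-level `alertmanager_storage` and `ruler_storage` blocks, with `backend` taking the place of `type`. Check the field names against the configuration reference linked above before relying on them. The `xxxxx` placeholders are the redacted values from the report.

```yaml
# Sketch only: a candidate replacement for the legacy storage blocks
# under `config:` in my-cortex-values.yaml. Verify against the Cortex
# configuration reference for the image version you deploy.
config:
  alertmanager_storage:
    backend: azure            # replaces alertmanager.storage.type
    azure:
      account_key: xxxxx
      account_name: xxxxx
      container_name: cortexalerts
  ruler_storage:
    backend: azure            # replaces ruler.storage.type
    azure:
      account_key: xxxxx
      account_name: xxxxx
      container_name: cortexrules
```

The third error (`index_queries_cache_config` under `storage`) does not come from the user-supplied values shown in the report, so it presumably originates from values carried over from the older chart.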

davidg-datascene commented 1 year ago

But shouldn't the helm chart be providing the correct values following the change from Cortex 1.13 to 1.14, given that the helm chart builds out the /etc/cortex/cortex.yaml file? The upgrade guide doesn't show the steps, or the changes required, to go from chart 1.7 (Cortex 1.13.x) to chart 2.1.0 (Cortex 1.14.x).

https://github.com/cortexproject/cortex-helm-chart/blob/v2.1.0/templates/secret.yaml#L10

davidg-datascene commented 1 year ago

I'll close this, as no changes are required on your side. Thanks for taking a look.

nschad commented 1 year ago

> But shouldn't the helm chart be providing the correct values following the change from Cortex 1.13 to 1.14, given that the helm chart builds out the /etc/cortex/cortex.yaml file? The upgrade guide doesn't show the steps, or the changes required, to go from chart 1.7 (Cortex 1.13.x) to chart 2.1.0 (Cortex 1.14.x).
>
> https://github.com/cortexproject/cortex-helm-chart/blob/v2.1.0/templates/secret.yaml#L10

Yes, it does. However, you need to use --reset-values, otherwise old values persist between upgrades.
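As a rough sketch, the command from "To Reproduce" with `--reset-values` added might look like this. The release name, namespace, values file, and chart version are copied from the original report. The `RUN` switch is a hypothetical convenience for this example, so the command can be reviewed before it touches the cluster.

```shell
#!/usr/bin/env bash
# Sketch only: arguments are taken from the "To Reproduce" command above.
# --reset-values tells helm to start from the chart's built-in values
# rather than the values stored with the previous release.
helm_args=(
  upgrade -n cortex cortex
  -f my-cortex-values.yaml
  cortex-helm/cortex
  --version v2.1.0
  --reset-values
)

# Print the command for review. RUN is a hypothetical switch for this
# sketch: set RUN=1 to actually execute the upgrade.
echo "helm ${helm_args[*]}"
if [ "${RUN:-0}" = "1" ]; then
  helm "${helm_args[@]}"
fi
```

Values passed with `-f` are still applied on top of the chart defaults, so any legacy keys left in my-cortex-values.yaml (for example `config.alertmanager.storage`) would presumably still end up in the rendered /etc/cortex/cortex.yaml.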