apache / solr-operator

Official Kubernetes operator for Apache Solr
https://solr.apache.org/operator
Apache License 2.0

Apache-solr:Solr-exporter #501

Open vipul-06 opened 1 year ago

vipul-06 commented 1 year ago

I have deployed SolrCloud using Helm in a GKE cluster. These are the steps I used to deploy it:

(1) helm repo add apache-solr https://solr.apache.org/charts

(2) kubectl create -f https://solr.apache.org/operator/downloads/crds/v0.5.1/all-with-dependencies.yaml

(3) helm install -n dev-backend solr-operator apache-solr/solr-operator --version 0.5.1

(4) helm install -n dev-backend experro-solr apache-solr/solr -f values-solr.yaml --version=0.5.1

The values-solr.yaml file is below:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Default values for solr.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

nameOverride: ""
fullnameOverride: ""

# If you want to use autoScaling, do not set this field
replicas: 5

global:
  imagePullSecrets: []
  clusterDomain: ""

# Use a serviceAccount for all pods created under this chart (Solr and ZK)
serviceAccount:
  # Specifies whether a ServiceAccount should be created
  create: false
  # The name of the ServiceAccount to use.
  # Required if create is false.
  # If not set and create is true, a name is generated using the fullname template
  name: ""

image:
  repository: "solr"
  tag: "8.11"
  # Default pullPolicy is empty, which is Always for "latest" tags and IfNotPresent for all others.
  pullPolicy: ""
  imagePullSecret: ""

busyBoxImage: {}
  # repository: "busybox"
  # tag: "1.28.0-glibc"
  # pullPolicy: ""
  # imagePullSecret: ""

solrOptions:
  javaMemory: "-Xms24000m -Xmx24000m"
  javaOpts: ""
  logLevel: "DEBUG"
  gcTune: ""
  solrModules: []
  additionalLibs: []

  # Enable authentication for the Solr Cloud
  # More information can be found at:
  # https://apache.github.io/solr-operator/docs/solr-cloud/solr-cloud-crd.html#authentication-and-authorization
  security:
     authenticationType: Basic
     # basicAuthSecret: my-basic-auth-secret  #secret-name
     # probesRequireAuth: false
     # bootstrapSecurityJson:
     #    name: my-custom-security-json-secret
     #    key: security.json

# Specify how the SolrCloud should be addressable
# https://apache.github.io/solr-operator/docs/solr-cloud/solr-cloud-crd.html#addressability
addressability:
  podPort: 8983
  commonServicePort: null
  # kubeDomain is defaulted by global.clusterDomain if it's not provided
  kubeDomain: ""
  # Use external to provide endpoint(s) for your SolrCloud outside of Kubernetes
  external: {}
    # method: "Ingress"
    # domainName: "example.com"
    # additionalDomains: []
    # useExternalAddress: false
    # hideNodes: false
    # hideCommon: false
    # nodePortOverride: null
    # ingressTLSTerminationSecret: ""

# Specify how rolling updates should be managed for the Solr StatefulSet
# https://apache.github.io/solr-operator/docs/solr-cloud/solr-cloud-crd.html#update-strategy
updateStrategy:
  method: "Managed"
  # Options for the managed update method
  managed: {}
    # The number of Solr pods in a Solr Cloud that are allowed to be unavailable during the rolling restart.
    # More pods may become unavailable during the restart, however the Solr Operator will not kill pods if the limit has already been reached.
    # Either a static number, or a percentage representing the percentage of total pods requested for the statefulSet.
    # maxPodsUnavailable: "25%"

    # The number of replicas for each shard allowed to be unavailable during the restart.
    # Either a static number, or a percentage representing the percentage of the number of replicas for a shard.
    # Defaults to 1
    # maxShardReplicasUnavailable: 1
  # Cron schedule for automatically restarting the Solr Cloud
  # For available CRON syntaxes, check here: https://pkg.go.dev/github.com/robfig/cron/v3?utm_source=godoc#hdr-CRON_Expression_Format
  restartSchedule: ""

# More information can be found at:
# https://apache.github.io/solr-operator/docs/solr-cloud/solr-cloud-crd.html#data-storage
dataStorage:
  # Either persistent or ephemeral
  type: "persistent" #"ephemeral"

  # Specify a capacity for your data storage.
  # This affects both ephemeral and persistent storage.
  capacity: "20Gi"

  # Options for ephemeral storage. Only used if type = "ephemeral"
  ephemeral: {}
    # emptyDir: {}
    # hostPath: {}

  # Options for persistent storage. Only used if type = "persistent"
  persistent:
    reclaimPolicy: "Retain"
    pvc:
      name: ""
      labels: {}
      annotations: {}
      storageClassName: "gcp-ssd-volume"

  # BackupRestoreOptions is required when using this cloud with the SolrBackup CRD.
  # DEPRECATED: Please use backupRepositories instead.
  # TODO: Remove in v0.6.0
  backupRestoreOptions: {}
    # volume: {}
    # directory: ""

# A list of BackupRepositories to connect your SolrCloud to
# See either for more information:
# - https://apache.github.io/solr-operator/docs/solr-backup
# - kubectl explain solrcloud.spec.backupRepositories
backupRepositories: []
  # - name: example-repo # Required
  #   gcs:
  #     bucket: example-bucket # Required
  #     gcsCredentialSecret: # Required
  #       name: "gcsSecretName"
  #       key: "service-account-key.json"

zk:
  # A ZooKeeper Node to host all the information for this SolrCloud under
  chroot: ""
  # If true, this will add the "/<namespace>/<name>" to the end of the provided chroot, if any is provided.
  # This will let you deploy multiple Solr Clouds without having to manage the specific chroots yourself.
  uniqueChroot: false

  # Use an existing ZooKeeper cluster
  # Address available within the Kubernetes Cluster
  address: ""
  # Address available both within and outside the Kubernetes Cluster
  externalAddress: ""

  # If no "address" is provided, this defines the ZookeeperCluster created for this SolrCloud
  provided:
    replicas: 5
    image: {}
      # repository: "pravega/zookeeper"
      # tag: ""
      # pullPolicy: IfNotPresent
      # imagePullSecret: ""
    zookeeperPodPolicy:
      # affinity: {}
      # tolerations: []
      nodeSelector:
        pool: solr-cloud
      # env: []
      # resources: {}
      # # Set ZK service account individually instead of the global "serviceAccount.name"
      # serviceAccountName: ""

    # Storage defaults to the type of storage you use for Solr, which is ephemeral by default.
    # Explicitly set the storage type, only necessary when wishing to use an empty persistence or ephemeral object.
    storageType: ""
    persistence: {}
      # reclaimPolicy: "Retain"
      # spec: {}
    ephemeral: {}
      # emptydirvolumesource: {}

    # Zookeeper Config Options to set for the provided cluster
    config: {}

  # Use this section to inject ACL information for your zookeeper from a Kube secret in the same namespace as your SolrCloud
  acl: {}
    # secret: zk-acls
    # usernameKey: username
    # passwordKey: password

  # Use this section to inject read-only ACL information for your zookeeper from a Kube secret in the same namespace as your SolrCloud
  readOnlyAcl: {}
    # secret: zk-acls
    # usernameKey: username
    # passwordKey: password

# Enable TLS between your SolrCloud nodes
# More information can be found at:
# https://apache.github.io/solr-operator/docs/solr-cloud/solr-cloud-crd.html#enable-tls-between-solr-pods
solrTLS: {}
  # pkcs12Secret:
  #   name: secret-name
  #   key: pkcs12-key
  # keyStorePasswordSecret:
  #   name: secret-name
  #   key: password-key
  # trustStoreSecret:
  #   name: secret-name
  #   key: truststore-key
  # trustStorePasswordSecret:
  #   name: secret-name
  #   key: password-key
  # clientAuth: None
  # verifyClientHostname: false
  # checkPeerName: false
  # restartOnTLSSecretUpdate: false
  # mountedTLSDir:
  #   path: /path/to/mounted/tls
  #   keystoreFile: "keystore.p12"
  #   keystorePasswordFile: ""
  #   truststoreFile: "truststore.p12"
  #   truststorePasswordFile: ""

solrClientTLS: {}
  # pkcs12Secret:
  #   name: secret-name
  #   key: pkcs12-key
  # keyStorePasswordSecret:
  #   name: secret-name
  #   key: password-key
  # trustStoreSecret:
  #   name: secret-name
  #   key: truststore-key
  # trustStorePasswordSecret:
  #   name: secret-name
  #   key: password-key
  # mountedTLSDir:
  #   path: /path/to/mounted/tls
  #   keystoreFile: "keystore.p12"
  #   keystorePasswordFile: ""
  #   truststoreFile: "truststore.p12"
  #   truststorePasswordFile: ""

# Customize the Solr Pod for your needs
podOptions:
  annotations: {}
  labels: {}

  # Add extra sidecar or init containers, e.g. for log or metrics forwarding
  sidecarContainers: []
  initContainers: []

  priorityClassName: ""
  envVars: []
  podSecurityContext: {}
  terminationGracePeriodSeconds: null

  # Set Solr service account individually instead of the global "serviceAccount.name"
  serviceAccountName: ""

  # Manage where the Solr pods are scheduled
  affinity: {}
  tolerations: []
  nodeSelector:
    pool: solr-cloud
  # Documentation available at https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/
  # If a labelSelector is not provided, it will be auto-populated by the Solr Operator to match the statefulSet labels
  topologySpreadConstraints: []
    # - maxSkew: 1
    #   topologyKey: zone
    #   whenUnsatisfiable: DoNotSchedule

  # Probes for the Solr pods
  livenessProbe: {}
  readinessProbe: {}
  startupProbe: {}

  # Lifecycle for the Solr container
  lifecycle: {}

  imagePullSecrets: []

  resources:
    # limits:
    #   cpu: "2"
    #   memory: 10G
    requests:
      cpu: 6000m
      memory: 28G

  volumes: []
    # - name:
    #   defaultContainerMount: {}
    #   source: {}

statefulSetOptions:
  annotations: {}
  labels: {}

  # Specify a podManagementPolicy when you want control over how scale-ups and scale-downs occur
  # https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#pod-management-policies
  # The default is: Parallel
  podManagementPolicy: ""

commonServiceOptions:
  annotations: {}
  labels: {}

headlessServiceOptions:
  annotations: {}
  labels: {}

nodeServiceOptions:
  annotations: {}
  labels: {}

ingressOptions:
  annotations: {}
  labels: {}
  ingressClassName: ""

configMapOptions:
  annotations: {}
  labels: {}

  # This is an extremely advanced option; do not use it without understanding the requirements of the solr.xml you provide.
  providedConfigMap: ""

I have enabled basic authentication for my SolrCloud and disabled the Solr exporter.

I then deployed the Solr Prometheus exporter for monitoring with a plain YAML manifest instead of Helm.

The problem is that the exporter pod fails to authenticate and ends up in CrashLoopBackOff.

Below is the exporter YAML I used:

apiVersion: solr.apache.org/v1beta1
kind: SolrPrometheusExporter
metadata:
  name: dev-prom-exporter
  namespace: dev-backend
spec:
  customKubeOptions:
    podOptions:
      resources:
        requests:
          cpu: 1
          memory: 1Gi
  solrReference:
    basicAuthSecret: (I have provided my basic auth secret here)
    cloud:
      name: "My solr cloud name"
  numThreads: 6
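For reference, `solrReference.basicAuthSecret` is expected to name a Kubernetes Secret of type `kubernetes.io/basic-auth` holding `username` and `password` keys, in the same namespace as the exporter. The sketch below shows that shape; the Secret name and both credential values are hypothetical placeholders, and the user has to exist in the SolrCloud's security.json with permission to read metrics. If the cloud relies on the operator-bootstrapped security.json (no `basicAuthSecret` set on the SolrCloud, as in the values file above), the operator also generates a credentials Secret for its own user, which in our understanding is named `<cloud-name>-solrcloud-basic-auth`; check `kubectl get secrets -n dev-backend` before reusing it.

```yaml
# Hypothetical Secret holding the exporter's Basic auth credentials.
# Name and values are placeholders; use a Solr user allowed to read metrics.
apiVersion: v1
kind: Secret
metadata:
  name: my-exporter-basic-auth        # hypothetical name
  namespace: dev-backend
type: kubernetes.io/basic-auth
stringData:
  username: "<solr-username>"         # placeholder
  password: "<solr-password>"         # placeholder
---
# The exporter references the Secret above by name.
apiVersion: solr.apache.org/v1beta1
kind: SolrPrometheusExporter
metadata:
  name: dev-prom-exporter
  namespace: dev-backend
spec:
  solrReference:
    basicAuthSecret: my-exporter-basic-auth
    cloud:
      name: "<solrcloud-name>"        # placeholder for the SolrCloud's name
  numThreads: 6
```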

The exporter pod logs look like this:

at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:681) ~[solr-solrj-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:54:42]
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266) ~[solr-solrj-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:54:42]
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248) ~[solr-solrj-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:54:42]
at org.apache.solr.prometheus.scraper.SolrScraper.request(SolrScraper.java:117) ~[solr-prometheus-exporter-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:54:35]
at org.apache.solr.prometheus.scraper.SolrCloudScraper.lambda$pingAllCores$1(SolrCloudScraper.java:77) ~[solr-prometheus-exporter-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:54:35]
at org.apache.solr.prometheus.scraper.SolrScraper.lambda$null$0(SolrScraper.java:86) ~[solr-prometheus-exporter-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:54:35]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218) ~[solr-solrj-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:54:42]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
WARN - 2022-11-10 05:46:00.174; org.apache.solr.prometheus.scraper.SolrScraper; Error occurred during metrics collection => org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://experro-new-solrcloud-0.experro-new-solrcloud-headless.dev-backend:8983/solr: Forbidden

The exporter pod is not able to authenticate, and it also ends up in CrashLoopBackOff.

HoustonPutman commented 1 year ago

Can you try with the most recent Solr Operator version (v0.6.0)?

It might be an issue that has already been fixed.

Other than that, I would recommend only including the parts of the values.yaml file that you actually want to override. Otherwise the copied defaults will become stale with new versions, and it's hard to tell what you are overriding.
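As an illustration of that advice, a trimmed values-solr.yaml could keep only the settings that were deliberately changed in the file posted above. This is a sketch: which settings actually differ from the defaults depends on the chart version, so compare it against that version's values.yaml before relying on it. It also places the ZooKeeper `nodeSelector` under `zk.provided.zookeeperPodPolicy`, next to the commented-out `affinity` and `tolerations` keys; as posted, it sits directly under `zk.provided`, where (as far as we can tell) the chart does not read it.

```yaml
# Sketch of a minimal values-solr.yaml: only explicit overrides, everything
# else falls back to the chart defaults of the installed version.
replicas: 5

image:
  tag: "8.11"

solrOptions:
  javaMemory: "-Xms24000m -Xmx24000m"
  logLevel: "DEBUG"
  security:
    authenticationType: Basic

dataStorage:
  type: "persistent"
  capacity: "20Gi"
  persistent:
    reclaimPolicy: "Retain"
    pvc:
      storageClassName: "gcp-ssd-volume"

zk:
  provided:
    replicas: 5
    # Assumption: ZK pod scheduling options belong under zookeeperPodPolicy.
    zookeeperPodPolicy:
      nodeSelector:
        pool: solr-cloud

podOptions:
  nodeSelector:
    pool: solr-cloud
  resources:
    requests:
      cpu: 6000m
      memory: 28G
```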

vipul-06 commented 1 year ago

OK, let me give it a try.

vipul-06 commented 1 year ago

OK @HoustonPutman, the previous error seems to be solved and the exporter logs now show it as running, but the pod keeps restarting and going into CrashLoopBackOff:

INFO - 2022-12-26 10:52:55.121; org.apache.solr.common.cloud.ConnectionManager; Waiting for client to connect to ZooKeeper
INFO - 2022-12-26 10:52:55.160; org.apache.solr.common.cloud.ConnectionManager; zkClient has connected
INFO - 2022-12-26 10:52:55.160; org.apache.solr.common.cloud.ConnectionManager; Client is connected to ZooKeeper
INFO - 2022-12-26 10:52:55.172; org.apache.solr.common.cloud.ZkStateReader; Updated live nodes from ZooKeeper... (0) -> (5)
INFO - 2022-12-26 10:52:55.192; org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at my-solr-solrcloud-zookeeper-0.my-solr-new-solrcloud-zookeeper-headless.dev-backend.svc.cluster.local:2181,my-solr-new-solrcloud-zookeeper-1.my-solr-new-solrcloud-zookeeper-headless.dev-backend.svc.cluster.local:2181,my-solr-new-solrcloud-zookeeper-2.my-solr-new-solrcloud-zookeeper-headless.dev-backend.svc.cluster.local:2181,my-solr-new-solrcloud-zookeeper-3.my-solr-new-solrcloud-zookeeper-headless.dev-backend.svc.cluster.local:2181,my-solr-new-solrcloud-zookeeper-4.my-solr-new-solrcloud-zookeeper-headless.dev-backend.svc.cluster.local:2181 ready
INFO - 2022-12-26 10:52:55.206; org.apache.solr.prometheus.exporter.SolrExporter; Starting Solr Prometheus Exporting
INFO - 2022-12-26 10:52:55.208; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Beginning metrics collection
INFO - 2022-12-26 10:52:55.231; org.apache.solr.prometheus.exporter.SolrExporter; Solr Prometheus Exporter is running

NAME                                              READY   STATUS    RESTARTS
dev-prom-exporter-solr-metrics-65577bfcc5-s6d2b   1/1     Running   6 (2m11s ago)

As there is nothing in the logs, how can I find out what the issue is?

kbarnesMCC commented 2 months ago

@vipul-06 did you ever find a resolution for this? I'm experiencing exactly the issue described in your latest update: everything seems to be working, but the pod ultimately ends up in CrashLoopBackOff.

TravisFarrellMCC commented 2 months ago

@HoustonPutman @vipul-06 Any tips on how to stop the metrics pod from crashing? We upgraded the Solr Operator to 0.8.1, as there was a fix listed for the Prometheus exporter, but that did not seem to help.

kbarnesMCC commented 2 months ago

Just following up on this: it looks like my issue was caused by starving the exporter of compute resources. After raising it to 2 vCPU / 4 GB of memory, things worked fine with 0.8.1. My guess is that the readiness probe was triggering before the slow, CPU-bound initial gathering of metrics had completed.
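In manifest terms, that fix amounts to raising the exporter pod's resource requests through `customKubeOptions.podOptions.resources`. The sketch below reuses the exporter manifest from earlier in the thread with roughly the 2 vCPU / 4 GB figures reported here; the secret and cloud names are placeholders, and the values that are actually sufficient will depend on cluster size and `numThreads`, so treat them as a starting point. One caveat on the diagnosis: a failing readiness probe only marks the pod unready and removes it from Service endpoints; it does not restart the container. A rising RESTARTS count points instead to a failing liveness probe or to the container being killed (for example OOMKilled). `kubectl describe pod <pod-name>` shows the last termination reason and any probe-failure events, and `kubectl logs --previous <pod-name>` shows the log of the previous container instance.

```yaml
# Exporter manifest with larger resource requests (values are a starting point).
apiVersion: solr.apache.org/v1beta1
kind: SolrPrometheusExporter
metadata:
  name: dev-prom-exporter
  namespace: dev-backend
spec:
  customKubeOptions:
    podOptions:
      resources:
        requests:
          cpu: 2
          memory: 4Gi
  solrReference:
    basicAuthSecret: "<basic-auth-secret-name>"   # placeholder
    cloud:
      name: "<solrcloud-name>"                    # placeholder
  numThreads: 6
```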