apache / solr-operator

Official Kubernetes operator for Apache Solr
https://solr.apache.org/operator
Apache License 2.0

Issue with basic auth #672

Open sgowda97 opened 5 months ago

sgowda97 commented 5 months ago

Hi,

I am running solr-operator 0.8.0 with Solr 9.4.0 on GKE. Here is my YAML file:

apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  name: dev
spec:
  solrAddressability:
    commonServicePort: 80
    external:
      domainName: HOST
      method: Ingress
      nodePortOverride: 80
      useExternalAddress: false
    podPort: 8983
  customSolrKubeOptions:
    podOptions:
      resources:
        limits:
          memory: 512Mi
          cpu: 500m
        requests:
          cpu: 500m
          memory: 256Mi
  dataStorage:
    persistent:
      pvcTemplate:
        spec:
          resources:
            requests:
              storage: 10Gi
      reclaimPolicy: Delete
  replicas: 2
  solrImage:
    repository: solr
    tag: "9.4.0"
  solrJavaMem: -Xms256M -Xmx256M
  updateStrategy:
    method: StatefulSet
  zookeeperRef:
    provided:
      chroot: /dev
      image:
        pullPolicy: IfNotPresent
        repository: pravega/zookeeper
        tag: 0.2.8
      persistence:
        reclaimPolicy: Delete
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi
      replicas: 1
      zookeeperPodPolicy:
        resources:
          limits:
            memory: 256Mi
            cpu: 500m
          requests:
            cpu: 500m
            memory: 128Mi
  solrSecurity:
    authenticationType: Basic

The pods keep crashing. To investigate, I checked the logs of one of the pods. This is the error I am getting:

$ ksolr logs pod/dev-solrcloud-1 -f
Defaulted container "solrcloud-node" out of: solrcloud-node, cp-solr-xml (init), setup-zk (init)
Starting Solr
Java 17 detected. Enabled workaround for SOLR-16463
[0.002s][warning][pagesize] UseLargePages disabled, no large pages configured and available on the system.
CompileCommand: exclude com/github/benmanes/caffeine/cache/BoundedLocalCache.put bool exclude = true
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
2024-01-03 07:27:01.682 INFO  (main) [] o.e.j.s.Server jetty-10.0.17; built: 2023-10-09T18:22:21.150Z; git: af15f12297adf5c5083e1f2f8f4c5974438bca25; jvm 17.0.9+9
2024-01-03 07:27:04.075 WARN  (main) [] o.e.j.u.DeprecationWarning Using @Deprecated Class org.eclipse.jetty.servlet.listener.ELContextCleaner
2024-01-03 07:27:04.279 INFO  (main) [] o.a.s.s.CoreContainerProvider Using logger factory org.apache.logging.slf4j.Log4jLoggerFactory
2024-01-03 07:27:04.369 INFO  (main) [] o.a.s.s.CoreContainerProvider  ___      _       Welcome to Apache Solr™ version 9.4.0
2024-01-03 07:27:04.370 INFO  (main) [] o.a.s.s.CoreContainerProvider / __| ___| |_ _   Starting in cloud mode on port 8983
2024-01-03 07:27:04.370 INFO  (main) [] o.a.s.s.CoreContainerProvider \__ \/ _ \ | '_|  Install dir: /opt/solr-9.4.0
2024-01-03 07:27:04.370 INFO  (main) [] o.a.s.s.CoreContainerProvider |___/\___/_|_|    Start time: 2024-01-03T07:27:04.370560543Z
2024-01-03 07:27:04.376 INFO  (main) [] o.a.s.s.CoreContainerProvider Solr started with "-XX:+CrashOnOutOfMemoryError" that will crash on any OutOfMemoryError exception. The cause of the OOME will be logged in the crash file at the following path: /var/solr/logs/jvm_crash_13.log
2024-01-03 07:27:04.376 INFO  (main) [] o.a.s.s.CoreContainerProvider Log level override, property solr.log.level=INFO
2024-01-03 07:27:04.570 INFO  (main) [] o.a.s.s.CoreContainerProvider Solr Home: /var/solr/data (source: system property: solr.solr.home)
2024-01-03 07:27:04.687 WARN  (main) [] o.a.s.c.c.SolrZkClient Using default ZkCredentialsInjector. ZkCredentialsInjector is not secure, it creates an empty list of credentials which leads to 'OPEN_ACL_UNSAFE' ACLs to Zookeeper nodes
2024-01-03 07:27:04.967 INFO  (main) [] o.a.s.c.c.ConnectionManager Waiting up to 30000ms for client to connect to ZooKeeper
2024-01-03 07:27:05.067 INFO  (zkConnectionManagerCallback-2-thread-1) [] o.a.s.c.c.ConnectionManager zkClient has connected
2024-01-03 07:27:05.069 INFO  (main) [] o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper
2024-01-03 07:27:05.069 WARN  (main) [] o.a.s.c.c.SolrZkClient Using default ZkACLProvider. DefaultZkACLProvider is not secure, it creates 'OPEN_ACL_UNSAFE' ACLs to Zookeeper nodes
2024-01-03 07:27:05.188 INFO  (main) [] o.a.s.c.NodeConfig Loading solr.xml from SolrHome (not found in ZooKeeper)
2024-01-03 07:27:05.189 INFO  (main) [] o.a.s.c.SolrXmlConfig Loading solr.xml from /var/solr/data/solr.xml
2024-01-03 07:27:05.387 INFO  (main) [] o.a.s.c.SolrResourceLoader Added 1 libs to classloader, from paths: [/opt/solr-9.4.0/lib]
2024-01-03 07:27:10.179 INFO  (main) [] o.a.s.u.t.SimplePropagator Always-on trace id generation enabled.
2024-01-03 07:27:11.977 WARN  (main) [] o.a.s.u.StartupLoggingUtils Jetty request logging enabled. Will retain logs for last 3 days. See chapter "Configuring Logging" in reference guide for how to configure.
2024-01-03 07:27:11.977 INFO  (main) [] o.a.s.c.ZkContainer Zookeeper client=dev-solrcloud-zookeeper-0.dev-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181/dev
2024-01-03 07:27:12.072 WARN  (main) [] o.a.s.c.c.SolrZkClient Using default ZkCredentialsInjector. ZkCredentialsInjector is not secure, it creates an empty list of credentials which leads to 'OPEN_ACL_UNSAFE' ACLs to Zookeeper nodes
2024-01-03 07:27:12.075 INFO  (main) [] o.a.s.c.c.ConnectionManager Waiting up to 15000ms for client to connect to ZooKeeper
2024-01-03 07:27:12.087 INFO  (zkConnectionManagerCallback-12-thread-1) [] o.a.s.c.c.ConnectionManager zkClient has connected
2024-01-03 07:27:12.088 INFO  (main) [] o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper
2024-01-03 07:27:12.088 WARN  (main) [] o.a.s.c.c.SolrZkClient Using default ZkACLProvider. DefaultZkACLProvider is not secure, it creates 'OPEN_ACL_UNSAFE' ACLs to Zookeeper nodes
2024-01-03 07:27:12.275 INFO  (main) [] o.a.s.c.DistributedClusterStateUpdater Creating DistributedClusterStateUpdater with useDistributedStateUpdate=false. Solr will be using Overseer based cluster state updates.
2024-01-03 07:27:12.288 INFO  (main) [] o.a.s.c.c.ConnectionManager Waiting up to 15000ms for client to connect to ZooKeeper
2024-01-03 07:27:12.366 INFO  (zkConnectionManagerCallback-14-thread-1) [] o.a.s.c.c.ConnectionManager zkClient has connected
2024-01-03 07:27:12.366 INFO  (main) [] o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper
2024-01-03 07:27:12.778 WARN  (main) [] o.a.s.c.ZkController Contents of zookeeper /security.json are world-readable; consider setting up ACLs as described in https://solr.apache.org/guide/solr/latest/deployment-guide/zookeeper-access-control.html
2024-01-03 07:27:12.969 INFO  (main) [] o.a.s.c.DistributedClusterStateUpdater Creating DistributedClusterStateUpdater with useDistributedStateUpdate=false. Solr will be using Overseer based cluster state updates.
2024-01-03 07:27:13.067 INFO  (main) [] o.a.s.c.OverseerElectionContext I am going to be the leader dev-solrcloud-1.solr:80_solr
2024-01-03 07:27:13.080 INFO  (main) [] o.a.s.c.Overseer Overseer (id=72063280979050514-dev-solrcloud-1.solr:80_solr-n_0000000000) starting
2024-01-03 07:27:13.585 INFO  (OverseerStateUpdate-72063280979050514-dev-solrcloud-1.solr:80_solr-n_0000000000) [] o.a.s.c.Overseer Starting to work on the main queue : dev-solrcloud-1.solr:80_solr
2024-01-03 07:27:13.668 INFO  (main) [] o.a.s.c.ZkController Register node as live in ZooKeeper:/live_nodes/dev-solrcloud-1.solr:80_solr
2024-01-03 07:27:13.770 INFO  (zkCallback-13-thread-1) [] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (0) -> (1)
2024-01-03 07:27:14.079 ERROR (main) [] o.a.s.s.CoreContainerProvider Could not start Solr. Check solr/home property and the logs => java.lang.NullPointerException: Cannot invoke "java.util.Map.get(Object)" because the return value of "org.apache.solr.handler.admin.SecurityConfHandler$SecurityConfig.getData()" is null
    at org.apache.solr.core.CoreContainer.reloadSecurityProperties(CoreContainer.java:1186)
java.lang.NullPointerException: Cannot invoke "java.util.Map.get(Object)" because the return value of "org.apache.solr.handler.admin.SecurityConfHandler$SecurityConfig.getData()" is null
    at org.apache.solr.core.CoreContainer.reloadSecurityProperties(CoreContainer.java:1186) ~[?:?]
    at org.apache.solr.core.CoreContainer.loadInternal(CoreContainer.java:856) ~[?:?]
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:765) ~[?:?]
    at org.apache.solr.servlet.CoreContainerProvider.createCoreContainer(CoreContainerProvider.java:427) ~[?:?]
    at org.apache.solr.servlet.CoreContainerProvider.init(CoreContainerProvider.java:246) ~[?:?]
    at org.apache.solr.servlet.CoreContainerProvider.contextInitialized(CoreContainerProvider.java:116) ~[?:?]
    at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:1049) ~[jetty-server-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:624) ~[jetty-servlet-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.server.handler.ContextHandler.contextInitialized(ContextHandler.java:984) ~[jetty-server-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:740) ~[jetty-servlet-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392) ~[jetty-servlet-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1304) ~[jetty-webapp-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:901) ~[jetty-server-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306) ~[jetty-servlet-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:532) ~[jetty-webapp-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:171) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:121) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:89) ~[jetty-server-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:171) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:89) ~[jetty-server-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:171) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:121) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:89) ~[jetty-server-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:171) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:89) ~[jetty-server-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:171) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:89) ~[jetty-server-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.server.handler.gzip.GzipHandler.doStart(GzipHandler.java:221) ~[jetty-server-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:171) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.server.Server.start(Server.java:470) ~[jetty-server-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:89) ~[jetty-server-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.server.Server.doStart(Server.java:415) ~[jetty-server-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) ~[jetty-util-10.0.17.jar:10.0.17]
    at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1919) ~[jetty-xml-10.0.17.jar:10.0.17]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
    at java.lang.reflect.Method.invoke(Unknown Source) ~[?:?]
    at org.eclipse.jetty.start.Main.invokeMain(Main.java:229) ~[start.jar:10.0.17]
    at org.eclipse.jetty.start.Main.start(Main.java:528) ~[start.jar:10.0.17]
    at org.eclipse.jetty.start.Main.main(Main.java:76) ~[start.jar:10.0.17]
2024-01-03 07:27:14.171 ERROR (main) [] o.a.s.s.CoreContainerProvider Error processing the request. CoreContainer is either not initialized or shutting down.
2024-01-03 07:27:14.171 ERROR (main) [] o.a.s.s.SolrDispatchFilter Could not start Dispatch Filter. => javax.servlet.UnavailableException: Error processing the request. CoreContainer is either not initialized or shutting down.
    at org.apache.solr.servlet.CoreContainerProvider.waitForCoreContainer(CoreContainerProvider.java:156)

This does not happen if I run kubectl apply -f yaml.yaml without the basic auth property (solrSecurity). One more thing I noticed: the error goes away if I just run kubectl delete pod/dev-solrcloud-1 and let the pod be re-created.

HoustonPutman commented 3 months ago

That sounds like a race condition in bootstrapping the security.json... How strange. This only happens when creating the cloud for the first time, correct?

sgowda97 commented 3 months ago

@HoustonPutman Yes, it happens only the first time. As a workaround, I added the following init container to my SolrCloud YAML, which has fixed the issue for now.

initContainers: # fix for https://github.com/apache/solr-operator/issues/672#issue-2063779008
  - command:
      - sh
      - -c
      - "host=$(printenv | grep ZOOKEEPER_CLIENT_SERVICE_HOST | cut -d= -f2); until echo 'stat' | nc $host 2181 | grep Zookeeper; do echo waiting for zookeeper; sleep 2; done;"
    image: library/busybox:1.28.0-glibc
    imagePullPolicy: IfNotPresent
    name: verify-zookeeper-before-solr
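
For anyone applying the same workaround: the post does not say where the fragment sits in the spec. A minimal sketch follows, assuming it goes under spec.customSolrKubeOptions.podOptions.initContainers (the placement is an assumption, not confirmed in this thread) and that everything else matches the YAML from the original post. The command is copied unchanged from the fragment above.

```yaml
apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  name: dev
spec:
  customSolrKubeOptions:
    podOptions:
      # Assumed placement: extra init containers for the Solr pods.
      # This one loops until ZooKeeper answers the "stat" command on port 2181,
      # so the Solr container only starts once ZooKeeper is reachable.
      initContainers:
        - name: verify-zookeeper-before-solr
          image: library/busybox:1.28.0-glibc
          imagePullPolicy: IfNotPresent
          command:
            - sh
            - -c
            - "host=$(printenv | grep ZOOKEEPER_CLIENT_SERVICE_HOST | cut -d= -f2); until echo 'stat' | nc $host 2181 | grep Zookeeper; do echo waiting for zookeeper; sleep 2; done;"
  # The resources block under podOptions and the remaining spec fields
  # (dataStorage, solrSecurity, zookeeperRef, ...) stay as in the original post.
```

The init container only delays Solr startup; it does not change how security.json is bootstrapped, so it should be read as a mitigation for the suspected race rather than a fix.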