DataONEorg / dataone-indexer

DataONE Indexer subsystem
Apache License 2.0

resolve Solr pod startup error: Failed to create collection 'dataone_index' due to: collection already exists: dataone_index #54

Open artntek opened 9 months ago

artntek commented 9 months ago

Not critical, since the pod automatically restarts, but should be cleaned up at some point to avoid confusion:

Looks like a race condition where one of the solr pods is trying to create the dataone_index collection when another one has already created it:

kc describe pod/metacatbrooke-solr-0
[...]
  Warning  FailedPostStartHook     48s    kubelet                  Exec lifecycle hook ([/bin/bash -c /solrconfig/config-solr.sh]) for Container "solr" in Pod "metacatbrooke-solr-0_brooke(cd1c3c64-74b6-4bf6-8b74-c9e301e93a36)" failed - error: command '/bin/bash -c /solrconfig/config-solr.sh' exited with 1: solr 19:40:01.60 INFO  ==> Waiting for Zookeeper to be up

ERROR: Failed to create collection 'dataone_index' due to: collection already exists: dataone_index

  , message:
Uploading /bitnami/solr/server/solr/configsets/dataone_index/conf for config dataone_index to ZooKeeper at metacatbrooke-zookeeper:2181/solr
Re-using existing configuration directory dataone_index
solr 19:40:01.60 INFO  ==> Waiting for Zookeeper to be up

ERROR: Failed to create collection 'dataone_index' due to: collection already exists: dataone_index

  Normal  Killing  48s                  kubelet  FailedPostStartHook
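
One way to confirm the diagnosis is to ask Solr which collections it already has; the Collections API LIST action should show dataone_index. (The pod name and port below are assumptions based on the output above, and this assumes curl is available in the container.)

kubectl exec metacatbrooke-solr-0 -- curl -s "http://localhost:8983/solr/admin/collections?action=LIST"
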
mbjones commented 9 months ago

Ah yes, good catch. I meant to come back to that when I was working on the schema creation stuff. The issue is that the n shards start up at the same time, but only the first needs to create the shared collection. They probably all try to create it at roughly the same time, when only one should. In theory, line 55 of solr-config.sh should catch this:

if ! solr_collection_exists "$SOLR_COLLECTION"; then

but it does not seem to. The other issue is that this only needs to be done on a new PVC where Solr is starting for the first time, and we still need to figure out how to do schema upgrades with existing index data in place on the PVC. When I was working on this, I also considered doing some of it in the entrypoint.sh script, which is why we override that from the bitnami default, but IIRC I got rid of all of my changes there, so we probably shouldn't override it any longer (and should let the proper bitnami script run). I looked for a long time for a way to configure the bitnami chart to create our collection rather than their default, but never found a way to do that through the bitnami solr chart values.
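
One possible way to make the hook tolerant of the race, whichever replica wins, would be to treat the "collection already exists" response as success rather than a failure. A minimal sketch, assuming the collection is created through the Solr Collections API with curl; the function name and variable defaults below are hypothetical and not taken from config-solr.sh:

# Hypothetical sketch: create the collection, but treat "already exists" as success
SOLR_HOST="${SOLR_HOST:-localhost}"
SOLR_PORT="${SOLR_PORT:-8983}"
SOLR_COLLECTION="${SOLR_COLLECTION:-dataone_index}"

create_collection_if_missing() {
  local response
  response=$(curl -s "http://${SOLR_HOST}:${SOLR_PORT}/solr/admin/collections?action=CREATE&name=${SOLR_COLLECTION}&numShards=1&collection.configName=${SOLR_COLLECTION}")
  if echo "$response" | grep -q "collection already exists"; then
    # Another replica won the race -- not an error for this pod
    echo "Collection ${SOLR_COLLECTION} already exists; nothing to do."
    return 0
  fi
  # Fail the hook only on a genuinely unexpected response
  echo "$response" | grep -q '"status":0'
}

create_collection_if_missing

An alternative would be to run the creation step only from the first replica (e.g. by checking the StatefulSet ordinal in $HOSTNAME), which would avoid the race entirely, though the existence check is still needed for restarts against an existing PVC.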

mbjones commented 9 months ago

See also: https://github.com/bitnami/charts/issues/19184