canonical / spark-history-server-k8s-operator

This repository is for the Charmed Spark History Server operator to be deployed with juju
Apache License 2.0

History-server not running #49

Open natalytvinova opened 1 week ago

natalytvinova commented 1 week ago

Steps to reproduce:

$ jinja2 -D service_account=spark-service-account -D namespace=spark -D storage_account=<storageaccount> -D container=spark-dev-storage bundle-azure-storage.yaml.j2 > bundle-azure-storage.yaml
$ juju add-model spark aks-dev
$ juju deploy ./bundle-azure-storage.yaml --trust -m spark
$ juju add-secret azure-credentials secret-key=<secret-key-from-azure>
$ juju grant-secret <secret_id> azure-storage
$ juju config azure-storage credentials=secret:<secret_id> 
$ juju resolved azure-storage/0
$ juju deploy traefik-k8s --channel latest/stable --trust
$ juju relate traefik-k8s history-server

Expected behavior

History-server running

Actual behavior

The history-server goes up and down every 2-3 mins:

history-server/0* blocked idle 10.244.9.75 History server not running. Please check logs.

Versions

Operating system: Jammy
Juju CLI: 3.5.3
Juju agent: 3.5.3
Charm revision: 3.4/edge
K8s: AKS 1.29

Log output

Juju debug-log for history-server: history-server-log.txt
Juju debug-log for the whole model: spark-model-log.txt

syncronize-issues-to-jira[bot] commented 1 week ago

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-5410.

This message was autogenerated

natalytvinova commented 1 week ago

Running

$ juju ssh --container spark-history-server history-server/0

showed the error Caused by: Operation failed: "The specified filesystem does not exist.", 404, HEAD, which revealed that the bucket had not been created.

deusebio commented 1 week ago

@natalytvinova Thanks for submitting the issue. I would keep this issue open, since the charm did not provide good information about the cause, which sounds like a good point for improvement. We could also add some logic to auto-create the bucket.

I believe we should:

  1. Check if the bucket exists after the relation is created.
  2. If it does not exist, try to create it using the credentials.
  3. If we can't create it (because of permissions), go into a blocked status.
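
The three steps above could be sketched roughly as follows. This is only an illustration, not the charm's actual code: ContainerClient, Status, ensure_container and FakeClient are hypothetical names, the storage calls are stand-ins rather than a real SDK API, and it assumes a permission problem surfaces as a PermissionError.

```python
from dataclasses import dataclass
from typing import Protocol


class ContainerClient(Protocol):
    """Hypothetical minimal storage interface (assumption, not a real SDK API)."""

    def exists(self) -> bool: ...
    def create(self) -> None: ...


@dataclass
class Status:
    """Simplified stand-in for a charm unit status."""

    name: str  # "active" or "blocked"
    message: str = ""


def ensure_container(client: ContainerClient) -> Status:
    """Check that the bucket exists, try to create it, block if we cannot."""
    # Step 1: check whether the bucket is already there.
    if client.exists():
        return Status("active")
    # Step 2: it is missing, so try to create it with the given credentials.
    try:
        client.create()
    except PermissionError as err:
        # Step 3: creation is not permitted, so report a blocked status.
        return Status("blocked", f"Cannot create storage container: {err}")
    return Status("active")


class FakeClient:
    """In-memory fake used only to exercise the sketch."""

    def __init__(self, present: bool, can_create: bool):
        self.present = present
        self.can_create = can_create

    def exists(self) -> bool:
        return self.present

    def create(self) -> None:
        if not self.can_create:
            raise PermissionError("not authorized")
        self.present = True
```

For example, ensure_container(FakeClient(present=False, can_create=False)) returns a blocked status with an explicit message, which is more informative than "History server not running. Please check logs."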

I don't think it should be too hard. Hopefully we can fit this into one of the next pulses.