canonical / spark-k8s-bundle

Charmed Spark K8s bundle, for making it seamless to operate Spark on K8s
Apache License 2.0

Deploying the Spark Bundle fails due to errors in History Server #53

Open theoctober19th opened 3 weeks ago

theoctober19th commented 3 weeks ago

Steps to reproduce

  1. Deploy the Spark K8s bundle with cos-integration overlay.
  2. Observe the status of Spark History Server. The charm fails with the following error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-history-server-0/charm/./src/charm.py", line 57, in <module>
    main(SparkHistoryServerCharm)
  File "/var/lib/juju/agents/unit-history-server-0/charm/venv/ops/main.py", line 553, in main
    manager.run()
  File "/var/lib/juju/agents/unit-history-server-0/charm/venv/ops/main.py", line 529, in run
    self._emit()
  File "/var/lib/juju/agents/unit-history-server-0/charm/venv/ops/main.py", line 518, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name, self._juju_context)
  File "/var/lib/juju/agents/unit-history-server-0/charm/venv/ops/main.py", line 139, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-history-server-0/charm/venv/ops/framework.py", line 347, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-history-server-0/charm/venv/ops/framework.py", line 853, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-history-server-0/charm/venv/ops/framework.py", line 943, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-history-server-0/charm/lib/charms/data_platform_libs/v0/object_storage.py", line 160, in _on_relation_changed_event
    getattr(self.on, "storage_connection_info_changed").emit(
  File "/var/lib/juju/agents/unit-history-server-0/charm/venv/ops/framework.py", line 347, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-history-server-0/charm/venv/ops/framework.py", line 853, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-history-server-0/charm/venv/ops/framework.py", line 943, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-history-server-0/charm/src/events/base.py", line 66, in wrapper_hook
    res = hook(event_handler, event)
  File "/var/lib/juju/agents/unit-history-server-0/charm/src/events/base.py", line 96, in wrapper_hook
    return hook(event_handler, event)
  File "/var/lib/juju/agents/unit-history-server-0/charm/src/events/azure_storage.py", line 50, in _on_azure_storage_connection_info_changed
    self.history_server.update(
  File "/var/lib/juju/agents/unit-history-server-0/charm/src/managers/history_server.py", line 183, in update
    self.workload.start()
  File "/var/lib/juju/agents/unit-history-server-0/charm/src/workload.py", line 89, in start
    self.container.restart(self.HISTORY_SERVER_SERVICE)
  File "/var/lib/juju/agents/unit-history-server-0/charm/venv/ops/model.py", line 2293, in restart
    self._pebble.restart_services(service_names)
  File "/var/lib/juju/agents/unit-history-server-0/charm/venv/ops/pebble.py", line 2203, in restart_services
    return self._services_action('restart', services, timeout, delay)
  File "/var/lib/juju/agents/unit-history-server-0/charm/venv/ops/pebble.py", line 2228, in _services_action
    raise ChangeError(change.err, change)
ops.pebble.ChangeError: cannot perform the following tasks:
- Start service "history-server" (cannot start service: exited quickly with code 134)
----- Logs from task 0 -----
2024-10-17T07:13:48Z INFO Service "history-server" has never been started.
----- Logs from task 1 -----
2024-10-17T07:13:48Z INFO Most recent service output:
    (...)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:569)
        at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:491)
        at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:503)
    Caused by: java.lang.NoSuchMethodError: 'io.prometheus.jmx.shaded.io.prometheus.client.Collector io.prometheus.jmx.BuildInfoCollector.register()'
        at io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:54)
        ... 6 more
    *** java.lang.instrument ASSERTION FAILED ***: "!errorOutstanding" with message Outstanding error when calling method in invokeJavaAgentMainMethod at ./src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 619
    *** java.lang.instrument ASSERTION FAILED ***: "success" with message invokeJavaAgentMainMethod failed at ./src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 459
    *** java.lang.instrument ASSERTION FAILED ***: "result" with message agent load/premain call failed at ./src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 422
    FATAL ERROR in native method: processing of -javaagent failed, processJavaStart failed
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    V  [libjvm.so+0x8c642e]  jni_FatalError+0xbe
    V  [libjvm.so+0xa4492f]  JvmtiExport::post_vm_initialized()+0xcf
    V  [libjvm.so+0xeb9e41]  Threads::create_vm(JavaVMInitArgs*, bool*)+0x921
    V  [libjvm.so+0x8e02c5]  JNI_CreateJavaVM+0x55
    C  [libjli.so+0x3d43]  JavaMain+0x93
    C  [libjli.so+0x7fbd]  ThreadJavaMain+0xd

    /opt/spark/sbin/spark-daemon.sh: line 133:   291 Aborted                 (core dumped) "$@"
2024-10-17T07:13:48Z ERROR cannot start service: exited quickly with code 134

Expected behavior

The bundle should deploy without errors, and the Spark History Server service should start successfully.

Actual behavior

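The bundle deployment fails because the History Server charm cannot start its workload. The "history-server" service exits with code 134: the JVM aborts while processing the -javaagent option, after io.prometheus.jmx.JavaAgent.premain raises a java.lang.NoSuchMethodError for io.prometheus.jmx.BuildInfoCollector.register().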

Versions

Operating system: Ubuntu 22.04

Log output

Additional context

The error appeared recently. I suspect it is related to the addition of the JMX exporter to the rock image, which may have introduced a conflict among the Java dependencies.
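One way to check this suspicion is to see how many jars in the image bundle the class named in the NoSuchMethodError. The sketch below is only a diagnostic aid, not part of the charm or the rock: the helper name `jars_containing` is hypothetical, and the directory to scan (for example `/opt/spark/jars`) is an assumption that may not match where the JMX exporter jar actually lives in the image. If the class shows up in more than one jar, mismatched copies of the exporter on the classpath would be a plausible cause, though not the only possible one.

```python
import zipfile
from pathlib import Path


def jars_containing(jar_dir: str, class_name: str) -> list[str]:
    """Return the paths of all jars under jar_dir that bundle class_name.

    class_name is a fully qualified Java class name, e.g.
    "io.prometheus.jmx.BuildInfoCollector".
    """
    # Inside a jar, a class is stored as a path with slashes and a .class suffix.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(jar_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files that carry a .jar suffix but are not valid archives.
            continue
    return hits
```

Calling `jars_containing("/opt/spark/jars", "io.prometheus.jmx.BuildInfoCollector")` inside the container (the path is an assumption) lists every jar that ships the class. An empty list only means the class is not under the scanned directory.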

syncronize-issues-to-jira[bot] commented 3 weeks ago

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-5679.

This message was autogenerated