JahstreetOrg / spark-on-kubernetes-helm

Spark on Kubernetes infrastructure Helm charts repo
Apache License 2.0
198 stars 76 forks

File file:/tmp/history-server does not exist when configuring history server #46

Open AndreasDeCrinis opened 3 years ago

AndreasDeCrinis commented 3 years ago

Hi,

we are struggling with configuring the history server in Livy using these env vars:

LIVY_SPARK_EVENT1LOG_ENABLED: {value: "true"}
LIVY_SPARK_EVENT1LOG_DIR: {value: "file:///tmp/history-server"}
LIVY_LIVY_UI_HISTORY0SERVER0URL: {value: "https://historyserver.mycluster.lan"}

After we trigger a job, we see this error message in the driver container: Exception in thread "main" java.io.FileNotFoundException: File file:/tmp/history-server does not exist

Does anybody have a clue what we are doing wrong?

BR Andreas

jahstreet commented 3 years ago

Hi, the error you observe signals that there is no such file/directory in your Spark HS container. To make it work by default you should create an NFS PVC named e.g. nfs-pvc in the Spark HS namespace and configure the spark-cluster Helm chart with the following values:

historyserver:
    pvc:
      # to use a file system path for Spark events dir, set 'enablePVC' to true and mention the
      # name of an already created persistent volume claim in existingClaimName.
      # The volume will be mounted on /data in the pod
      enablePVC: true
      existingClaimName: nfs-pvc
      eventsDir: "/"

Then you do not need to override LIVY_SPARK_EVENT1LOG_DIR to make it work.

Alternatively, you can provide Spark HS with the configs it needs to access, for instance, an HDFS-compatible file system. For additional details please refer to the https://github.com/helm/charts/tree/master/stable/spark-history-server docs.
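As a sketch of that alternative: the jobs and the history server must point at the same event-log location via the standard Spark properties. The bucket name below is a placeholder, and credentials/connector setup for s3a is not shown:

```properties
# Sketch only: both the Spark jobs and the HS must agree on the location.
# s3a://my-spark-logs/events is a placeholder path.
spark.eventLog.enabled           true
spark.eventLog.dir               s3a://my-spark-logs/events
spark.history.fs.logDirectory    s3a://my-spark-logs/events
```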

maciekdude commented 3 years ago

Just create it in the underlying image ;)

RUN chmod +x /opt/entrypoint.sh && \
    chmod g+w $SPARK_HOME/work-dir && \
    mkdir -p /tmp/spark-events

jahstreet commented 3 years ago

@maciekdude , then how will Spark containers write history logs to it? You need to have the shared directory to make it work.

maciekdude commented 3 years ago

Executors do not write event logs there even on a shared FS like HDFS/S3; it's only the driver. So if you have a problem with spawning jobs, you can always disable event logging, get some shared storage like S3, or just create this folder ;)
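For completeness, disabling event logging would look something like the following in the Livy chart values, mirroring the env-var naming shown in the question above (assumption: the same env-to-config mapping applies):

```yaml
# Sketch: turn off event logging entirely via the Livy chart env vars
# (same naming convention as in the original question).
env:
  LIVY_SPARK_EVENT1LOG_ENABLED: {value: "false"}
```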

jahstreet commented 3 years ago

If this is the way you are ok to go with then I have no arguments ;)

MBtech commented 3 years ago

I am running into a similar issue. I have created an NFS-based PV and PVC and added the following corresponding settings for the historyserver chart:

  pvc:
    enablePVC: true
    existingClaimName: events-dir
    eventsDir: "/"
  nfs:
    enableExampleNFS: false
    pvcName: events-dir
    pvName: events-dir-pv

Which configurations do I need to change for the livy chart? I have changed these two:

  env:
    # Configure History Server log directory to write Spark logs to
    LIVY_SPARK_EVENT1LOG_ENABLED: {value: "true"}
    LIVY_SPARK_EVENT1LOG_DIR: {value: "file:///data"}

Which other configurations do I need to change for the livy chart? Persistence configurations?

MBtech commented 3 years ago

Figured it out. I needed to configure the Kubernetes Volumes configuration properties (in the request for the livy batch job) for Spark driver and executors as mentioned here.
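For anyone landing here later, a sketch of what those Kubernetes volume properties might look like in the body of a Livy batch request (POST /batches). The jar path, class name, claim name, and mount path below are assumptions chosen to match the events-dir PVC discussed above:

```json
{
  "file": "local:///opt/spark/examples/jars/spark-examples.jar",
  "className": "org.apache.spark.examples.SparkPi",
  "conf": {
    "spark.kubernetes.driver.volumes.persistentVolumeClaim.events-dir.mount.path": "/data",
    "spark.kubernetes.driver.volumes.persistentVolumeClaim.events-dir.options.claimName": "events-dir",
    "spark.eventLog.enabled": "true",
    "spark.eventLog.dir": "file:///data"
  }
}
```

With the PVC mounted at /data in the driver, the file:///data event-log dir from the earlier comment resolves to the shared NFS volume.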

Slashoper commented 5 months ago

> Figured it out. I needed to configure the Kubernetes Volumes configuration properties (in the request for the livy batch job) for Spark driver and executors as mentioned here.

How did you resolve it? Can I see your Spark job submit config?