cometta closed this issue 7 months ago
You can pack your code into a Docker image (for Kubernetes) or a zip archive (for YARN) and execute it by issuing an HTTP call to the Lighter endpoint: https://github.com/exacaster/lighter/blob/master/docs/rest.md#batch
Jupyter does not participate in this in any way. You can start batch applications from any environment or orchestration tool that can make HTTP requests, for example Apache Airflow.
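The HTTP call described above can be sketched like this. The host, port, image name, and file path are illustrative placeholders; only the endpoint path and the "file"/"conf" payload fields come from the Lighter REST docs linked above.

```python
import json
import urllib.request

# Hypothetical Lighter deployment; adjust host/port for your environment.
LIGHTER_URL = "http://localhost:8080/lighter/api/batches"

# Minimal batch payload: "file" is the application entry point, and the
# Spark config carries the container image to run on Kubernetes.
payload = {
    "name": "my-batch-job",
    "file": "local:///opt/app/my_job.py",  # path inside the container image
    "conf": {
        "spark.kubernetes.container.image": "registry.example.com/my-spark-image:latest",
    },
}

def submit_batch(url: str = LIGHTER_URL) -> bytes:
    """POST the batch payload to Lighter and return the raw response body."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Any tool that can issue this POST (Airflow, curl, a CI job) can start the batch; Jupyter is not involved.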
I see the REST path /lighter/api/batches. For a Docker image + k8s, is it enough to just define the image path in the spark.kubernetes.container.image key? Are pyFiles and files optional, since I'm not using YARN?
It all depends on the configuration that you started Lighter with, but basically it should be enough to specify spark.kubernetes.container.image when submitting to k8s.
I put my custom Python files in my new Spark Docker image. When I submit a job using batch, I get the error below:

java.lang.RuntimeException: Exception in thread "main" org.apache.spark.SparkException: Please specify spark.kubernetes.file.upload.path property.
	at java.lang.Thread.run(Unknown Source)

But I don't need to upload any files; all the Python files are inside spark.kubernetes.container.image. Can you advise?
Please provide more information: the payload of the request to Lighter, or all the parameters that are passed to Lighter.
Issue solved with "file": "local://path/my.file.py"
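To illustrate the fix (the path below is illustrative): a bare path in "file" makes spark-submit try to stage the file on the cluster, which requires spark.kubernetes.file.upload.path, while the local:// scheme tells spark-submit the file already exists inside the container image, so nothing needs uploading.

```python
# Fragment of a Lighter batch payload demonstrating the two behaviors.
payload_fragment = {
    # "file": "/opt/app/my_job.py",        # bare path: triggers
    #                                      # "Please specify spark.kubernetes.file.upload.path"
    "file": "local:///opt/app/my_job.py",  # local:// scheme: read from the image, no upload
}
```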
Separately, I wanted to enquire if there's a way for me to upload my Python file into an s3a bucket and submit a Spark batch that automatically pulls and runs the Python file, negating the need to rebuild the Docker image?
It's possible, but not recommended. Anything you specify in archives will be downloaded to the nodes; if you use the s3a:// prefix, it should work.
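A sketch of what such a payload might look like. The bucket, path, and image name are placeholders, and this assumes your Spark image ships the S3 connector (hadoop-aws) with credentials configured; otherwise the s3a:// fetch will fail at submit time.

```python
# Batch payload that pulls the entry point from S3 instead of baking it
# into the image, so the image does not need rebuilding on every code change.
s3_payload = {
    "name": "job-from-s3",
    "file": "s3a://my-bucket/jobs/my_job.py",  # downloaded to the nodes at submit time
    "conf": {
        "spark.kubernetes.container.image": "registry.example.com/spark-base:latest",
    },
}
```

The trade-off noted above: anything referenced this way gets downloaded to the nodes on every run, which is why baking files into the image is the recommended route.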
On the Lighter UI, I can see a tab called "Batch". Can you share any documentation on how to use this feature? Is the job run through the Jupyter notebook interface or through terminal PySpark code?