JahstreetOrg / spark-on-kubernetes-helm

Spark on Kubernetes infrastructure Helm charts repo
Apache License 2.0
198 stars 76 forks source link

exec: "curl": executable file not found in $PATH: unknown #71

Open thuybt opened 2 years ago

thuybt commented 2 years ago

How can I config this with airflow for automation, Furthermore, I have some issue with curl when exec post command

PardhuMadipalli commented 2 years ago

I am facing this issue too.

teplydat commented 2 years ago

"Furthermore, I have some issue with curl when exec post command" I have the same issue. So basically the example "Run Spark Job" (https://github.com/JahstreetOrg/spark-on-kubernetes-helm#run-spark-job) does not work.

I tried to install curl but I got errors like:

apt update

E: Repository 'https://deb.debian.org/debian buster InRelease' changed its 'Suite' value from 'stable' to 'oldstable'
apt install curl -y

E: Failed to fetch https://deb.debian.org/debian/pool/main/o/openldap/libldap-common_2.4.47+dfsg-3+deb10u2_all.deb  404  Not Found [IP: 151.101.14.132 443]

I already have problems to install curl in the first base image sasnouskikh/spark:3.0.1_2.12-hadoop_3.2.0_cloud somehow apt is locked already:

RUN apt-get update -y

E: Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)

#solved by
USER root
#but then same error like above

@jahstreet What's wrong and how can I fix that? I mean what's the point of giving a curl example if curl is not installed in livy container...

@jahstreet also another question: is that Livy compatible with Spark 3.3.1?

Any hints are welcome :)

teplydat commented 2 years ago

Okay, I found a workaround regarding that curl problem:

kubectl exec -it --namespace livy livy-0 -- bash
apt update --allow-releaseinfo-change
apt install curl -y
curl --version

or permanent via Dockerfile:

FROM sasnouskikh/livy:0.8.0-incubating-spark_3.0.1_2.12-hadoop_3.2.0_cloud

RUN apt-get update --allow-releaseinfo-change
RUN apt install curl -y
jahstreet commented 2 years ago

Hi guys, sorry for the delayed responses...

How can I config this with airflow for automation

Some time ago I've written about that in Stackoverflow. Unless the new Airflow operators has been released since then it still should be actual info.

I have some issue with curl when exec post command Okay, I found a workaround regarding that curl problem

Thx for checking. Could you please suggest the PR with the fix? Would really appreciate the contribution.

also another question: is that Livy compatible with Spark 3.3.1?

I'm not sure, probably not fully and we would need to rebuild it with the proper support of Spark 3.x. Going to give it a closer look in the coming months since getting back to the project related activities.

teplydat commented 2 years ago

Thx for checking. Could you please suggest the PR with the fix? Would really appreciate the contribution. @jahstreet Sorry for the delay. Here is the PR to fix it: https://github.com/JahstreetOrg/spark-on-kubernetes-docker/pull/19

lhoss commented 2 years ago

also another question: is that Livy compatible with Spark 3.3.1?

I guess @teplydat meant Spark 3.1.1. Ourselves would also prefer to have a build using 3.1.1 (same version currently used in the spark k8s operator, we would like to integrate with @jahstreet 's Livy deployment)