
[bitnami/spark] unable to start pyspark shell #38139

kaoutaar opened this issue 1 year ago

kaoutaar commented 1 year ago

Name and Version

bitnami/spark:3.4.0

What architecture are you using?

None

What steps will reproduce the bug?

The pyspark command is supposed to open the PySpark shell, but it keeps returning Error: pyspark does not support any application options. even though no arguments were passed

Are you using any custom parameters or values?

No response

What is the expected behavior?

No response

What do you see instead?

Error: pyspark does not support any application options.

Additional information

No response

prmoore77 commented 1 year ago

It seems to be the --name argument that is causing the issue, in the script /opt/bitnami/spark/bin/pyspark at line 68:

exec "${SPARK_HOME}"/bin/spark-submit pyspark-shell-main --name "PySparkShell" "$@"

When I run the steps of that script manually without the --name arg, I can get an interactive PySpark shell:

export PYTHONPATH=/opt/bitnami/spark/python/lib/py4j-0.10.9.7-src.zip:/opt/bitnami/spark/python/:/opt/bitnami/spark/python/:
export PYTHONSTARTUP=/opt/bitnami/spark/python/pyspark/shell.py
exec "${SPARK_HOME}"/bin/spark-submit pyspark-shell-main

The official pyspark file in the Spark GitHub repo (https://github.com/apache/spark/blob/ac1e22231055d7e59eec5dd8c6a807252aab8b7f/bin/pyspark#L68) also has this line, however, so it is a bit confusing. Is this a Spark bug, or a Bitnami image one?

prmoore77 commented 1 year ago

It looks like this line in the official Spark GitHub repo raises the error.

It says that args to PySpark should be set in the "PYSPARK_SUBMIT_ARGS" environment variable. This leads me to believe this is a Spark bug, as the /bin/pyspark script is definitely not setting that variable; it just appends the args to the spark-submit line...
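
For what it's worth, a hedged sketch of what using that variable could look like: skip the launcher entirely and start an interactive Python with Spark's shell.py as the startup file, so that java_gateway.py picks the submit options up from PYSPARK_SUBMIT_ARGS (which, as far as I can tell from the Spark source, must end with the pyspark-shell token). This is untested against the Bitnami image; the paths are taken from my earlier comment.

export PYTHONPATH=/opt/bitnami/spark/python/lib/py4j-0.10.9.7-src.zip:/opt/bitnami/spark/python/
export PYTHONSTARTUP=/opt/bitnami/spark/python/pyspark/shell.py
# any extra spark-submit options go before the trailing "pyspark-shell" sentinel
export PYSPARK_SUBMIT_ARGS="--name PySparkShell pyspark-shell"
python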

github-actions[bot] commented 1 year ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] commented 1 year ago

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

mikegong commented 1 year ago

I also got this issue.

syshriki9 commented 1 year ago

This still happens for me in the latest release as of today.

fevisera commented 1 year ago

Hi,

Sorry for the delay. I could reproduce the error:

$ docker run --rm -it bitnami/spark sh
$ pyspark
Error: pyspark does not support any application options.
...

The pyspark shell does not fail for the upstream apache/spark container. I will follow up with our engineering team, and as soon as there is news we will update this ticket.

Thanks for bringing up this issue.

rennsax commented 11 months ago

Also encountered this issue. Is there any workaround to deal with it at the moment?

spaily commented 11 months ago

I'm also encountering this issue with Spark 3.5.0.

vdittgen commented 11 months ago

Same issue here.

dermoritz commented 9 months ago

Same issue here - any news, or a workaround?

rennsax commented 9 months ago

@dermoritz My simple resolution is to modify /opt/bitnami/spark/bin/pyspark:

$ diff --color ./pyspark.old ./pyspark
68c68
< exec "${SPARK_HOME}"/bin/spark-submit pyspark-shell-main --name "PySparkShell" "$@"
---
> exec "${SPARK_HOME}"/bin/spark-submit pyspark-shell-main "$@"

And it works fine for me.

dermoritz commented 9 months ago

Thx @rennsax, but I have a problem editing the file: there is no vi or any other editor in the image, and I cannot install anything. apt-get is not working since it seems to need root, and sudo is missing: "bash: sudo: command not found".

Can you please give a hint on how to change the file within the Bitnami container?

rennsax commented 9 months ago

@dermoritz Try docker exec --user=root -it <container_name> bash and then edit it directly. Or you can copy the file to your host via docker cp, edit it, and then copy it back.
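
As a concrete sketch of the docker cp route (the container name my-spark is just a placeholder):

# copy the launcher script out of the running container
docker cp my-spark:/opt/bitnami/spark/bin/pyspark ./pyspark
# edit ./pyspark on the host, e.g. drop the --name "PySparkShell" pair on line 68,
# then copy the patched script back into place
docker cp ./pyspark my-spark:/opt/bitnami/spark/bin/pyspark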

dermoritz commented 9 months ago

@rennsax thanks, --user=root helped. For anyone who comes here: "pyspark" would not start as user root; to start it you need to log back in without "--user=root".
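
Putting those two tips together, the edit can also be applied non-interactively with sed instead of an editor. A minimal sketch, assuming the default Bitnami script path and a container named my-spark (sed ships in the Debian-based image, but I have not verified this exact invocation):

# run as root, since the script cannot be edited as the default user (see above);
# the expression removes the --name "PySparkShell" pair from line 68
docker exec --user=root my-spark \
  sed -i 's/ --name "PySparkShell"//' /opt/bitnami/spark/bin/pyspark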

frankvier commented 9 months ago

Versions above 3.2 have the same problem. It doesn't look like this will be debugged soon. Thanks, core team, for your effort.

suryachereddy commented 8 months ago

@rennsax thank you. It works after modifying /opt/bitnami/spark/bin/pyspark. It would be great if someone could fix it.

WolfHero commented 8 months ago

pretty good! thanks @rennsax

leoYY commented 5 days ago

Hi, I found that the difference from the official Spark image is the default parameter --driver-java-options "--add-exports java.base/sun.nio.ch=ALL-UNNAMED" being injected into the spark-submit invocation before the PySpark command arguments. I tried removing this parameter, or moving it after "$@", and it works.
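
To make that concrete, here is a hedged sketch of the reordering; the wrapper lines below are illustrative, not the actual Bitnami script. As far as I can tell from the Spark launcher source, pyspark-shell-main is only special-cased when it is the first argument, so an option injected in front of "$@" demotes it to an ordinary application name and turns --name "PySparkShell" into application options, which pyspark then rejects.

# before (illustrative): the flag is injected ahead of "$@", so when bin/pyspark
# calls this wrapper, pyspark-shell-main is no longer the first argument and
# loses its special handling in the launcher
exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit \
  --driver-java-options "--add-exports java.base/sun.nio.ch=ALL-UNNAMED" "$@"

# after: pass "$@" first; the injected flag is then parsed as a regular
# spark-submit option
exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit \
  "$@" --driver-java-options "--add-exports java.base/sun.nio.ch=ALL-UNNAMED"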