delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

delta-spark installation issues #3607

Open pkirangi opened 2 months ago

pkirangi commented 2 months ago

I am installing Apache Spark for the first time. I was able to get the Docker container set up.

I ran the following commands:

docker pull apache/spark

and then started the container using:

docker run -it apache/spark /bin/sh

Now I am trying to install the delta-spark package.

I used the following command:

pip install delta-spark

However, I am getting the following error:

ERROR: Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/home/spark' Check the permissions.
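For reference, a minimal sketch of common workarounds for this kind of Errno 13 error; the non-root default user and the paths in the apache/spark image are assumptions, so adjust them to what your image actually uses:

```sh
# Option 1 (assumption: the image's default user cannot write to /home/spark):
# restart the container as root so pip can write where it needs to.
docker run -it --user root apache/spark /bin/sh
pip install delta-spark

# Option 2: install into the current user's site-packages instead of a
# system location; this only helps if the home directory itself is writable.
pip install --user delta-spark

# Option 3: skip pip and let Spark fetch the Delta jars at launch, as the
# Delta quickstart shows for interactive shells. The version (3.2.0) is an
# assumption; pick one that matches your Spark build.
pyspark --packages io.delta:delta-spark_2.12:3.2.0 \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
```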

tonkolviktor commented 1 month ago

I'm not from Delta. This error has nothing to do with Spark/Delta itself, so you won't get much help here. (At the same time, you provided very little info to get any actual help.)

You should probably get to know a bit about Unix permission management (maybe first without Docker's added complexity) via pages like https://www.redhat.com/sysadmin/linux-file-permissions-explained. Once you understand all that, you should have the knowledge and tools to fix problems like this without any help ;)
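As a concrete illustration of that advice, a few standard commands show who you are inside the container and who owns the directory pip is trying to write to (the path is taken from the error message above):

```sh
# Which user is the shell running as, and which groups does it belong to?
whoami
id

# Who owns the target directory, and what are its permission bits?
ls -ld /home/spark
ls -ld /home
```

If the directory is owned by another user (or doesn't exist and /home isn't writable by you), that is exactly what Errno 13 is reporting.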

Cheers, V

newfront commented 2 weeks ago

@pkirangi if you want to use the official Docker image (https://hub.docker.com/r/deltaio/delta-docker), then you just need to run docker pull deltaio/delta-docker. This will get you up and running without needing to install delta-spark yourself.

You can read through the full README, or if you want to just get started, the JupyterLab notebook (https://github.com/delta-io/delta-docker?tab=readme-ov-file#jupyterlab-notebook) is the easiest way.
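A rough sketch of that route; the image tag and port mapping here are assumptions, so check the linked README for the exact run command used in the JupyterLab quickstart:

```sh
# Pull the prebuilt image with Spark + Delta already configured.
docker pull deltaio/delta-docker:latest

# Run it interactively and publish the usual JupyterLab port.
docker run --rm -it -p 8888:8888 deltaio/delta-docker:latest
```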

Additionally, take a look at https://github.com/newfront/hitchhikers_guide_to_deltalake_streaming if you want to learn more about Delta from the Jupyter environment. Cheers