databricks / koalas

Koalas: pandas API on Apache Spark
Apache License 2.0

pyspark is not installed as a requirement when installing koalas #2221

Open bingwork opened 2 years ago

bingwork commented 2 years ago

koalas is a great package.

When I install the package, these are all the requirements pip resolves:

    pip install koalas==1.8.2
    Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
    Collecting koalas==1.8.2
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/28/9a/d69cf12ea62116873b427e5843be8ae8431b18f2a0714d6f4eec3ee4cda6/koalas-1.8.2-py3-none-any.whl (390 kB)
    Requirement already satisfied: numpy>=1.14 in /Users/celential-bing/.pyenv/versions/3.8.12/envs/time_machine/lib/python3.8/site-packages (from koalas==1.8.2) (1.21.5)
    Requirement already satisfied: pandas>=0.23.2 in /Users/celential-bing/.pyenv/versions/3.8.12/envs/time_machine/lib/python3.8/site-packages (from koalas==1.8.2) (1.3.5)
    Requirement already satisfied: pyarrow>=0.10 in /Users/celential-bing/.pyenv/versions/3.8.12/envs/time_machine/lib/python3.8/site-packages (from koalas==1.8.2) (7.0.0)
    Requirement already satisfied: pytz>=2017.3 in /Users/celential-bing/.pyenv/versions/3.8.12/envs/time_machine/lib/python3.8/site-packages (from pandas>=0.23.2->koalas==1.8.2) (2021.1)
    Requirement already satisfied: python-dateutil>=2.7.3 in /Users/celential-bing/.pyenv/versions/3.8.12/envs/time_machine/lib/python3.8/site-packages (from pandas>=0.23.2->koalas==1.8.2) (2.8.2)
    Requirement already satisfied: six>=1.5 in /Users/celential-bing/.pyenv/versions/3.8.12/envs/time_machine/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas>=0.23.2->koalas==1.8.2) (1.16.0)
    Installing collected packages: koalas
    Successfully installed koalas-1.8.2

But it also needs pyspark, for example when I start a service:

`ImportError: Unable to import pyspark - consider doing a pip install with [spark] extra to install pyspark with pip
Traceback (most recent call last):
  File "/Users/celential-bing/.pyenv/versions/time_machine/lib/python3.8/site-packages/databricks/koalas/__init__.py", line 49, in assert_pyspark_version
    import pyspark
ModuleNotFoundError: No module named 'pyspark'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/celential-bing/time-machine/timemachine/app.py", line 1, in <module>
    from timemachine import app, dapp
  File "/Users/celential-bing/time-machine/timemachine/__init__.py", line 301, in <module>
    app, dapp, schema = create_app()
  File "/Users/celential-bing/time-machine/timemachine/__init__.py", line 57, in create_app
    raise e
  File "/Users/celential-bing/time-machine/timemachine/__init__.py", line 54, in create_app
    return TimeMachineInitializer(app).init_app()
  File "/Users/celential-bing/time-machine/timemachine/__init__.py", line 212, in init_app
    self.init_app_in_ctx()
  File "/Users/celential-bing/time-machine/timemachine/__init__.py", line 176, in init_app_in_ctx
    self.init_views()
  File "/Users/celential-bing/time-machine/timemachine/__init__.py", line 69, in init_views
    from timemachine.views.base import (
  File "/Users/celential-bing/time-machine/timemachine/views/base.py", line 13, in <module>
    from timemachine.models.base import Module, Lambda
  File "/Users/celential-bing/time-machine/timemachine/models/base.py", line 14, in <module>
    from timemachine.engines import current_engine, DF
  File "/Users/celential-bing/time-machine/timemachine/engines/__init__.py", line 9, in <module>
    from databricks.koalas import DataFrame as SparkDataFrame
  File "/Users/celential-bing/.pyenv/versions/time_machine/lib/python3.8/site-packages/databricks/koalas/__init__.py", line 72, in <module>
    assert_pyspark_version()
  File "/Users/celential-bing/.pyenv/versions/time_machine/lib/python3.8/site-packages/databricks/koalas/__init__.py", line 51, in assert_pyspark_version
    raise ImportError(
ImportError: Unable to import pyspark - consider doing a pip install with [spark] extra to install pyspark with pip`

So I suggest adding pyspark to requirements.txt. I couldn't find that file in the repository, so I am opening an issue instead.
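In the meantime, an application could detect the missing dependency up front instead of failing deep inside its import chain. The following is only a minimal sketch: `missing_modules` and `koalas_preflight_message` are hypothetical helper names, not part of koalas, and the `pip install "koalas[spark]"` hint is an assumption based on the `[spark]` extra mentioned in the error message above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


def koalas_preflight_message(required=("pyspark",)):
    """Return a hint string if a runtime dependency is missing, else None.

    Hypothetical helper: `required` lists modules assumed to be needed
    at import time by databricks.koalas.
    """
    missing = missing_modules(required)
    if not missing:
        return None
    # Assumption: the [spark] extra named in the ImportError pulls in pyspark.
    return (
        "Missing module(s): " + ", ".join(missing)
        + '. Try: pip install "koalas[spark]"'
    )


if __name__ == "__main__":
    message = koalas_preflight_message()
    print(message or "pyspark found; importing databricks.koalas should pass the pyspark check.")
```

Calling `koalas_preflight_message()` before `from databricks.koalas import DataFrame` would surface the problem at startup with a single readable line rather than a chained traceback.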