IntelPython / sdc

Numba extension for compiling Pandas data frames, Intel® Scalable Dataframe Compiler
https://intelpython.github.io/sdc-doc/
BSD 2-Clause "Simplified" License

Unable to read parquet using nopython option #1006

Open serdinskyj opened 11 months ago

serdinskyj commented 11 months ago

I noticed that a good amount of code points to a read_parquet implementation that follows the Pandas API, but I seem to be having some trouble with it. Is this supported, or is the package limited to read_csv as the documentation says?

I first received an error stating that pyarrow or fastparquet is required to run read_parquet, so I chose fastparquet, since the installation instructions had already put pyarrow into the Intel SDC conda environment.
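When pandas reports that it needs pyarrow or fastparquet, it can help to confirm which of the two is actually importable from the active environment. The sketch below uses only the standard library and assumes the interpreter it runs in is the one from the conda environment in question. The helper name `available_parquet_engines` is hypothetical. The order mirrors pandas' documented `engine="auto"` behavior, which tries pyarrow first and falls back to fastparquet.

```python
import importlib.util


def available_parquet_engines():
    """Return the Parquet engines importable here, in the order pandas' 'auto' tries them."""
    # find_spec returns None when a top-level package cannot be located.
    return [
        name
        for name in ("pyarrow", "fastparquet")
        if importlib.util.find_spec(name) is not None
    ]


if __name__ == "__main__":
    engines = available_parquet_engines()
    if engines:
        print("pandas.read_parquet can use:", ", ".join(engines))
    else:
        print("Neither pyarrow nor fastparquet is importable in this environment.")
```

If pyarrow is missing from the list even though it was supposedly installed, the script is probably running under a different interpreter or environment than expected.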

Now, when I run it in nopython mode, I get this compiler error:

unknown attribute 'read_parquet' of type module(<module 'pandas' from '/home/jds35172/anaconda3/envs/intel-sdc-env/lib/python3.7/site-packages/pandas/__init__.py'>)
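This is a Numba typing error: in nopython mode, the compiler found no implementation registered for `pandas.read_parquet`. A common general workaround, which is not an SDC feature, is to do the file I/O in ordinary interpreted Python and pass only NumPy arrays into the compiled function. The sketch below assumes numeric data. The names `load_column` and `column_mean` are hypothetical. If Numba is not installed, the `njit` fallback simply runs the function uncompiled so that the example still executes.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback so the sketch still runs without Numba installed
    def njit(func):
        return func


def load_column(path, column, engine="auto"):
    """Read one Parquet column in ordinary (interpreted) Python, outside any jitted code."""
    import pandas as pd  # imported here so the compiled kernel below does not depend on pandas

    frame = pd.read_parquet(path, columns=[column], engine=engine)
    return frame[column].to_numpy(dtype=np.float64)


@njit
def column_mean(values):
    """Nopython-friendly kernel: it only touches a float64 NumPy array."""
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i]
    return total / values.shape[0]
```

Usage, with a hypothetical file and column name: `column_mean(load_column("data.parquet", "value"))`. With this split, `pandas.read_parquet` never appears inside the jitted function, so Numba only has to type array operations.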

Do I have to use the Makefiles to integrate this into the environment, or are there any alternatives to simply giving up on nopython mode?

Thank you!

AlexanderKalistratov commented 11 months ago

@serdinskyj this project is not maintained. You could consider https://github.com/modin-project/modin as an option.