fornax-navo / fornax-demo-notebooks

Demo notebooks for the Fornax project
https://fornax-navo.github.io/fornax-demo-notebooks/
BSD 3-Clause "New" or "Revised" License

Package conflict with pandas parquet engine in astrophysics default image #255

Open brianppowell opened 3 months ago

brianppowell commented 3 months ago

From the astrophysics default image running the light curve classifier notebook, there is an import that conflicts with pandas such that it can't read parquet files. This same error does not happen when starting with the astrophysics default image from scratch without the list of imports. Error below:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[3], line 9
      5 gdd.download_file_from_google_drive(file_id='13RiPODiz2kI8j1OKpP1vfh6ByIUNsKEz',
      6                                     dest_path='./data/df_lc_458sample.parquet',
      7                                     unzip=True)
      8 import pandas as pd
----> 9 df_lc = pd.read_parquet("./data/df_lc_458sample.parquet")
     11 #get rid of indices set in the light curve code and reset them as needed before sktime algorithms
     12 df_lc = df_lc.reset_index()

File /opt/conda/lib/python3.10/site-packages/pandas/io/parquet.py:654, in read_parquet(path, engine, columns, storage_options, use_nullable_dtypes, dtype_backend, filesystem, filters, **kwargs)
    501 @doc(storage_options=_shared_docs["storage_options"])
    502 def read_parquet(
    503     path: FilePath | ReadBuffer[bytes],
   (...)
    511     **kwargs,
    512 ) -> DataFrame:
    513     """
    514     Load a parquet object from the file path, returning a DataFrame.
    515
   (...)
    651     1    4    9
    652     """
--> 654     impl = get_engine(engine)
    656     if use_nullable_dtypes is not lib.no_default:
    657         msg = (
    658             "The argument 'use_nullable_dtypes' is deprecated and will be removed "
    659             "in a future version."
    660         )

File /opt/conda/lib/python3.10/site-packages/pandas/io/parquet.py:66, in get_engine(engine)
     63     except ImportError as err:
     64         error_msgs += "\n - " + str(err)
---> 66 raise ImportError(
     67     "Unable to find a usable engine; "
     68     "tried using: 'pyarrow', 'fastparquet'.\n"
     69     "A suitable version of "
     70     "pyarrow or fastparquet is required for parquet "
     71     "support.\n"
     72     "Trying to import the above resulted in these errors:"
     73     f"{error_msgs}"
     74 )
     76 if engine == "pyarrow":
     77     return PyArrowImpl()

ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
A suitable version of pyarrow or fastparquet is required for parquet support.
Trying to import the above resulted in these errors:

troyraen commented 3 months ago

Hi @brianppowell, thanks for reporting. I was unable to reproduce this error using the astrophysics default image. The error seems to indicate that neither pyarrow nor fastparquet is installed. Both kernels that are available to me (root and science_demo) have pyarrow installed and were able to execute the relevant notebook cells. Do you know which kernel you were using to run the notebook?
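One way to check this from inside the affected kernel is to try importing each engine directly and look at the error that comes back, since the pandas message above does not show the underlying import errors. The following is only a rough sketch: the helper name `probe_modules` is made up for illustration and is not part of pandas or of the notebooks.

```python
import importlib
import sys


def probe_modules(names=("pyarrow", "fastparquet")):
    """Try to import each module; report its version or why the import failed.

    `probe_modules` is a hypothetical helper written for this thread.
    """
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception as err:  # broad on purpose: this is only a diagnostic
            # Keep the message, since it says why the import failed.
            report[name] = f"import failed: {type(err).__name__}: {err}"
        else:
            report[name] = f"version {getattr(module, '__version__', 'unknown')}"
    return report


# Show which interpreter the kernel (or shell) is actually using.
print("interpreter:", sys.executable)
for name, status in probe_modules().items():
    print(f"{name}: {status}")
```

Running this in a notebook cell and again on the command line should show whether both are using the same interpreter, and whether an engine is missing or present but failing to import.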

brianppowell commented 3 months ago

Thanks for following up, @troyraen. Here is the kernel info:

(science_demo) jovyan@jupyter-bpowel:~$ cat /proc/version
Linux version 5.10.184-175.749.amzn2.x86_64 (mockbuild@ip-10-0-46-190) (gcc10-gcc (GCC) 10.4.1 20221124 (Red Hat 10.4.0-1), GNU ld version 2.35.2-9.amzn2.0.1) #1 SMP Wed Jul 12 18:40:28 UTC 2023

troyraen commented 3 months ago

Thanks, from that I see that you're in the science_demo environment and working on the command line. Are you copying code out of the notebook and running it on the command line? If so, that's fine and should also work (I often do it this way), but now it's not clear to me what steps you're taking that result in this error. I'm also not sure what you mean by this line:

> This same error does not happen when starting with the astrophysics default image from scratch without the list of imports.

Can you post a reproducible example with all of your steps?
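A reproducible example is easier to compare across kernels if it comes with a short environment report. The sketch below uses only the standard library. The function name `environment_report` and the choice of packages to list are assumptions made for illustration, not something the project prescribes.

```python
import os
import platform
import sys
from importlib import metadata


def environment_report(packages=("pandas", "pyarrow", "fastparquet")):
    """Collect interpreter, conda environment, and installed package versions.

    `environment_report` is a hypothetical helper written for this thread.
    """
    report = {
        "python": platform.python_version(),
        "executable": sys.executable,
        # Set by `conda activate`; it may be absent inside a notebook kernel.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(not set)"),
    }
    for name in packages:
        try:
            # Reads the installed distribution metadata without importing it.
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "(not installed)"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Because `importlib.metadata` reads package metadata without importing the package, a version listed here alongside a failing import would point to a package conflict rather than a missing install.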