conda-incubator / condastats


condastats is broken - ValueError: ArrowStringArray requires a PyArrow (chunked) array of large_string type #24

Open Scaramir opened 1 month ago

Scaramir commented 1 month ago

When I try to run it, I get the following error:

> condastats overall pandas
Traceback (most recent call last):
  File "C:\Anaconda3\Scripts\condastats-script.py", line 9, in <module>
    sys.exit(main())
             ^^^^^^
  File "C:\Anaconda3\Lib\site-packages\condastats\cli.py", line 387, in main
    overall(
  File "C:\Anaconda3\Lib\site-packages\condastats\cli.py", line 62, in overall
    df = dd.read_parquet(
         ^^^^^^^^^^^^^^^^
  File "C:\Anaconda3\Lib\site-packages\dask_expr\_collection.py", line 5433, in read_parquet
    ReadParquetFSSpec(
  File "C:\Anaconda3\Lib\site-packages\dask_expr\_core.py", line 57, in __new__
    _name = inst._name
            ^^^^^^^^^^
  File "C:\Anaconda3\Lib\functools.py", line 995, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "C:\Anaconda3\Lib\site-packages\dask_expr\io\parquet.py", line 776, in _name
    funcname(type(self)), self.checksum, *self.operands[:-1]
                          ^^^^^^^^^^^^^
  File "C:\Anaconda3\Lib\site-packages\dask_expr\io\parquet.py", line 782, in checksum
    return self._dataset_info["checksum"]
           ^^^^^^^^^^^^^^^^^^
  File "C:\Anaconda3\Lib\site-packages\dask_expr\io\parquet.py", line 1375, in _dataset_info
    meta = self.engine._create_dd_meta(dataset_info)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Anaconda3\Lib\site-packages\dask\dataframe\io\parquet\arrow.py", line 1215, in _create_dd_meta
    meta = cls._arrow_table_to_pandas(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Anaconda3\Lib\site-packages\dask\dataframe\io\parquet\arrow.py", line 1878, in _arrow_table_to_pandas
    res = arrow_table.to_pandas(categories=categories, **_kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow\\array.pxi", line 885, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow\\table.pxi", line 5002, in pyarrow.lib.Table._to_pandas
  File "C:\Anaconda3\Lib\site-packages\pyarrow\pandas_compat.py", line 801, in table_to_dataframe
    _reconstruct_block(item, column_names, ext_columns_dtypes)
  File "C:\Anaconda3\Lib\site-packages\pyarrow\pandas_compat.py", line 743, in _reconstruct_block
    arr = pandas_dtype.__from_arrow__(arr)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Anaconda3\Lib\site-packages\pandas\core\arrays\string_.py", line 217, in __from_arrow__
    return ArrowStringArray(array)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Anaconda3\Lib\site-packages\pandas\core\arrays\string_arrow.py", line 143, in __init__
    raise ValueError(
ValueError: ArrowStringArray requires a PyArrow (chunked) array of large_string type

florian-wagner commented 1 month ago

I can confirm the same problem. It works for me after downgrading with conda install "numpy<2" "pandas<2".
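To check whether an environment matches the workaround above, a small script like the following may help. It is a minimal sketch, not part of condastats, and uses only the standard library's importlib.metadata. The package list and the helper names (installed_version, major, check_pins) are illustrative assumptions. It only reports installed versions and compares numpy and pandas against the "numpy<2" and "pandas<2" pins; it does not establish which package combination actually triggers the ValueError.

```python
"""Report versions of the packages seen in the traceback and compare them
against the pins reported to work in this thread: numpy<2 and pandas<2."""
from importlib import metadata

# Packages that appear in the traceback, plus numpy from the workaround.
# This list is an illustrative choice, not something condastats defines.
PACKAGES = ["condastats", "dask", "dask-expr", "pandas", "pyarrow", "numpy"]

# Exclusive upper bounds on the major version, taken from the workaround.
PINS = {"numpy": 2, "pandas": 2}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def major(version):
    """Parse the leading integer of a version string such as '1.26.4'."""
    return int(version.split(".")[0])


def check_pins(versions, pins=PINS):
    """Return the pinned packages whose installed major version is too new.

    Packages that are not installed (version None) are skipped.
    """
    return [
        name
        for name, limit in pins.items()
        if versions.get(name) is not None and major(versions[name]) >= limit
    ]


if __name__ == "__main__":
    versions = {name: installed_version(name) for name in PACKAGES}
    for name, version in versions.items():
        print(f"{name:12} {version or 'not installed'}")
    too_new = check_pins(versions)
    if too_new:
        print("Outside the reported working pins:", ", ".join(too_new))
    else:
        print("Within the reported working pins (numpy<2, pandas<2).")
```

Pasting the printed version table into the issue would also make it easier to narrow down which upgrade introduced the error.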