-
Currently, hierarchicalforecast only supports a pandas DataFrame as input. For the library to scale horizontally, we need to explore alternatives for integrating frameworks such as spark…
-
I'm working on a [library](https://github.com/tokoko/subframe) that lets you build up Substrait plans using a dataframe API. Over the course of dataframe transformations, various unrelated plans need to …
-
The current `RetrieverCache` implementation calls the `transform()` function of the wrapped `Transformer` once for each query in the input data frame, which can be costly for some retrieval models due…
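This is not the library's actual API, but the batching idea can be sketched with a hypothetical cache that collects all cache misses and sends them to the wrapped transformer in one call, so fixed per-call costs are paid once rather than once per query:

```python
class BatchingCache:
    """Hypothetical sketch: cache per-query results, but send all
    cache misses to the wrapped transformer in a single batched call
    instead of one transform() call per query."""

    def __init__(self, transform_batch):
        # transform_batch: callable taking a list of queries and
        # returning one result per query, in the same order.
        self._transform_batch = transform_batch
        self._cache = {}

    def transform(self, queries):
        # Deduplicate misses while preserving order.
        misses = list(dict.fromkeys(q for q in queries if q not in self._cache))
        if misses:
            # One call for all uncached queries amortizes fixed costs.
            for q, result in zip(misses, self._transform_batch(misses)):
                self._cache[q] = result
        return [self._cache[q] for q in queries]


calls = []

def upper_batch(queries):
    calls.append(len(queries))
    return [q.upper() for q in queries]

cache = BatchingCache(upper_batch)
print(cache.transform(["a", "b", "a"]))  # ['A', 'B', 'A']
print(calls)  # [2] -- a single batched call covering both unique queries
```

Repeated queries then hit the cache, and only genuinely new queries trigger another (batched) call to the underlying model.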
-
Hi folks, I'm Devin from the Modin group.
Between our groups, I believe there's a high potential for collaboration and improving the data science experience for Dask and Modin users. Due to the com…
-
Hello,
I was looking at your code, and the results look promising. When trying to run it myself, I noticed that you are referencing a class "BTCCrawl_To_DataFrame_Class", which I cannot find. Pleas…
-
Points now use lazy dataframes (Dask `DataFrame`). We discussed allowing both: keeping points in memory as regular dataframes, or as lazy dataframes. https://github.com/scverse/spatialdata/issues/153
What about using…
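Library-free, the "support both eager and lazy" idea can be sketched with a tiny deferred wrapper and a duck-typed boundary: anything exposing `compute()` (as Dask objects do) is evaluated there, everything else is assumed to already be in memory. The `LazyFrame` class here is a hypothetical stand-in, not Dask:

```python
class LazyFrame:
    """Hypothetical stand-in for a lazy dataframe (e.g. Dask):
    operations are recorded, nothing runs until compute()."""

    def __init__(self, data, ops=()):
        self._data = data
        self._ops = list(ops)

    def map(self, fn):
        # Record the operation; do not execute it yet.
        return LazyFrame(self._data, self._ops + [fn])

    def compute(self):
        # Replay the recorded operations to materialize the result.
        rows = self._data
        for fn in self._ops:
            rows = [fn(r) for r in rows]
        return rows


def materialize(frame):
    """Accept either an in-memory object or a lazy one: anything
    with a compute() method is evaluated, everything else is
    returned as-is."""
    return frame.compute() if hasattr(frame, "compute") else frame


lazy = LazyFrame([1, 2, 3]).map(lambda x: x * 10)
print(materialize(lazy))          # [10, 20, 30]
print(materialize([10, 20, 30]))  # already in memory, returned unchanged
```

Downstream code only ever calls `materialize()` at the point where concrete values are needed, so both storage modes flow through the same code path.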
-
### Context for the issue:
With the rise in popularity of packages such as polars, arrow, and others, PyMC's dependence on pandas is looking less a…
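One way to loosen a hard pandas dependency (a hypothetical sketch, not PyMC's actual plan) is a small converter registry at the input boundary: each dataframe flavor registers a way to become a plain dict of column lists, and pandas/polars converters are registered only when those packages are installed, keeping them optional:

```python
_converters = []

def register_converter(predicate, convert):
    """Register one dataframe flavor: a predicate that recognizes it,
    and a function turning it into a plain dict of column lists
    (the only format internals ever consume)."""
    _converters.append((predicate, convert))

def to_columns(data):
    for predicate, convert in _converters:
        if predicate(data):
            return convert(data)
    raise TypeError(f"no converter registered for {type(data).__name__}")

# Plain dicts of columns are supported out of the box.
register_converter(
    lambda d: isinstance(d, dict),
    lambda d: {name: list(values) for name, values in d.items()},
)

# A pandas converter would be registered only if pandas imports, e.g.:
#   register_converter(lambda d: isinstance(d, pd.DataFrame),
#                      lambda d: {c: d[c].tolist() for c in d.columns})

print(to_columns({"x": (1, 2), "y": (3, 4)}))  # {'x': [1, 2], 'y': [3, 4]}
```

The internals then see only plain Python/array containers, so no dataframe library leaks past the boundary.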
-
Needs some investigation, but I think we have a feasible path to replacing pyarrow with arro3 internally. The only thing we use pyarrow for is creating a `RecordBatchReader` and converting that into the …
-
_Originally raised at https://github.com/JuliaData/DataFrames.jl/issues/3444 but moving here as it seems to be an upstream issue._
Hi folks.
I recently gave a workshop where I was extolling the …
-
### What happens?
`duckdb` and arrow seem to write parquet files at roughly the same speed until the data reaches about 10 GB, at which point duckdb becomes about an order of magnitude slower.
The i…
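To pin down where the slowdown starts, a minimal timing harness (a sketch; the actual duckdb `COPY ... TO 'out.parquet'` and `pyarrow.parquet.write_table` calls are left as placeholder comments) could measure write time at increasing data sizes:

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn once and report wall-clock seconds."""
    t0 = time.perf_counter()
    fn(*args, **kwargs)
    elapsed = time.perf_counter() - t0
    print(f"{label}: {elapsed:.3f}s")
    return elapsed

# Placeholder workload; in the real comparison this would be the two
# parquet writers (duckdb's COPY ... TO vs pyarrow.parquet.write_table)
# run at growing sizes to find where the curves diverge.
def workload(n):
    sum(range(n))

for n in (10_000, 100_000):
    timed(f"n={n}", workload, n)
```

Plotting elapsed time against size for both writers would show whether the gap opens gradually or at a sharp threshold (which would point at spilling or a memory limit).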