-
There is a lot of complexity around HDF5 flavors used with Pandas. In particular, "fixed" format HDF5 files are somewhat opaque and cannot be cut easily. Dask.dataframe currently chokes on these.
…
-
I mentioned this in #11067, but maybe this deserves its own issue: I find it difficult to turn off query planning using the Python API. Using `dask.config.set` only works if `dask.dataframe` hasn't be…
-
Problem: Dask ( https://docs.dask.org/en/latest/ ) is a very good parallel/distributed data system that replaces pandas. The people who worked on it have made it play well with numpy, pandas, scikit…
-
The current navbar has links to pages that I think mostly don't matter.
- distributed docs are way too technical for most users
- dask-ml is not actually all that focused on dask + ml users…
-
Fastparquet does not appear to support writing Dask dataframes with Pandas SparseArray columns. Doing so fails with:
```
AttributeError: 'SparseDtype' object has no attribute 'itemsize'
```
Pandas…
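One possible workaround, sketched here under assumptions rather than taken from the report, is to convert sparse columns to dense ones before writing. The helper name `densify_sparse_columns` is hypothetical. It uses only the pandas `.sparse.to_dense()` accessor and does not touch fastparquet, so whether it suits a given dataset depends on how much memory the dense columns need.

```python
import pandas as pd


def densify_sparse_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df in which every sparse column is stored densely.

    Hypothetical helper for illustration; not part of pandas, Dask, or fastparquet.
    """
    out = df.copy()
    for name in out.columns:
        # SparseDtype marks columns backed by a pandas SparseArray.
        if isinstance(out[name].dtype, pd.SparseDtype):
            out[name] = out[name].sparse.to_dense()
    return out


if __name__ == "__main__":
    frame = pd.DataFrame({"a": pd.arrays.SparseArray([0, 0, 1]), "b": [1, 2, 3]})
    print(densify_sparse_columns(frame).dtypes)
```

With a Dask dataframe, the same function could presumably be applied per partition with `map_partitions` before the write, at the cost of losing the sparse representation on disk.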
-
**Describe the issue**:
Creating dataframes without dask-expr works, but now with it being the default in the latest release it fails. I don't see migration/documentation on what the behavior changes…
-
I use dask in my project and recently ran into a weird problem: I need to differentiate between dask and pandas dataframes in some places, so I use the following **if**
```
import dask.datafram…
-
From @rabernat on [Twitter](https://twitter.com/rabernat/status/1330707155742322689):
> "Xarray has some secret private classes for lazily indexing / wrapping arrays that are so useful I think they…
-
Hi,
Thank you for dask! 🙏
**Describe the issue**:
Following this [discussion on Discourse](https://dask.discourse.group/t/read-parquet-filters-not-working-with-query-optimizer/2912), and a…
-
**Is your feature request related to a problem? Please describe.**
NeMo Curator supports document datasets as dataframes today and includes some helpers to read from JSON/Parquet files.
**Describe…