-
**Minimal Code To Reproduce**
```
from fugue import transform
from dask.distributed import Client
client = Client() # without this, dask is not in distributed mode
from fugue_dask import DaskEx…
```
-
Just raising an issue to ask whether this is still in use and whether it still makes sense to use. Sorry to bother.
-
Currently, RAPIDS deploys a forked version of this repository with additions that support a few things:
1. New Random Forests (RF) interface in XGBoost
2. Ingest of `dask-cudf` objects
3. Ingest …
-
**What happened**: When loading a Parquet file, I specified a column twice in the "columns=" argument and the column was loaded twice, that is, there were two columns in the resulting DataFrame with t…
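A possible caller-side workaround, sketched under the assumption that the duplicate comes from the list passed to `columns=`: de-duplicate the list, preserving order, before handing it to the reader. The helper name `unique_columns` and the sample column names are hypothetical and do not come from the report.

```python
def unique_columns(columns):
    """Drop repeated column names while keeping first-seen order."""
    # dict preserves insertion order, so fromkeys() keeps the first occurrence.
    return list(dict.fromkeys(columns))


# Hypothetical column selection that names "a" twice.
requested = ["a", "b", "a", "c"]
columns = unique_columns(requested)
print(columns)
```

The cleaned list can then be passed as the `columns=` argument in place of the original one.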
-
**What happened**:
Using pivot_table to try and convert a series of large vertically stored CSV files into a "wide" table, I've been running into memory limitation issues with pandas, so I turned t…
-
# Motivation
Shuffles are an integral part of many distributed data manipulation algorithms. Common DataFrame operations relying on shuffling include `sort`, `merge`, `set_index`, or various groupb…
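To make the idea concrete, here is a minimal pure-Python sketch of a hash-based shuffle feeding a groupby-sum. It is an illustration only, not how Dask implements shuffling; the function names (`shuffle_by_key`, `groupby_sum`) and the toy data are assumptions made for this example.

```python
from collections import defaultdict


def shuffle_by_key(partitions, key, n_out):
    """Redistribute rows so that equal keys land in the same output partition."""
    out = [[] for _ in range(n_out)]
    for part in partitions:
        for row in part:
            # Hash partitioning: the target partition depends only on the key.
            out[hash(row[key]) % n_out].append(row)
    return out


def groupby_sum(partitions, key, value, n_out=2):
    """Sum `value` per `key`; after the shuffle each partition is independent."""
    totals = {}
    for part in shuffle_by_key(partitions, key, n_out):
        local = defaultdict(int)
        for row in part:
            local[row[key]] += row[value]
        # No key appears in two output partitions, so a plain update is safe.
        totals.update(local)
    return totals


# Toy input: key 1 is spread across both input partitions.
partitions = [
    [{"k": 1, "v": 10}, {"k": 2, "v": 20}],
    [{"k": 1, "v": 5}, {"k": 3, "v": 7}],
]
result = groupby_sum(partitions, "k", "v")
print(result)
```

The key property is that every row with the same key ends up in the same output partition, which is what lets the per-partition aggregation run without further communication.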
-
[`pandas.DataFrame.quantile`](https://pandas.pydata.org/docs/dev/whatsnew/v1.5.0.html#other-deprecations) is switching its default for `numeric_only` from `True` to `False`. In particular, this chang…
-
The `timestampdiff` operation currently fails with a missing `reinterpret` implementation error.
Here's a reproducer:
```python
from dask_sql import Context
from dask import dataframe as dd
impo…
```
-
On the call yesterday, the topic of mutability came up in the vaex demo.
The short version is that it may be difficult or impossible for some systems to implement in-place mutation of dataframes. Fo…
-
Map_sync with a pandas operation function does not finish.
I have a very long dataframe, so I split it into 40 sub-dataframes and apply a pandas operation to the 40 sub-dataframes in parallel by u…