-
**What happened**:
Received the following stack trace when trying to do a cumulative aggregation on a `SeriesGroupBy` (e.g., `cumsum`, `cumprod`) in Dask. The same code runs in pandas without error. …
-
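```python
from datetime import date

import dask.dataframe as dd
import pandas as pd

# A single column of datetime.date objects (held as object dtype by pandas).
ddf = dd.from_pandas(pd.DataFrame({"a": [date.today(), date(2022, 6, 13)]}), npartitions=1)
ddf.to_parquet("/tmp/p")
```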
This works OK on…
-
Configuring Dask directly within Python is explained in the documentation here:
[Configuration - Directly within Python](https://docs.dask.org/en/latest/configuration.html#directly-within-python)
…
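For reference, here is a short sketch of setting configuration from Python with `dask.config.set`, which works both as a global setter and as a context manager. The `scheduler` key and its values are used only as an example.

```python
import dask

# Set a value globally for the rest of the session.
dask.config.set({"scheduler": "threads"})
print(dask.config.get("scheduler"))  # threads

# Set a value temporarily: the previous value is restored when the block exits.
with dask.config.set(scheduler="synchronous"):
    inside = dask.config.get("scheduler")

after = dask.config.get("scheduler")
```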
-
Hi, and thanks for providing this amazing tool.
I am currently running dada on a set of very deeply sequenced samples. Around 3-6M 300bp NextSeq reads per sample remain after filtering.
What I'm cu…
-
My script loads some 1GB CSV files using Dask and writes the data to Parquet. However, the Dask job sometimes fails with `aiohttp.client_exceptions.ServerTimeoutError: Timeout on reading data from soc…
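Intermittent timeouts like this are often worked around by retrying the failing step. The sketch below is a generic, standard-library-only retry helper with exponential backoff. `with_retries` is a hypothetical name, not part of Dask or aiohttp, and it does not address the underlying cause of the timeout.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call fn(), retrying on the given exception types (hypothetical helper).

    The delay doubles after each failed attempt (base_delay, 2 * base_delay, ...).
    The last exception is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Applied to this case, the call might look like `with_retries(lambda: ddf.to_parquet(path), retry_on=(aiohttp.client_exceptions.ServerTimeoutError,))`, where `ddf` and `path` stand in for the script's own objects. Retrying the whole write is only safe if it can be repeated cleanly, for example when it overwrites the same output location.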
-
`getSummaryImpl` should use `RDD[Row]` to compute the summary. Also, the if-else loop in https://github.com/ddf-project/DDF/blob/master/spark/src/main/java/io/spark/ddf/analytics/BasicStatisticsComputer.java#L…
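One way to read the suggestion is to compute all statistics in a single pass over the rows, in the shape of Spark's `RDD.aggregate(zeroValue, seqOp, combOp)`: fold each partition into a partial summary, then merge the partials. The sketch below shows that pattern in plain Python rather than Spark. The `Summary` class and its fields are hypothetical. It uses Welford's update within a partition and the pairwise merge formula for the variance across partitions.

```python
import math
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass
class Summary:
    """Mergeable one-pass summary of a numeric column (hypothetical helper)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, x: float) -> "Summary":
        # seqOp: fold one value into the partial summary (Welford's update).
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)
        return self

    def merge(self, other: "Summary") -> "Summary":
        # combOp: combine two partial summaries into a new one.
        if other.count == 0:
            return self
        if self.count == 0:
            return other
        n = self.count + other.count
        delta = other.mean - self.mean
        return Summary(
            count=n,
            mean=self.mean + delta * other.count / n,
            m2=self.m2 + other.m2 + delta * delta * self.count * other.count / n,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def variance(self) -> float:
        # Sample variance (n - 1 denominator); NaN for fewer than two values.
        return self.m2 / (self.count - 1) if self.count > 1 else math.nan


def summarize_partition(values: Iterable[float]) -> Summary:
    # Each partition starts from an empty summary, playing the role of zeroValue.
    return reduce(Summary.add, values, Summary())


# Usage: per-partition summaries, then one merge across partitions.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], []]
total = reduce(Summary.merge, map(summarize_partition, partitions), Summary())
```

Because every statistic comes out of the same accumulator, there is no need to branch on which statistic is being computed.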
-
Observation from building test data
Original comment:
It may be better to make Sites Organizations and allow Organizations to have "children" / "subsidiaries". That way Sites can have addresses and …
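Here is a minimal sketch of the proposed model, assuming a Site is represented as an `Organization` that has a parent. The `Address` fields and the method names are hypothetical and only illustrate the parent/children idea.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Address:
    street: str
    city: str
    country: str


# eq=False keeps identity comparison; comparing fields would recurse through parent/children.
@dataclass(eq=False)
class Organization:
    """Hypothetical model: a Site is an Organization that has a parent."""

    name: str
    address: Optional[Address] = None
    parent: Optional[Organization] = field(default=None, repr=False)
    children: List[Organization] = field(default_factory=list, repr=False)

    def add_child(self, child: Organization) -> Organization:
        # Link both directions so the tree can be walked up or down.
        child.parent = self
        self.children.append(child)
        return child

    def root(self) -> Organization:
        # Follow parent links up to the top-level organization.
        org = self
        while org.parent is not None:
            org = org.parent
        return org

    def descendants(self) -> Iterator[Organization]:
        # Depth-first walk over all children, subsidiaries, sites, and so on.
        for child in self.children:
            yield child
            yield from child.descendants()
```

Keeping one type for both levels means addresses and any other Organization fields are available to Sites without duplicating them.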
-
```python
import numpy as np
import pandas as pd
import dask.dataframe as dd
from dask.dataframe.utils import assert_eq


def test_binop():
    df = pd.DataFrame(np.random.uniform(size=(5, 11)…
```
-
I tried running NVTabular code related to [this](https://nvidia-merlin.github.io/Transformers4Rec/stable/examples/getting-started-session-based/01-ETL-with-NVTabular.html) and [this](https://nvidia-me…
-
Going to ask Ben to get onto this as soon as he joins the repo, since he's busy updating Dockerfiles for the new DDF/kMS build system just now, and could use a good practical entry into the stimela/c…