-
#### Code Sample
```python
import pandas as pd
import numpy as np
left = pd.DataFrame(
    columns=['A'],
    index=pd.Index([], name='id', dtype=np.int64)
)
right = pd.DataFrame(
[[…
```
-
I'm doing some benchmarking of Arrow serialization for dask.distributed to serialize dataframes.
Overall, things look good compared to the current implementation (using pickle). The biggest difference…
-
When converting local dataframes to a Ray Dataset and Dask DataFrame or when there is a group-map operation, Ray requires users to be explicit about the number of partitions and reducers. However, mos…
-
Seemingly at random, I run into the following situation:
The dask `cluster = LocalCluster(n_workers=28, host='192.168.56.11')` works well until a single task hangs.
Checking the logs, I see a trac…
-
While we can convert a `pandas.DataFrame` to a single (arbitrarily large) `arrow::RecordBatch`, it is not easy to create multiple small record batches – we could do so in a streaming fashion and immed…
-
Some users are looking for tools to help them assemble ERDDAP URLs for use in their own workflows, while others would prefer to work at a higher, more opinionated level. I believe we can more cleanly …
-
Submitting Author: Jonny Tran (@JonnyTran)
All current maintainers: @JonnyTran
Package Name: openomics
One-Line Description of Package: Library for integration of multi-omics, annotation, and int…
-
Thanks for all of your amazing work on pyOpenSci. It's great to see the progress this project has made.
I'm opening this issue to discuss how we (in the Pangeo project) can leverage and collaborate…
-
**Why would this plugin be helpful to the Flyte community**
Users could write very short-running distributed array jobs using Dask. This makes it possible to have very small runtime jobs multiplexed…
-
_map_overlap()_ cannot handle a raw pandas DataFrame, unlike _map_partitions()_.
```
import dask.dataframe
import pandas as pd
df = pd.util.testing.makeMixedDataFrame()
# Works fine
da…