-
### Summary
We should make space for high-level query optimization. There are a couple of ways to do this. This issue includes motivation, a description of two approaches, and some thoughts on tr…
-
```python
import dask
import dask.dataframe as dd
import pandas as pd  # needed by make_df below

def process_df(df):
    return df

def make_df():
    return pd.DataFrame([[1, 3], [2, 3], [3, 4]], columns=['A', 'B'])

a = dd.from_delay…
-
Hi!
GRNBoost2 produces an error at the very last step. The same happens when I use GENIE3. It seems to be a problem with Dask; however, I could not figure out what is going on.
**The code:**
``…
-
I suspect that we will often want to use tsqr with unknown chunk sizes, which occur whenever someone converts a dask dataframe into a dask array (dataframes don't maintain chunk sizes). Currently we…
-
It would be nice to be able to supply `kartothek.io.dask.delayed.merge_datasets_as_delayed` with a list of `dataset_uuids` to merge an arbitrary number of datasets.
This could be implemented by
…
-
Hi,
I'm writing a notebook example to highlight some key differences between pandas and dask. Are you interested in such a PR?
If so, I currently have the following topics (are there any addition…
-
Dask supports various serialization methods for its DataFrames (see [here](https://distributed.dask.org/en/latest/serialization.html)), and for the `EnsembleFrame` hierarchy we should validate that we…
-
There are a number of optimised libraries for many packages, with optimisation at different levels...
## Intel Optimisations
* [Intel Extensions for Scikit-learn](https://intel.github.io/scikit-l…
-
Dask dataframes are missing the "reindex" function. Would be great to support it, as it's a useful primitive for time series analysis. I think limiting the support to just sorted Int64Index or DateT…
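
For context, here is a minimal sketch of the pandas behaviour being requested, on a sorted `DatetimeIndex`. It uses plain pandas only, not Dask, and the series values and dates are made up for illustration.

```python
import pandas as pd

# Illustrative data: a daily series with one missing day (2020-01-03).
s = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04"]),
)

# Target index covering every day in the range.
new_index = pd.date_range("2020-01-01", "2020-01-04", freq="D")

# Plain reindex: labels missing from the original index become NaN.
plain = s.reindex(new_index)

# Reindex with forward fill: a missing label takes the previous valid value.
# This requires the original index to be sorted (monotonic).
filled = s.reindex(new_index, method="ffill")
```

Because forward fill relies on a monotonic index, restricting support to sorted indexes, as suggested above, matches a constraint pandas already imposes for the `method` argument.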
-
**Need Dask Dataframe support for Create_REPORT - Need to materialize computes**
When the input dataframe is constructed from a Dask DataFrame, `create_report(df)` throws an error:
"Missing Cells": float…