-
Thanks for this awesome project!
I have a scikit-learn pipeline combining some custom transformers for feature-engineering with a classifier at the end (xgboost). Does Dask-ML accept user-defined p…
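
For readers unfamiliar with the setup being asked about, here is a minimal sketch of such a pipeline. The `AddRowSum` transformer and the toy data are hypothetical, and `LogisticRegression` is swapped in for the xgboost classifier so the example depends only on scikit-learn. It shows the shape of the pipeline only and says nothing about what Dask-ML accepts.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class AddRowSum(BaseEstimator, TransformerMixin):
    """Hypothetical feature-engineering step: append each row's sum as a new column."""

    def fit(self, X, y=None):
        # Stateless transformer: nothing to learn.
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.hstack([X, X.sum(axis=1, keepdims=True)])


pipeline = Pipeline(
    [
        ("features", AddRowSum()),
        ("clf", LogisticRegression()),  # stand-in for the xgboost classifier
    ]
)

# Toy data, invented for illustration.
X = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.1]])
y = np.array([0, 0, 1, 1])

pipeline.fit(X, y)
predictions = pipeline.predict(X)
```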
-
Figure out a way to distribute all layers of SQL execution #10 on Apache Beam.
-
We can't currently go from DataFrames to arrays. #445 adds `to_dask_array`, but this is only a band-aid for now.
I think in an ideal world we have an Array Collection that captures something like `…
-
**What happened**:
A computation from a delayed call returning two outputs is done twice when one output is an array and the other is a dataframe. Interestingly, if both outputs are arrays, or both …
-
```
from dask_sql import Context
import pandas as pd
import dask.dataframe as dd
c = Context()
pd.DataFrame({'id': [0, 1, 2]}).to_parquet('/data/test/part.0.parquet')
# this works
c.sql("…
```
-
Things like `classes=da.unique(y)` may be inefficient. This will have to be called on each block of data, which is expensive, especially if `y` isn't persisted.
Things like `sample_weight` are t…
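
One possible way to avoid the repeated work, sketched under the assumption that the set of classes is small enough to hold in memory: compute the unique labels once up front and reuse the resulting NumPy array for every block. The labels below are toy data, and this is not necessarily how the library itself handles it.

```python
import numpy as np
import dask.array as da

# Toy labels split into two blocks, invented for illustration.
y = da.from_array(np.array([0, 1, 2, 1, 0, 2, 1, 1]), chunks=4)

# Persist y so later passes over the data do not recompute it.
y = y.persist()

# Compute the classes a single time; the result is a concrete NumPy array
# that can be passed as `classes=` for each block instead of a lazy da.unique(y).
classes = da.unique(y).compute()
```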
-
FLT processing pegs a single CPU core. Could multiprocessing/multithreading be introduced?
-
### Modin version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest released version of Modin.
- [X] I have confirmed t…
-
Thank you for the wonderful library!
**Is your feature request related to a problem? Please describe.**
I'm wondering if RecBole's [data flow](https://recbole.io/docs/user_guide/data/data_flow.htm…