-
**What happened**:
When `sql.identifier.case_sensitive=True` is set, dask-sql still converts identifiers to lowercase during the planning stage.
**What you expected to happen**:
Ca…
-
I just saw the great talk on Dask DataFrames 2.0 at PyData Berlin! I was a bit surprised that duckdb timed out for some of the queries. According to https://duckdb.org/docs/guides/performance/how_to_t…
-
You can type `import this` in Python and a "Zen of Python" is printed with some of the core values of Python. Some examples:
* Flat is better than nested
* Readability counts
* In the face of am…
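
The same text can also be read programmatically. The sketch below is minimal and assumes only the standard library's `this` module, which keeps the text ROT13-encoded in `this.s`. The names `zen_lines` and `aphorisms` are illustrative.

```python
import codecs
import contextlib
import io

# Importing `this` prints the Zen on first import; capture stdout to keep it quiet.
with contextlib.redirect_stdout(io.StringIO()):
    import this

# The module stores the text ROT13-encoded in `this.s`.
zen_lines = codecs.decode(this.s, "rot13").splitlines()

# Drop the title line and the blank line after it, keeping only the aphorisms.
aphorisms = [line for line in zen_lines[2:] if line]

for line in aphorisms[:7]:
    print(line)
```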
-
@espg has run into memory management issues, and this is my attempt to summarize the problem we need to resolve for a proper solution of the issue.
__This is my attempted problem description__
…
-
At https://github.com/dask/dask/blob/c69d10f97cadd06f147817f0084bddeadb8195e6/dask/dataframe/utils.py#L262, dask doesn't check the dtype of the categories, and just assumes it's an object (overwriting…
-
In https://github.com/dask/dask/pull/9175 we added automatic retries when using remote filesystems with `dd.read_parquet` / `dd.to_parquet`. It'd be nice for other I/O functions (e.g. `dd.read_csv` / …
-
When passing an auxiliary dask DataFrame to `map_partitions`, its chunks are aligned to the main DataFrame and the function receives one chunk of each per task. If you give the same input as a kwarg, …
-
`sklearn.model_selection.cross_validate` fits and scores several models over some CV splits of data.
http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html
…
-
Maybe somewhat of a random thought. The add-api looks very good to me, and will help enormously in setting up models. One downside is that it calls `pd.concat()` once per added element. I'm pretty sur…
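
To illustrate the concern about calling `pd.concat()` once per element, here is a rough sketch that is not taken from the project's code. The helper names `build_incrementally` and `build_once` and the toy frames are hypothetical. Each call to `pd.concat` copies everything accumulated so far, so collecting the pieces and concatenating once avoids the repeated copying.

```python
import pandas as pd


def build_incrementally(frames):
    """Hypothetical pattern: one pd.concat call per added element."""
    result = frames[0]
    for frame in frames[1:]:
        # Every call copies all rows accumulated so far.
        result = pd.concat([result, frame], ignore_index=True)
    return result


def build_once(frames):
    """Alternative: collect the pieces first, then concatenate a single time."""
    return pd.concat(frames, ignore_index=True)


# Toy input: 100 single-row frames standing in for the added elements.
frames = [pd.DataFrame({"id": [i], "value": [i * 1.5]}) for i in range(100)]

incremental = build_incrementally(frames)
batched = build_once(frames)
print(incremental.equals(batched))
```

Both helpers return the same table; only the number of intermediate copies differs.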
-
`map_sync` with a pandas operation function does not finish.
I have a very long dataframe, so I split it into 40 sub-dataframes and apply a pandas operation to the 40 sub-dataframes in parallel by u…