-
A common pattern in dask is to shuffle distributed data around by some hash-based index. For example, this comes up in merging dataframes. Since the determination of index buckets is typically carried…
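As a generic illustration of the bucketing idea (a sketch, not Dask's actual shuffle implementation): rows from both frames are assigned to buckets by hashing the key column, so matching keys land in the same bucket and can be merged locally.
```python
import pandas as pd

def hash_bucket(df, on, nbuckets):
    """Split ``df`` into buckets keyed by a hash of column ``on``."""
    buckets = pd.util.hash_pandas_object(df[on], index=False).to_numpy() % nbuckets
    return {b: group for b, group in df.groupby(buckets)}

left = pd.DataFrame({"key": ["a", "b", "c", "a"], "x": range(4)})
right = pd.DataFrame({"key": ["a", "c"], "y": [10, 20]})

left_buckets = hash_bucket(left, "key", nbuckets=4)
right_buckets = hash_bucket(right, "key", nbuckets=4)

# Each bucket pair can now be merged independently (e.g. on different workers),
# because identical keys hash to the same bucket in both frames.
merged = pd.concat(
    [left_buckets[b].merge(right_buckets[b], on="key")
     for b in left_buckets if b in right_buckets]
)
print(merged)
```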
-
joblib currently defaults to md5-hashing its input. For the tasks at hand, a non-cryptographic hash can be significantly faster (see comparison table at http://cyan4973.github.io/xxHash/).
scikit-lea…
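As a rough illustration of the speed gap (a sketch, not a rigorous benchmark; it assumes the third-party `xxhash` package is installed, and the numbers vary by machine):
```python
# Compare cryptographic md5 against non-cryptographic xxHash on a large
# in-memory buffer. Requires the third-party ``xxhash`` package.
import hashlib
import time

import numpy as np
import xxhash

data = np.random.default_rng(0).bytes(50_000_000)  # ~50 MB of random bytes

t0 = time.perf_counter()
md5_digest = hashlib.md5(data).hexdigest()
t1 = time.perf_counter()
xxh_digest = xxhash.xxh64(data).hexdigest()
t2 = time.perf_counter()

print(f"md5:   {t1 - t0:.3f} s  (digest {md5_digest[:8]})")
print(f"xxh64: {t2 - t1:.3f} s  (digest {xxh_digest[:8]})")
```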
-
I'm trying to set up the merge of two dataframes with a `CategoricalIndex` but am getting a confusing traceback (see below). I was able to track the issue down to the [align_partitions](https://github.com/d…
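For reference, a minimal sketch of the kind of setup described (hypothetical data; whether it reproduces the exact traceback depends on the dask/pandas versions):
```python
# Two dask dataframes backed by a CategoricalIndex, then merged on the index.
import pandas as pd
import dask.dataframe as dd

left = pd.DataFrame(
    {"x": range(4)},
    index=pd.CategoricalIndex(["a", "b", "c", "d"], name="key"),
)
right = pd.DataFrame(
    {"y": range(4)},
    index=pd.CategoricalIndex(["b", "c", "d", "e"], name="key"),
)

dleft = dd.from_pandas(left, npartitions=2)
dright = dd.from_pandas(right, npartitions=2)

result = dd.merge(dleft, dright, left_index=True, right_index=True)
# Depending on the installed versions, this compute may raise in the
# partition-alignment step rather than return the merged frame.
print(result.compute())
```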
-
In GitLab by @Huite on Nov 18, 2023, 17:38
I got this report from Jacco.
If the resolution is small enough in these methods, it will result in duplicate entries in the output:
```python
imod.prepar…
-
### What is your issue?
_I think that this is a longstanding problem. Sorry if I missed an existing GitHub issue._
I was looking at a Dask-array-backed Xarray workload with @phofl and we were bo…
-
In many cases we read tabular data from some source, modify it, and write it out to another data destination. In this transfer we have an opportunity to tighten the data representation a bit, for exam…
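As an illustration of the kind of tightening meant here (a sketch with a synthetic frame and a crude cardinality heuristic):
```python
import pandas as pd

# In practice ``df`` would come from e.g. pd.read_csv(...); a small in-memory
# frame is used here so the sketch is self-contained.
df = pd.DataFrame(
    {
        "id": range(1000),                         # int64 by default
        "value": [float(i) for i in range(1000)],  # float64 by default
        "group": ["a", "b", "c", "d"] * 250,       # low-cardinality strings
    }
)

before = df.memory_usage(deep=True).sum()

for col in df.columns:
    s = df[col]
    if pd.api.types.is_integer_dtype(s):
        df[col] = pd.to_numeric(s, downcast="integer")
    elif pd.api.types.is_float_dtype(s):
        df[col] = pd.to_numeric(s, downcast="float")
    elif pd.api.types.is_object_dtype(s) and s.nunique() < 0.5 * len(s):
        df[col] = s.astype("category")  # crude low-cardinality heuristic

after = df.memory_usage(deep=True).sum()
print(f"{before} -> {after} bytes")
# df.to_parquet("output.parquet")  # hypothetical destination
```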
-
The list of links will be more digestible if you group the large block of links into smaller categories, so a reader can navigate it. Here is a rough grouping proposal:
### 1. Python for earth scientists
##…
-
I naively tried to do `dd.merge(a, b, on="column_with_ten_values")`, where `a` and `b` were both large DataFrames with thousands of partitions each.
Eventually the compute failed with:
```python-t…
-
I would like to be able to call `stack()` on the Dask-cuDF dataframe. Currently `stack` is a method on the cuDF dataframe, but not on the Dask-cuDF dataframe.
I would like the stacking to take place on the wor…
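One possible workaround sketch, using `map_partitions` so that each partition is stacked on the worker that holds it; shown with a pandas-backed dask dataframe, and the same pattern would presumably carry over to dask_cudf since each cuDF partition already implements `stack`:
```python
import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
ddf = dd.from_pandas(pdf, npartitions=2)

# Each partition is stacked where it lives; the result is a dask Series
# with a MultiIndex of (row label, column name).
stacked = ddf.map_partitions(lambda part: part.stack())
print(stacked.compute())
```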
-
I'm running into a MemoryError when I try to save to Parquet or to `repartition` (by size).
I didn't have this issue before, but after merging two dask dataframes it gives me this error.
The…
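For context, a common mitigation sketch (not necessarily the fix for this particular error): split the merged result into more, smaller partitions before writing, so each Parquet part stays small. The frames below are tiny stand-ins for the real ones.
```python
import pandas as pd
import dask.dataframe as dd

# Small stand-ins for the two frames being merged (the real ones are large).
a = dd.from_pandas(pd.DataFrame({"id": range(1000), "x": range(1000)}), npartitions=4)
b = dd.from_pandas(pd.DataFrame({"id": range(1000), "y": range(1000)}), npartitions=4)

merged = dd.merge(a, b, on="id")

# Split the result into more, smaller partitions before writing, so each
# Parquet part (and each worker's in-memory piece) stays small.
merged = merged.repartition(npartitions=merged.npartitions * 4)
merged.to_parquet("merged_output/", write_index=False)  # needs a parquet engine, e.g. pyarrow
```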