-
I'm attempting to use the p2p shuffle implementation (using the branch proposed for merge in #7326) to shuffle a ~1TB dataset.
The data exists on disk as ~300 parquet files (that each expand to aroun…
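For reference, a minimal sketch of opting into the p2p shuffle via dask's config (assuming a dask version that recognizes the `dataframe.shuffle.method` key; on the #7326-era branch the same choice is passed per call, e.g. `ddf.shuffle("col", shuffle="p2p")`):

```python
import dask

# opt into the p2p shuffle globally; per-call overrides are also possible
# on versions that accept a shuffle= keyword
dask.config.set({"dataframe.shuffle.method": "p2p"})

method = dask.config.get("dataframe.shuffle.method")
```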
-
**What happened**:
I use the `distributed.utils_test` fixture `cleanup` in my pytest tests. I have some code that uses the dask `threading` scheduler. However, the `cleanup` fixture has a `check_th…
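For context, a minimal sketch of the pattern that can trip a thread-leak check (the helper name here is hypothetical): the threaded scheduler spawns a pool of worker threads that outlive the call.

```python
import dask

def run_on_threads():
    # the threaded scheduler keeps its thread pool alive after compute()
    # returns, which a fixture that checks for leaked threads can flag
    return dask.delayed(sum)([1, 2, 3]).compute(scheduler="threads")
```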
-
**What happened**:
Due to a mistake in our code, we were persisting a dask dataframe on one scheduler, but then ran the compute while specifying the threads scheduler. What was weird was that the compu…
-
I'm trying to process parquet files stored in AWS S3.
The files are read simply with
```python
from distributed import Client
import dask.dataframe as dd

with Client(n_workers=6) as client:
    df = dd.read_parquet('s3://lightnings_*.gzip.parquet')
…
-
When I load a CSV into dask first, and then into a dask-geopandas dataframe using `.from_dask_dataframe`, `._meta_nonempty` does not exist, causing downstream problems in analysis (e.g. with `spatial_shuffle`). My h…
-
Is it possible to delete the specified DDF when a drop table command is executed on it?
-
Following up on https://stackoverflow.com/questions/48592049/dask-dataframe-groupby-apply-efficiency/48592529 with an example.
Read data with a sorted index column and perform a groupby; shouldn't re…
-
Using `Distance` 2.0.0 and `mrds` 3.0.0.
The following code uses these duiker data: [DaytimeDistances.txt](https://github.com/user-attachments/files/17680350/DaytimeDistances.txt)
```r
DuikerCam…
-
When creating a GeoDataFrame from a dask dataframe, we could pass through the `crs` keyword to the underlying geopandas.GeoDataFrame constructor:
https://github.com/geopandas/dask-geopandas/blob/5b…
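For reference, the `crs` keyword on the plain geopandas constructor that would be forwarded (a minimal sketch, assuming geopandas and shapely are installed):

```python
import geopandas
from shapely.geometry import Point

# constructing a GeoDataFrame with an explicit crs, the keyword the
# issue proposes passing through from dask-geopandas
gdf = geopandas.GeoDataFrame(
    {"geometry": [Point(0, 0), Point(1, 1)]}, crs="EPSG:4326"
)
```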
-
```python
import pandas as pd
from dask_expr import from_pandas

df = pd.DataFrame({"a": [1, 2, 3], "bb": 1}, index=["a", "a", "b"])
ddf = from_pandas(df)
ddf.a["b"].compute()
```
This raises
```
Traceback (most r…