dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Can not append distributed dataframes #2308

Open Norhk opened 5 years ago

Norhk commented 5 years ago

As the title says: when appending distributed dataframes I get an error:

(The dataframe is distributed on a number of workers.)

ValueError: All inputs have known divisions which cannot be concatenated in order. Specify interleave_partitions=True to ignore order

Passing that argument to append yields an unknown-argument error.

dask version '0.19.4'

mrocklin commented 5 years ago

It looks like append hard-codes interleave_partitions, whereas concat accepts it as a keyword argument.

@Norhk short term I recommend using concat

@TomAugspurger are there reasons not to allow interleaving when using append?

@Norhk if not would you be interested in submitting a small PR to fix? This should be fairly easy.

Norhk commented 5 years ago

Yes I will make a PR, give me 2 weeks (conference...). I will also test this further. The project is amazing!