dask / dask

Parallel computing with task scheduling
https://dask.org
BSD 3-Clause "New" or "Revised" License

`TypeError` when using `from_map` on `enumerate` object #9064

Closed charlesbluca closed 2 years ago

charlesbluca commented 2 years ago

What happened: When I pass an `enumerate` object to `from_map` as the input iterable, an uncaught `TypeError` is raised while `from_map` tries to compute the length of the iterable:

TypeError                                 Traceback (most recent call last)
Input In [1], in <cell line: 10>()
      6     x = t[1]
      8     return pd.Series([x] * size)
---> 10 dd.from_map(func, enumerate(["A", "B"]))

File ~/dask/dask/dataframe/io/io.py:953, in from_map(func, args, meta, divisions, label, token, enforce_metadata, *iterables, **kwargs)
    949     raise ValueError(
    950         f"All elements of `iterables` must be Iterable, got {type(iterable)}"
    951     )
    952 try:
--> 953     lengths.add(len(iterable))
    954 except AttributeError:
    955     iterables[i] = list(iterable)

TypeError: object of type 'enumerate' has no len()

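What you expected to happen: I would expect this error to be caught, since the `except` branch looks like it is meant to handle this case by creating a list from the `enumerate` object. However, per the traceback, that branch only catches `AttributeError`, while `len()` on an `enumerate` object raises `TypeError`.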
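
For reference, here is a minimal standalone check in plain Python (no dask needed) of which exception `len()` raises on an `enumerate` object. The variable name `caught` is only for illustration.

```python
# An enumerate object is iterable but defines no __len__.
it = enumerate(["A", "B"])

try:
    len(it)
except AttributeError:
    # This is the only exception type the existing handler catches.
    caught = "AttributeError"
except TypeError:
    # This is the branch Python actually takes.
    caught = "TypeError"

print(caught)
```

This prints `TypeError`, which matches the traceback above, so a handler that only lists `AttributeError` never runs for this input.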

Minimal Complete Verifiable Example: This fails:

import pandas as pd
import dask.dataframe as dd

def func(t):
    size = t[0] + 1
    x = t[1]

    return pd.Series([x] * size)

dd.from_map(func, enumerate(["A", "B"]))

Creating a list from the enumerate object before calling from_map works:

import pandas as pd
import dask.dataframe as dd

def func(t):
    size = t[0] + 1
    x = t[1]

    return pd.Series([x] * size)

dd.from_map(func, list(enumerate(["A", "B"])))
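
The same workaround can be sketched as a small helper that converts any iterable without a length into a list. This is only an illustration of the pattern, not necessarily how a fix in dask would be written. The helper name `materialize_unsized` is hypothetical, and the error message is copied from the traceback.

```python
from collections.abc import Iterable


def materialize_unsized(iterables):
    """Return a list in which every iterable lacking len() is converted to a list.

    Hypothetical helper for illustration; not part of dask.
    """
    out = []
    for iterable in iterables:
        if not isinstance(iterable, Iterable):
            raise ValueError(
                f"All elements of `iterables` must be Iterable, got {type(iterable)}"
            )
        try:
            len(iterable)
        except (AttributeError, TypeError):
            # Generators, enumerate objects, etc. have no length:
            # consume them into a list so they can be sized and indexed.
            iterable = list(iterable)
        out.append(iterable)
    return out


result = materialize_unsized([enumerate(["A", "B"]), ["x", "y"]])
print(result)
```

Catching `TypeError` alongside `AttributeError` covers the `enumerate` case, and already-sized inputs such as lists pass through unchanged.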

Environment:

jsignell commented 2 years ago

Looks like a little bug. Do you or @rjzamora have time to open a PR to fix?