fastai / fastcore

Python supercharged for the fastai library
http://fastcore.fast.ai
Apache License 2.0

DataLoaders.from_df hangs with empty dataframe #281

Closed (jeremybmerrill closed this 3 years ago)

jeremybmerrill commented 3 years ago

Hi! Longtime listener, first caller. Which is to say, I'm a big fan of fastai -- so, thank you all very much. :)

On the latest pip-installed version of fastai (fastai==2.2.5, fastcore==1.3.19), when I try to build DataLoaders from an empty pandas DataFrame using TextDataLoaders.from_df, the call never returns; it appears to be stuck in an infinite loop.

Works as expected:

!pip install -U -q fastai
from fastai.text.all import *
import pandas as pd

df = pd.DataFrame({'texts': ["lorem ipsum", "someone told me not to cry"], 'label': [True, False]})
dls_inference = TextDataLoaders.from_df(df, text_col='texts')

Hangs indefinitely:

!pip install -U -q fastai
from fastai.text.all import *
import pandas as pd

df = pd.DataFrame({'texts': [], 'label': []})
dls_inference = TextDataLoaders.from_df(df, text_col='texts')

The expected behavior would be that I get a DataLoader that immediately stops providing batches (or an error message).

The loop it gets stuck in appears to be this one (based on what I get when I Ctrl-C to interrupt the loop).

(I'm using fastai as part of a production-ish workflow, which is why I'm feeding an empty dataframe to a DataLoader. To work around the bug, I've adjusted my code to check the size of the dataframe before building the DataLoaders, but I figured fixing this would save somebody else some confusion.)
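
For anyone hitting the same hang, here is a minimal sketch of that kind of size check. The helper name build_dls_if_nonempty and its build callback are hypothetical, made up for illustration; they are not part of fastai or fastcore. The sketch relies only on len(df) giving the row count, which holds for a pandas DataFrame.

```python
from typing import Any, Callable, Optional


def build_dls_if_nonempty(df: Any, build: Callable[[Any], Any]) -> Optional[Any]:
    """Call `build(df)` only when `df` has at least one row.

    Hypothetical helper, not a fastai API. `len(df)` is the row count for a
    pandas DataFrame, so an empty frame returns None here and never reaches
    the DataLoaders constructor that hangs.
    """
    if len(df) == 0:
        return None
    return build(df)


# Intended use with fastai (left as a comment because it needs fastai and pandas):
#   from fastai.text.all import TextDataLoaders
#   dls = build_dls_if_nonempty(
#       df, lambda d: TextDataLoaders.from_df(d, text_col='texts'))
#   if dls is None:
#       ...  # nothing to run inference on; skip this batch
```

Returning None is only one choice; raising a ValueError instead would give the "error message" behavior described above, if that fits the calling code better.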

jph00 commented 3 years ago

I believe this is fixed now.