SalesforceAIResearch / uni2ts

Unified Training of Universal Time Series Forecasting Transformers
Apache License 2.0
884 stars 97 forks source link

Bug when preparing data for finetuning #122

Open marcopeix opened 2 months ago

marcopeix commented 2 months ago

I've run into a bug that I can't fix when trying to prepare a dataset for finetuning.

Here's the code:

def data_generator() -> Generator[dict[str, Any]]:
    yield {
        "target": df['Weekly_Sales'].to_numpy(),
        "start": df.index[0],
        "freq": pd.infer_freq(df.index),
        "item_id": "1",
    }

features = Features(
    dict(
        target=Sequence(Value("float32")),
        start=Value("date32")),
        freq=Value("string"),
        item_id=Value("string"),
    )

hf_dataset = Dataset.from_generator(data_generator, features=features)

hf_dataset.save_to_disk(Path("sales_dataset/"))

df = hf_dataset.to_pandas()

df.to_csv('sales_dataset/sales_data.csv', index=False)

Then, when I run python -m uni2ts.data.builder.simple sales_data sales_dataset/sales_data.csv --offset 40 --dataset_type long , I get the error:

IndexError: index 0 is out of bounds for axis 0 with size 0. Not sure why that happens, as my df is not empty, and the .csv is not empty either.

Here's the CSV I'm using: https://raw.githubusercontent.com/marcopeix/FoundationModelsForTimeSeriesForecasting/main/data/walmart_sales_small.csv

I'm only using data for Store==1 (143 rows of data) and the first three columns only (Store, Date, Weekly_Sales). Prior to running the function, I set the index as the Date column.

What am I missing?

gorold commented 1 month ago

didn't look too deeply into this, but I'm guessing it's due to the format (column names) of your data frame?

https://github.com/SalesforceAIResearch/uni2ts/blob/2ba614de8878d350c62835c942b450d2f4d5a711/src/uni2ts/data/builder/simple.py#L58