SalesforceAIResearch / uni2ts

[ICML2024] Unified Training of Universal Time Series Forecasting Transformers
Apache License 2.0

Bug when trying to prepare custom dataset for finetuning #102

Open marcopeix opened 1 month ago

marcopeix commented 1 month ago

I've run into a bug that I can't fix when trying to prepare a dataset for finetuning.

Here's the code:

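```python
from collections.abc import Generator
from pathlib import Path
from typing import Any

import pandas as pd
from datasets import Dataset, Features, Sequence, Value

# `df` is assumed to hold the Store 1 sales data, indexed by Date.


def data_generator() -> Generator[dict[str, Any], None, None]:
    # One yielded dict = one univariate series (one example in the dataset).
    yield {
        "target": df["Weekly_Sales"].to_numpy(),
        "start": df.index[0],
        "freq": pd.infer_freq(df.index),
        "item_id": "1",
    }


features = Features(
    dict(
        target=Sequence(Value("float32")),
        start=Value("date32"),
        freq=Value("string"),
        item_id=Value("string"),
    )
)

hf_dataset = Dataset.from_generator(data_generator, features=features)
hf_dataset.save_to_disk(Path("sales_dataset/"))

# Export to CSV; a new name keeps the original `df` from being overwritten.
exported_df = hf_dataset.to_pandas()
exported_df.to_csv("sales_dataset/sales_data.csv", index=False)
```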
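The CSV written at the end of that snippet has one row per yielded dict, not one row per week: `Dataset.to_pandas()` returns one row per example, and `data_generator()` yields a single example. A minimal sketch, using a hypothetical three-value stand-in for the `hf_dataset.to_pandas()` output (it does not call `datasets`), shows the layout that ends up on disk after a CSV round trip:

```python
import io

import pandas as pd

# Hypothetical stand-in for hf_dataset.to_pandas(): a single row whose
# "target" cell holds the whole series (three made-up values here).
exported = pd.DataFrame(
    [
        {
            "target": [100.0, 120.0, 110.0],
            "start": pd.Timestamp("2010-02-05").date(),
            "freq": "W-FRI",
            "item_id": "1",
        }
    ]
)

# Write to CSV and read it back, as any downstream CSV reader would.
buffer = io.StringIO()
exported.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

print(round_trip.columns.tolist())  # ['target', 'start', 'freq', 'item_id']
print(len(round_trip))  # 1: a single data row for the whole series
print(type(round_trip.loc[0, "target"]))  # the series comes back as one string
print(round_trip["item_id"].dtype)  # "1" is re-read as an integer column
```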

Then, when I run

```shell
python -m uni2ts.data.builder.simple sales_data sales_dataset/sales_data.csv --offset 40 --dataset_type long
```

I get the error:

```
IndexError: index 0 is out of bounds for axis 0 with size 0
```

I'm not sure why that happens, since my `df` is not empty and neither is the .csv.

What am I missing?
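For reference, this is the message pandas (via NumPy) raises when position 0 is requested from an empty index, so it suggests that some frame reached a `df.index[0]`-style lookup with zero rows even though the input file has data. A minimal sketch with a hypothetical three-week series reproduces the same kind of error; the helper `first_timestamp` is illustrative only and mirrors the `"start": df.index[0]` field in the generator. It does not show where uni2ts itself filters the data.

```python
import pandas as pd


def first_timestamp(frame: pd.DataFrame) -> pd.Timestamp:
    """Hypothetical helper: return the first index value, like `df.index[0]`."""
    return frame.index[0]


# A small, made-up weekly series indexed by date (Fridays).
sales = pd.DataFrame(
    {"Weekly_Sales": [100.0, 120.0, 110.0]},
    index=pd.date_range("2010-02-05", periods=3, freq="W-FRI"),
)
print(first_timestamp(sales))  # 2010-02-05 00:00:00

# A filter that keeps no rows leaves an empty DatetimeIndex...
empty = sales[sales.index > pd.Timestamp("2030-01-01")]
try:
    first_timestamp(empty)
except IndexError as err:
    # ...and asking for position 0 raises the IndexError seen above.
    print(f"IndexError: {err}")
```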

liu-jc commented 1 month ago

Hi @marcopeix,

Could you please share a sample of the .csv you used? Then we can look into it further.

marcopeix commented 1 month ago

@liu-jc Sure, here's the CSV I'm using: https://raw.githubusercontent.com/marcopeix/FoundationModelsForTimeSeriesForecasting/main/data/walmart_sales_small.csv

I'm only using the data for Store == 1 (143 rows) and only the first three columns (Store, Date, Weekly_Sales). Before running the generator, I set the Date column as the index.
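The preparation described above (one store, the Store/Date/Weekly_Sales columns, Date as the index) can be sketched in pandas as follows. The helper name `load_store_series`, the tiny in-memory CSV (including its hypothetical `Extra` column), and pandas' default date parsing are assumptions for illustration; the real file may need `dayfirst=True` or an explicit `format=` in `pd.to_datetime`.

```python
import io

import pandas as pd


def load_store_series(csv_source, store: int = 1) -> pd.DataFrame:
    """Hypothetical helper: one store's rows, indexed by Date."""
    raw = pd.read_csv(csv_source)
    # Keep only the Store, Date and Weekly_Sales columns.
    subset = raw[["Store", "Date", "Weekly_Sales"]]
    subset = subset[subset["Store"] == store].copy()
    # Assumption: default parsing fits the file; use dayfirst=True or
    # format= if dates are stored as dd-mm-yyyy.
    subset["Date"] = pd.to_datetime(subset["Date"])
    return subset.set_index("Date").sort_index()


# Made-up sample with two stores and an extra column that gets dropped.
sample = io.StringIO(
    "Store,Date,Weekly_Sales,Extra\n"
    "1,2010-02-05,100.0,0\n"
    "1,2010-02-12,120.0,1\n"
    "2,2010-02-05,90.0,0\n"
    "1,2010-02-19,110.0,0\n"
)
store_df = load_store_series(sample, store=1)
print(len(store_df))  # 3 rows for store 1
print(pd.infer_freq(store_df.index))  # W-FRI
```

With the real file, `load_store_series(...)` pointed at the CSV linked above should return the 143 weekly rows for store 1, ready to be used as `df` in the generator.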

marcopeix commented 3 weeks ago

@liu-jc, have you had time to take a look at this? It's blocking my progress. Thanks!