Open marcopeix opened 2 months ago
I've run into a bug that I can't fix when trying to prepare a dataset for finetuning.
Here's the code:
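```python
from collections.abc import Generator
from pathlib import Path
from typing import Any

import pandas as pd
from datasets import Dataset, Features, Sequence, Value


# `df` is the Store == 1 frame described below, indexed by Date.
def data_generator() -> Generator[dict[str, Any]]:
    yield {
        "target": df["Weekly_Sales"].to_numpy(),
        "start": df.index[0],
        "freq": pd.infer_freq(df.index),
        "item_id": "1",
    }


features = Features(
    dict(
        target=Sequence(Value("float32")),
        start=Value("date32"),
        freq=Value("string"),
        item_id=Value("string"),
    )
)

hf_dataset = Dataset.from_generator(data_generator, features=features)
hf_dataset.save_to_disk(Path("sales_dataset/"))

# Dump the single-record dataset back to CSV for the builder CLI.
df = hf_dataset.to_pandas()
df.to_csv("sales_dataset/sales_data.csv", index=False)
```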
Then, when I run `python -m uni2ts.data.builder.simple sales_data sales_dataset/sales_data.csv --offset 40 --dataset_type long`, I get the error `IndexError: index 0 is out of bounds for axis 0 with size 0`. I'm not sure why that happens, as my df is not empty, and the .csv is not empty either.
Here's the CSV I'm using: https://raw.githubusercontent.com/marcopeix/FoundationModelsForTimeSeriesForecasting/main/data/walmart_sales_small.csv
I'm only using data for Store==1 (143 rows of data) and the first three columns only (Store, Date, Weekly_Sales). Prior to running the function, I set the index as the Date column.
What am I missing?
I didn't look too deeply into this, but I'm guessing it's due to the format (column names) of your data frame:
https://github.com/SalesforceAIResearch/uni2ts/blob/2ba614de8878d350c62835c942b450d2f4d5a711/src/uni2ts/data/builder/simple.py#L58
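To make the format point concrete: the CSV in the question is written from `hf_dataset.to_pandas()`, so it holds a single row with the columns `target`, `start`, `freq` and `item_id`, not one row per observation. Below is a minimal sketch of reshaping the raw data into a long layout instead. It assumes the long-format loader wants a date column first, an `item_id` column, and one value column per row; that layout is my assumption and should be checked against `simple.py` linked above. The sample rows, `to_long_rows` and `write_long_csv` are hypothetical, and the sketch uses only the standard library (the same reshaping can be done in pandas).

```python
import csv
import io

# Made-up sample rows in the shape described in the question
# (Store, Date, Weekly_Sales); the values are placeholders, not real data.
RAW_CSV = """Store,Date,Weekly_Sales
1,2010-02-05,100.0
1,2010-02-12,110.5
2,2010-02-05,999.0
1,2010-02-19,120.25
"""


def to_long_rows(raw_text, store="1"):
    """Keep one store and return rows shaped as (date, item_id, value)."""
    reader = csv.DictReader(io.StringIO(raw_text))
    rows = [
        (r["Date"], r["Store"], float(r["Weekly_Sales"]))
        for r in reader
        if r["Store"] == store
    ]
    # ISO-formatted dates sort chronologically when compared as strings.
    rows.sort(key=lambda row: row[0])
    return rows


def write_long_csv(rows, handle):
    """Write one observation per row, with the date as the first column."""
    writer = csv.writer(handle, lineterminator="\n")
    # Assumed column layout; verify against the builder before relying on it.
    writer.writerow(["Date", "item_id", "Weekly_Sales"])
    writer.writerows(rows)


long_rows = to_long_rows(RAW_CSV)
buffer = io.StringIO()
write_long_csv(long_rows, buffer)
long_csv = buffer.getvalue()
print(long_csv)
```

If that layout matches what the builder expects, the file to pass on the command line would be one written this way from the raw observations, rather than the CSV dumped from the Hugging Face dataset.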