Question on dimensions of Datasets

QData / spacetimeformer

Multivariate Time Series Forecasting with efficient Transformers. Code for the paper "Long-Range Transformers for Dynamic Spatiotemporal Forecasting."

https://arxiv.org/abs/2109.12218

MIT License


Question on dimensions of Datasets #11

Closed ghost closed 3 years ago

ghost commented 3 years ago

Hello -

I would like to try new datasets with your model, but I am having a hard time understanding the x_dim and y_dim values hard-coded in train.py.

What exactly does each of these mean?

For example, for the solar_energy dataset, I see that y_dim is 137 because it has 137 features, but where does the x_dim of 6 come from?

abcde-1447 commented 3 years ago

@jjwow2 My understanding is that the time embedding has 6 dimensions; see this class: https://github.com/QData/spacetimeformer/blob/90721a11906874027b3ac658a27fbbfde3150e9d/spacetimeformer/data/csv_dataset.py#L16-L16

jakegrigsby commented 3 years ago

Yeah, the 6 is pretty arbitrary; it comes from dividing calendar dates into (Year, Month, Day, Hour, Minute, Second). Some datasets use all 6 values even when they don't need them (when the data is always at daily intervals, for example). I think the traffic datasets are the only ones that are not in this format, and that's just to get a fair comparison to prior work. IIRC Metr-LA and Pems-Bay just have hour and day-of-the-week variables.
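
To make the origin of the 6 concrete, here is a minimal sketch of splitting a timestamp into those six calendar fields. It uses only the standard library and is not the repository's actual code; the helper name `time_features` and the sample timestamps are assumptions made for illustration.

```python
from datetime import datetime
from typing import List


def time_features(ts: datetime) -> List[int]:
    """Split a timestamp into (year, month, day, hour, minute, second)."""
    return [ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second]


# Hypothetical daily timestamps: hour, minute, and second are always 0,
# yet each row still carries all six fields.
stamps = [datetime(2021, 1, 1), datetime(2021, 1, 2)]
x = [time_features(ts) for ts in stamps]

# The width of each time row is what the thread calls x_dim.
x_dim = len(x[0])
```

With daily data like this, the last three columns are constant, which is the "use all 6 values even when they don't need them" case mentioned above.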

ghost commented 3 years ago

Thank you! I was able to implement my own dataset successfully with this information, but I might remove the seconds embedding in the future.
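
A rough sketch of what dropping the seconds field could look like, assuming the time features are built per timestamp from a configurable list of calendar fields. The names `FIELDS` and `time_features` are hypothetical and do not come from the repository; presumably the x_dim passed to the model would then need to change from 6 to 5 to match.

```python
from datetime import datetime

# All six calendar fields discussed in this thread.
FIELDS = ("year", "month", "day", "hour", "minute", "second")


def time_features(ts, fields=FIELDS):
    """Return the selected calendar fields of a datetime as a list of ints."""
    return [getattr(ts, name) for name in fields]


ts = datetime(2021, 6, 1, 12, 30, 45)

# Drop the trailing "second" field; the time row is now 5 wide.
no_seconds = time_features(ts, fields=FIELDS[:-1])
x_dim = len(no_seconds)
```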

Overall the results are pretty good (0.09 RMSE), but sometimes the PyTorch Lightning test method returns nonsensical results, which I need to figure out.