benedekrozemberczki / pytorch_geometric_temporal

PyTorch Geometric Temporal: Spatiotemporal Signal Processing with Neural Machine Learning Models (CIKM 2021)
MIT License

How to use METRLADataset? #89

Closed shuowang-ai closed 3 years ago

shuowang-ai commented 3 years ago

Hi, thanks for this helpful project! I have a few questions. How do I use METRLADataset, which has 12 input steps and 12 output steps and more than one feature? The examples you provide all use ChickenpoxDataset, which has 4 input steps and 1 output step, and they seem to use the feature dimension to represent the historical steps. That's a bit confusing. Could you please provide examples using METRLADataset? Thanks again!

MoRoBe-Work commented 3 years ago

Hi, I believe the two features are the normalized velocity and the time of day encoded as a value between 0 and 1. So the shape of the data should be (num_nodes, 2 (velocity + time of day), num_timesteps (12)). I haven't used the dataset for a while; does this match the dimensions you're getting? Be careful with this information, though: I am just a user of the code as well and was in no way involved in its creation! Best regards, MoRoBe
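If the feature order is as described above (channel 0 = normalized velocity, channel 1 = time of day), the two signals can be pulled apart by indexing the feature axis. A minimal sketch, assuming that channel order, with a random tensor standing in for a real snapshot's `x`:

```python
import torch

# Random stand-in for one METR-LA snapshot: (num_nodes, num_features, num_timesteps)
num_nodes, num_features, num_timesteps = 207, 2, 12
x = torch.rand(num_nodes, num_features, num_timesteps)

# Assumed channel order: 0 = normalized velocity, 1 = time of day in [0, 1)
velocity = x[:, 0, :]     # shape (207, 12)
time_of_day = x[:, 1, :]  # shape (207, 12)

print(velocity.shape, time_of_day.shape)
```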

benedekrozemberczki commented 3 years ago

Dear @shawnwang-tech,

@paulmorio did the integration. Paul! Do you want to comment on this?

Bests,

Benedek

shuowang-ai commented 3 years ago

@MoRoBe-Work @benedekrozemberczki @paulmorio Thanks for the reply! From https://github.com/benedekrozemberczki/pytorch_geometric_temporal/issues/72, I learned that pyg-temporal provides the layers rather than whole models. For example, do we need to write the encoder-decoder ourselves to use the DCRNN layer? But then why does the ChickenpoxDataset example (https://pytorch-geometric-temporal.readthedocs.io/en/latest/notes/introduction.html#applications) not use an encoder-decoder architecture to iterate over the lags, and instead treat the lags as features? @MoRoBe-Work Yes, the shape is (num_nodes, 2 (velocity + time of day), num_timesteps (12)), but I do not know how to feed it into a pyg-temporal model, so I hope examples could be provided. Best regards, Shawn
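The "lags as features" convention amounts to folding the time axis into the feature axis, so a non-recurrent graph layer sees each lag as a separate input channel. A sketch on a random METR-LA-shaped tensor (not data from the library's loader) of two ways to do that: folding both features' 12 lags into 24 channels per node, or keeping a single feature so that each lag is one column, which is the (num_nodes, lags) layout the issue describes for ChickenpoxDataset:

```python
import torch

num_nodes, num_features, num_lags = 207, 2, 12
x = torch.rand(num_nodes, num_features, num_lags)  # (N, F, T) as in METR-LA

# Fold lags into features: (N, F, T) -> (N, F * T).
# Columns 0..11 hold feature 0's lags, columns 12..23 hold feature 1's lags.
x_flat = x.reshape(num_nodes, num_features * num_lags)

# Chickenpox-style layout: one feature only, one column per lag -> (N, lags)
velocity_lags = x[:, 0, :]
```

The trade-off is that a flattened input discards the explicit ordering of time steps, which an encoder-decoder over the lags would keep.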

paulmorio commented 3 years ago

Hi @shawnwang-tech,

Thanks for your interest in the library. Every observation in the dataset produced by METRLADatasetLoader is a data object with an x attribute of shape (207, 2, 12), corresponding to (num_nodes, node_features, num_time_steps); the y attribute has shape (207, 12), corresponding to (node_target_of_interest, num_time_steps).

You can see this by running:

```python
from torch_geometric_temporal.dataset import METRLADatasetLoader

loader = METRLADatasetLoader()
dataset = loader.get_dataset()
single_observation = next(iter(dataset))
print(single_observation)
```
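If a custom model expects mini-batches rather than one snapshot at a time, one option is to stack several snapshots along a new leading dimension. A sketch with random tensors in place of real observations; the `snapshots` list is hypothetical, and with the real dataset you would collect each snapshot's `x` and `y` attributes instead:

```python
import torch

num_nodes, num_features, t_in, t_out = 207, 2, 12, 12

# Hypothetical stand-ins for the (x, y) attributes of four snapshots
snapshots = [
    (torch.rand(num_nodes, num_features, t_in), torch.rand(num_nodes, t_out))
    for _ in range(4)
]

# Stack along a new leading batch dimension
x_batch = torch.stack([x for x, _ in snapshots])  # (4, 207, 2, 12)
y_batch = torch.stack([y for _, y in snapshots])  # (4, 207, 12)
```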

So, in the default setting, the model you build uses the observations of the 207 nodes over the previous 12 time steps to predict the target values of the 207 nodes over the next 12 time steps. You can define different tasks by passing non-default argument values to the get_dataset() method of METRLADatasetLoader:

```python
def get_dataset(self, num_timesteps_in: int = 12, num_timesteps_out: int = 12) -> StaticGraphTemporalSignal:
```
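To make the effect of `num_timesteps_in` and `num_timesteps_out` concrete, here is a simplified sliding-window sketch over a hypothetical raw array of shape (num_nodes, num_features, total_timesteps). It illustrates the idea rather than reproducing the library's own code; the helper `make_windows` is hypothetical, and taking feature channel 0 as the target is an assumption consistent with the (207, 12) `y` shape described above:

```python
import numpy as np

def make_windows(raw, num_timesteps_in=12, num_timesteps_out=12):
    """Hypothetical helper: slice (num_nodes, num_features, total_T) into (x, y) pairs.

    x: (num_nodes, num_features, num_timesteps_in)
    y: (num_nodes, num_timesteps_out), taken from feature channel 0 (assumed target)
    """
    window = num_timesteps_in + num_timesteps_out
    features, targets = [], []
    for start in range(raw.shape[2] - window + 1):
        mid = start + num_timesteps_in
        features.append(raw[:, :, start:mid])           # input history
        targets.append(raw[:, 0, mid:start + window])   # future values of channel 0
    return features, targets

# Random stand-in for raw data: 207 sensors, 2 features, 100 time steps
raw = np.random.rand(207, 2, 100)
features, targets = make_windows(raw, num_timesteps_in=6, num_timesteps_out=3)
print(len(features), features[0].shape, targets[0].shape)  # 92 (207, 2, 6) (207, 3)
```

Each extra step of input or output shortens the usable series by one window, which is why 100 steps yield 100 - (6 + 3) + 1 = 92 samples here.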

As for your second question, there are several ways to use the 12 input graph signals to produce 12 output graph signals. For example, with a recurrent model you can feed the recurrent layer's output for t+1 back in as input to predict t+2, then do the same for t+3, and so on. You would implement this in the forward() method of your torch.nn.Module. We will look into adding more examples of this kind of scenario to future docs.
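A minimal sketch of that autoregressive idea, assuming plain `torch.nn.GRUCell` as the recurrent unit and treating each node as an independent sequence (the nodes act as the batch, so no graph structure is used). In a real spatiotemporal model you would swap the cell for a graph recurrent layer such as DCRNN; the class name `Seq2SeqForecaster`, its sizes, and the choice of channel 0 as the target are all hypothetical:

```python
import torch
import torch.nn as nn

class Seq2SeqForecaster(nn.Module):
    """Hypothetical encoder-decoder that feeds each prediction back in as the next input."""

    def __init__(self, in_features=2, hidden=32, horizon=12):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.GRUCell(in_features, hidden)
        self.decoder = nn.GRUCell(1, hidden)  # decoder input: previous prediction
        self.readout = nn.Linear(hidden, 1)

    def forward(self, x):
        # x: (num_nodes, in_features, num_timesteps_in); nodes act as the batch
        num_nodes, _, t_in = x.shape
        h = x.new_zeros(num_nodes, self.encoder.hidden_size)
        for t in range(t_in):                 # encode the input steps
            h = self.encoder(x[:, :, t], h)
        prev = x[:, 0, -1:]                   # last observed value of channel 0: (num_nodes, 1)
        outputs = []
        for _ in range(self.horizon):         # predict t+1, then feed it back for t+2, ...
            h = self.decoder(prev, h)
            prev = self.readout(h)            # (num_nodes, 1)
            outputs.append(prev)
        return torch.cat(outputs, dim=1)      # (num_nodes, horizon)

model = Seq2SeqForecaster()
x = torch.rand(207, 2, 12)  # random stand-in for snapshot.x
y_hat = model(x)            # same shape as snapshot.y: (207, 12)
```

During training, a common variant is teacher forcing, where the ground-truth value for the previous step is fed to the decoder instead of the model's own prediction.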

All the best, Paul

shuowang-ai commented 3 years ago

Hi @paulmorio, thank you so much! I think I get what you mean. I will try to implement those ST-GNNs on the traffic data. If that works, I would like to open a pull request and ask you to help review the code. Moreover, I have contributed example code to PyG before, and I really love the PyG and PyG-temporal ecosystem! Best, Shawn