Closed geoHeil closed 3 years ago
Is the type of the graph (DynamicGraphTemporalSignal
) even suitable?
The feature set and node labels (target) are also dynamic. https://github.com/benedekrozemberczki/pytorch_geometric_temporal/blob/a13ea7876525ed9ba7c48ec69408024346eaded3/torch_geometric_temporal/signal/dynamic_graph_temporal_signal.py#L14
This rather sounds to me as if only some classification/regression would work out of the box? I hope this is wrong and my formulation of future edges as the target given a specific timestamp would work as well.
You could define a custom loss on edge pairs that you look up from the edge features or have some running temp variable for lagged states. I cannot see what is the exact problem that you are facing with this, you can keep two snapshots in memory and iteratively replace them and predict for the second one e.g. queue.
Some publications refer here to a regularized L2 distance.
My problem is: being very new to deep learning and in particular geometric deep learning everything is still a bit confusing - especially starting out with something which is not covered in the starter examples with so many new APIs and layers of APIs.
For the loss function something simple along these lines should be fine to start out with:
import pandas as pd
dummy_loss = pd.DataFrame({'src':[1,2], 'dst':[3,4], 'src_pred':[2,2], 'dst_pred':[3,4],})
# undirected - inverse matches also count
dummy_loss.loc[((dummy_loss.dst == dummy_loss.src_pred) & (dummy_loss.src == dummy_loss.dst_pred)) | ((dummy_loss.src == dummy_loss.src_pred) & (dummy_loss.dst == dummy_loss.dst_pred)), 'is_correct'] = 1
dummy_loss.is_correct = dummy_loss.is_correct.fillna(0)
display(dummy_loss)
dummy_loss.is_correct.value_counts(normalize=True)
You could define a custom loss on edge pairs that you look up from the edge features or have some running temp variable for lagged states
Sure, but for me, it is unclear if this should go somehow into the data loader or somewhat implicitly into the training loop.
Your data loaders for regression always have a tight relationship between the features and targets:
stacked_target = np.array(self._dataset["y"])
standardized_target = (stacked_target - np.mean(stacked_target, axis=0)) /(np.std(stacked_target, axis=0)+10**-10)
self.features = [standardized_target[i:i+self.lags,:].T for i in range(self._dataset["time_periods"]-self.lags)]
self.targets = [standardized_target[i+self.lags,:].T for i in range(self._dataset["time_periods"]-self.lags)]
How could it work that edges are predicted instead of given edges (with weights) and nodes (with temporal features)? I.e. currently there seems to be a disconnect between features (temporal node or edge features) and target (the topological edges in a future time step).
Most other approaches I have seen so far for edge prediction seem to utilize negative sampling & a temporal random walk/embedding and then binary classification. I am unsure if this would be needed in the case of PyTorch geometric temporal as well - and if yes, where to fit it into the framework. Should it go into the data loader?
Added an integer based index to all of the iterators and created a new release (0.33) which allows direct indexing of snapshots.
How can I perform spatio-temporal link prediction on an unweighted graph with features on the edges?
My graph is a
DynamicGraphTemporalSignal
, but I do not want to regress on some signal, rather perform link prediction as also outlined in https://pytorch-geometric-temporal.readthedocs.io/en/latest/_modules/torch_geometric_temporal/nn/recurrent/gc_lstm.html?highlight=linkBut from looking at the example code of: https://github.com/benedekrozemberczki/pytorch_geometric_temporal/blob/4fb610326811906eb4f95152a80f5df3f62fb459/test/recurrent_test.py#L305 it is still unclear to me how to create a suitable data loader.
So far I have looked into all the available ones from https://github.com/benedekrozemberczki/pytorch_geometric_temporal/tree/master/torch_geometric_temporal/dataset and think that the Twitter tennis data loader https://github.com/benedekrozemberczki/pytorch_geometric_temporal/blob/master/torch_geometric_temporal/dataset/twitter_tennis.py is the most suitable one.
However, even this one seems to be stuck in formulating a regression problem.
Please find the code to produce the dummy data below. It also contains the transformations I have come up with so far from looking at existing data loaders.
But for me, it remains unclear how to structure the data loader for a link prediction problem and not for a regression.
edit
I guess somehow the target must be the lagged edge-index of a future time index? Perhaps even of any future time index? depending on the forecast horizon.
Perhaps something along these lines would work? Here in this example a prediction horizon with an unlimited boundary is assumend.