benedekrozemberczki / pytorch_geometric_temporal

PyTorch Geometric Temporal: Spatiotemporal Signal Processing with Neural Machine Learning Models (CIKM 2021)
MIT License

temporal graph classification #93

Closed krzysztoffiok closed 2 years ago

krzysztoffiok commented 2 years ago

I'm wondering if there is an example of classification carried out for data represented as temporal graphs?

To be precise: in my use case I have for instance a sequence of 100 graphs representing together a single entity/object. Next, I have 200 entities. Is there an easy way to input those 200 data instances (with ground truth classification labels) to one of the numerous deep learning architectures implemented in this repo to obtain instance representations useful in the classification task? Or simply obtain predicted labels? Is there an example for this or did I miss it ? (apologies if I did).

Thank you for your help in advance.

alexriedel1 commented 2 years ago

Hey! Yes you can also use the implemented methods for classification, see the example here: https://github.com/benedekrozemberczki/pytorch_geometric_temporal/blob/b9bcb7377d63dcbf5f21448d09be5a8df0f9619e/examples/recurrent/agcrn_example.py#L16-L24

If you have 20 classes in your dataset, you can set, say, out_channels = 20 and remove the linear layer, or keep the linear layer and adjust its dimensions: self.linear = torch.nn.Linear(20, 20).
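A minimal sketch of the two options above, using only plain PyTorch. The tensor `h` stands in for the per-node output of a recurrent layer like the one in the linked example; its shape, the mean pooling over nodes, and the names `num_nodes`, `num_classes` and `head` are assumptions for illustration, not part of the library.

```python
import torch

num_nodes, num_classes = 17, 20

# Stand-in for the recurrent layer's output when out_channels == num_classes:
# one row of class scores per node (random values, for shape checking only).
h = torch.randn(num_nodes, num_classes)

# Option 1: drop the linear layer and pool node scores into graph-level logits.
logits_no_head = h.mean(dim=0)

# Option 2: keep a linear layer, sized so that it outputs one score per class.
head = torch.nn.Linear(num_classes, num_classes)
logits_with_head = head(h).mean(dim=0)
```

Either way the model ends with one logit per class for each graph, which is what a classification loss expects.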

krzysztoffiok commented 2 years ago

Sounds great @alexriedel1 , OK, thank you, I'll give it a try.

alexriedel1 commented 2 years ago

But keep in mind that the recurrent network might not be the best fit for a classification task; it's just an example of how to use the library! Maybe consider using ST-GCN for a classification task, as described in this paper: https://arxiv.org/abs/1801.07455

krzysztoffiok commented 2 years ago

Hmm, thank you again. I was hoping for an architecture that could create a meaningful representation of a dynamic graph, i.e. a vector of features that would neatly represent the whole sequence of static graphs (a dynamic graph). I believed there was an analogy to the NLP world, where a sentence/sequence of tokens can be converted by a trained recurrent network into a single 'text instance embedding'. Is that not the case with a sequence of graphs?

alexriedel1 commented 2 years ago

Yes, that's totally possible, I just thought attention might be the better choice. But you can of course also go with a recurrent model.
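The NLP analogy can be sketched in plain PyTorch as follows. This is only an illustration under simplifying assumptions: the class name `SequenceEncoder` is hypothetical, and each snapshot is reduced to a vector by averaging node features, which ignores the edges entirely. In practice a graph layer from the library would take the place of that pooling step.

```python
import torch


class SequenceEncoder(torch.nn.Module):
    """Hypothetical encoder: graph sequence -> one embedding -> class logits."""

    def __init__(self, num_features, hidden_size, num_classes):
        super().__init__()
        self.gru = torch.nn.GRU(num_features, hidden_size, batch_first=True)
        self.classifier = torch.nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, time, nodes, features)
        graph_vectors = x.mean(dim=2)              # (batch, time, features)
        _, last_hidden = self.gru(graph_vectors)   # (1, batch, hidden)
        embedding = last_hidden.squeeze(0)         # one vector per sequence
        return embedding, self.classifier(embedding)


model = SequenceEncoder(num_features=12, hidden_size=32, num_classes=7)
x = torch.randn(4, 15, 17, 12)  # 4 sequences of 15 graphs, 17 nodes, 12 features
embedding, logits = model(x)
```

The final hidden state plays the role of the "text instance embedding": one fixed-size vector per sequence of graphs, which can be fed to a classifier or reused elsewhere.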

krzysztoffiok commented 2 years ago

If I get this working, I'm sure I'll try both! Thank you again.

benedekrozemberczki commented 2 years ago

Dear @krzysztoffiok,

Do you want to share the dataset?

krzysztoffiok commented 2 years ago

Dear @benedekrozemberczki, thank you for taking an interest in my issue. The data I'm working with is a fragment of fMRI data from the Human Connectome Project. I'm still in the process of deciding exact procedures for extraction of node and edge features from the data. I already did some experiments (with static graphs, GNNs from the https://github.com/rusty1s/pytorch_geometric package and with use of node embeddings from your great https://github.com/benedekrozemberczki/karateclub package). Presumably next week I should be able to share a small version of the data set formed as temporal graphs that should be good for testing and provide reasonable size & computational cost. I will write in this discussion when I'm ready.

benedekrozemberczki commented 2 years ago

How is it going?

krzysztoffiok commented 2 years ago

Sorry for responding late, this is of course not my only project and I had to do other things first.

For now I'm stuck trying to implement the following baseline approach: 1) obtain a sequence of static graphs for each patient/recording from the fMRI data. Rationale: the fMRI time series recordings differ in length between recording sessions, but each recording can still be divided into N parts to obtain a sequence of N static graphs instead of a single static graph per patient per recording session, 2) after step 1), try out the whole-graph embedding concept (for instance FeatherGraph from karateclub) and embed each of the obtained static graphs, 3) use a multivariate time series classifier on the graph-level features. For this I have tried the https://www.sktime.org/en/stable/examples/03_classification_multivariate.html package.

I have checked many data-related issues and I now believe the following (of course I might be incorrect): 1) The resulting static graphs, in the pytorch-geometric Data format and the networkx format that I'm using to try out FeatherGraph, all have the same number of nodes and edges. The graphs differ only in the node features (derived from the time series of each brain region, i.e. node) and the edge features (a single feature per edge, equal to the correlation value taken from the correlation matrix computed with the https://nilearn.github.io/ package from the time series of all nodes/brain regions), 2) The classifier gets 0 performance, 3) When I delete graph edges whose edge feature is lower than a threshold value (an arbitrary choice) and experimentally guess a more suitable value of N, I manage to obtain low but nonzero performance, 4) I have also tried the "GL2Vec" whole-graph embedding and provided the "feature" key as in the documentation. In any case the situation doesn't improve.

Conclusion: points 3 and 4 make me wonder whether those whole-graph embedding techniques can grasp differences between graphs when the differences lie solely in the feature values (both node and edge) and not in the adjacency matrix, i.e. the number of nodes and connections. The other option I see is that my coding skills are lacking ...
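To illustrate the concern, here is a small NumPy sketch with random data (not the fMRI data). The helper names `edges_above` and `random_corr` and the threshold 0.2 are hypothetical. Without thresholding, every graph has exactly the same edge list, so a method that looks only at structure has nothing to tell graphs apart; thresholding on correlation strength is what makes the structure depend on the data.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes = 17


def edges_above(corr, threshold):
    """Return a (2, E) edge index of node pairs with |correlation| > threshold."""
    mask = np.abs(corr) > threshold
    np.fill_diagonal(mask, False)  # drop self-loops
    return np.stack(np.nonzero(mask))


def random_corr(rng, n):
    """Correlation matrix of n random time series (stand-in for real data)."""
    return np.corrcoef(rng.standard_normal((n, 50)))


corr_a, corr_b = random_corr(rng, num_nodes), random_corr(rng, num_nodes)

# A threshold below zero keeps every pair: both graphs are fully connected.
full_a, full_b = edges_above(corr_a, -1.0), edges_above(corr_b, -1.0)

# A positive threshold keeps only the stronger connections of each graph.
sparse_a, sparse_b = edges_above(corr_a, 0.2), edges_above(corr_b, 0.2)
```

Here `full_a` and `full_b` are identical, while `sparse_a` and `sparse_b` will typically differ, which is consistent with thresholding giving nonzero performance in point 3.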

Please find attached a sample of the data and a jupyter notebook prepared to inspect the data. dataset_part_and_ipynb.zip

Again, thank you for your time.

alexriedel1 commented 2 years ago

Hey @krzysztoffiok, after taking a quick look at your data, it seems that 1) judging by the edge_index, each node is connected with every other node at every time step; is that correct and intended? 2) all your classification labels are 0, which makes it difficult to come up with a baseline solution for classification on this series.

krzysztoffiok commented 2 years ago

Hi @alexriedel1 ,

regarding 1) yes, this is exactly the case, as the assumption is that the brain regions are connected with each other and only the "strength" of the connection represented by the edge_attribute varies, and regarding 2) yes, this is a part of the data set that represents the same task here labeled as '0'. The other parts of the data set look exactly the same regarding the format/shape and differ by the label. There are overall 7 tasks so 7 labels (0-6) in the data set. I understand with this sample of the data it is impossible to propose any final solution. The data sample was attached to give an idea of what the data looks like and if it is not corrupted in any manner. The experiments I was describing earlier on were conducted on the whole data set i.e. with 7 tasks/classification labels.

If you think it will help, I can upload here another part of the data set which represents a different task, so it will be possible to define a proper classification challenge based on those two data set parts?

alexriedel1 commented 2 years ago

Hi @krzysztoffiok, ok i got it! Yes it would be awesome if you can upload a dataset with one other task, so it would be worth a try to train a binary classifier at first!

Also for my understanding: in the dataset you provided, 15 consecutive graphs belong to one patient so they should be treated as a sequence right? Will each of those sequences in the shape of #Sequence, Nodes, Features -> (15, 17, 12) have only one label?

krzysztoffiok commented 2 years ago

@alexriedel1,

Again thank you for your interest.

Please give me a moment to prepare and upload the 2nd part of the data set.

Yes this is exactly the case, you understand correctly.

krzysztoffiok commented 2 years ago

@alexriedel1

This should be it. The same number of patients (51), the same number of graphs per patient (15) and the same format. The class label here is '1'.

hcp_17_51relational-2_ts.zip

alexriedel1 commented 2 years ago

Hey, I worked some stuff out on the data set. I gave some remarks in the code about the things I tried. Basically I trained a model without considering the edge weights but only the node features.

It's giving a high accuracy on both a train and validation split. Anyway the dataset provided is really small with the 102 graph sequences and it's only a binary classifier so I'm not sure if the results can be generalized to the full task.

Can you provide more insights where the node features but especially the edge weights data come from?

fmr_data_train.zip

krzysztoffiok commented 2 years ago

@alexriedel1 I can't find words to thank you. I have briefly reviewed your example and I believe that after looking into it a while longer it will let me understand the framework and how I should proceed. This is very helpful especially regarding how to shape the data in order to feed it to the model.

The very high accuracy you have obtained is quite plausible, as these data represent seriously varying tasks. Of course you're right that there is too little data to celebrate, but this is only a sample of the whole HCP data set I'm working with.

Explanation behind the node features: in the fMRI recording, there are too many voxels and corresponding time series representing brain activity to handle at once. So, to decrease dimensionality, the voxels are aggregated according to a predefined brain atlas (there are several options here). In the data sample here the brain was divided into only 17 regions, which is very few. A more common number would be 200, or even over 300. In the brain graph representation each region is treated as a separate node. The activity of the voxels aggregated into a brain region (node) is a time series that aggregates the underlying voxel time series. So the node features are a time series representing the activity of the underlying brain region. In the data we are discussing I divided the whole length of the recording into 15 splits, which is why each patient is represented by a sequence of 15 static graphs. However, because the task-related recording length differed (one task lasted longer than the other), this resulted in a different number of time steps and node features. The choice of 15 splits was arbitrary; I will definitely test how this number influences overall performance.

Explanation behind the edge weights: correlation between brain regions (their time series), extracted by means of http://nilearn.github.io/modules/generated/nilearn.connectome.ConnectivityMeasure.html#nilearn.connectome.ConnectivityMeasure
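The construction described above can be sketched with NumPy on random data. This is an approximation under stated assumptions: `np.corrcoef` is used as a stand-in for nilearn's ConnectivityMeasure, and the recording length of 180 time points is made up so that the result matches the (15, 17, 12) shape mentioned earlier in the thread.

```python
import numpy as np

rng = np.random.default_rng(0)
num_regions, num_timepoints, num_splits = 17, 180, 15

# Hypothetical atlas-aggregated recording: one time series per brain region.
recording = rng.standard_normal((num_regions, num_timepoints))

# Split along time; each window's samples become that snapshot's node features.
windows = np.array_split(recording, num_splits, axis=1)

# Pearson correlation between regions within each window gives the edge weights.
edge_weights = [np.corrcoef(window) for window in windows]

# Stacking works here because 180 time points divide evenly into 15 windows.
node_features = np.stack(windows)  # (splits, regions, time points per split)
```

A longer recording split into the same 15 windows yields more time points per window, which is why the number of node features differs between tasks.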

I hope I managed to explain these features a little bit?

If you are interested further in the related research I can send you some links to papers that describe these things in detail and show example approaches based on both static and dynamic graphs. Or we can chat on skype if you like, live communication often helps ;)

Thank you again!!!

alexriedel1 commented 2 years ago

You're welcome @krzysztoffiok ! I got your explanation regarding the features. If the edge weights are computed only from the node feature data, you might want to give that job to the neural network as well. It sometimes shows better results than traditional, well-established methods.

Do you mind if I use the two datasets you provided for this repository? I would like to add that sequence classification as an example for others who face similar problems. All the examples in the repo are more about forecasting.

krzysztoffiok commented 2 years ago

@alexriedel1

I'm glad the explanations helped. It seems an idea worth considering.

Yeah it is true the current examples regarding forecasting made me write the issue regarding classification in the first place.

I think there should be no problem with that since HCP is a publicly available data set, only maybe it will be better if I provide you with the data for all 7 task-based recordings there are? Only please let me confirm that with a friend who was the actual person responsible for preprocessing raw HCP data. I will let you know soon.

krzysztoffiok commented 2 years ago

@alexriedel1 can I ask you if there is an easy way to make this example work for multi class problems? I presume it shouldn't be too complicated, but somehow I fail...

Also, I have tried your code on another similar data set with binary classification (fMRI resting state data and gender prediction, this is a frequent binary task, quite challenging normally) and there the training accuracy goes high again, but unfortunately validation accuracy does not. Modification of dropout value didn't help much. Is it that stacking more AAGCN layers might help? Or maybe you are aware of any other tips?

alexriedel1 commented 2 years ago

@krzysztoffiok Hey, I have modified the code to work with multiple classes; you will find the parameter num_classes = 2 for this purpose.
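The attached code is not reproduced in the thread, so the following is only a generic sketch of what a multi-class setup usually looks like in PyTorch, not the contents of train_2.zip. The sizes `batch_size` and `hidden` and the random `features` tensor are placeholders for the pooled model output.

```python
import torch

num_classes = 7            # the thread mentions 7 tasks labeled 0-6
batch_size, hidden = 8, 15

head = torch.nn.Linear(hidden, num_classes)             # one logit per class
features = torch.randn(batch_size, hidden)              # stand-in for model output
labels = torch.randint(0, num_classes, (batch_size,))   # integer class indices

logits = head(features)
loss = torch.nn.CrossEntropyLoss()(logits, labels)      # expects raw logits
predicted = logits.argmax(dim=1)                        # predicted class per sequence
```

Changing `num_classes` is then the only adjustment needed to go from the binary case to all seven tasks.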

Also I have added another model based on STConv (https://pytorch-geometric-temporal.readthedocs.io/en/latest/modules/root.html#torch_geometric_temporal.nn.attention.stgcn.STConv) maybe this one works better for your other classification task.

Also if a model is overfitting, there is a variety of actions to prevent this. Adding regularization or reducing the model size are just two of them. If you want, you can share your gender prediction data again :)
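A rough sketch of the two measures mentioned above in plain PyTorch, plus early stopping, which is a common addition and not something suggested in the thread. The layer sizes, learning rate, weight decay value and the helper `should_stop` are all illustrative assumptions.

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(15, 8),   # a small hidden layer keeps the parameter count low
    torch.nn.ReLU(),
    torch.nn.Dropout(0.5),
    torch.nn.Linear(8, 2),
)

# weight_decay adds an L2 penalty on the parameters (regularization).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)


def should_stop(val_losses, patience=5):
    """Early stopping: True if the last `patience` epochs set no new best loss."""
    if len(val_losses) <= patience:
        return False
    return min(val_losses[-patience:]) >= min(val_losses[:-patience])
```

In a training loop, `should_stop` would be called once per epoch with the history of validation losses.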

train_2.zip

krzysztoffiok commented 2 years ago

@alexriedel1 Thank you again very much. I'll work with your code and get back to you and hopefully share both data sets.

krzysztoffiok commented 2 years ago

Hi @alexriedel1

Regarding the data it is tricky because in theory anyone who wishes to work with this data should head to http://www.humanconnectomeproject.org/data/ and obtain a personal permission.

So maybe a possible work-around would be to for instance modify the data by adding some random signals/noise so that it retains only the shape/structure? Just thinking out loud.

krzysztoffiok commented 2 years ago

@alexriedel1 Can I ask one more thing, namely how can I pass a batch of DynamicGraphTemporalSignal objects to the model?

Let's assume I'm using this example model:

import torch
import torch.nn.functional as F
from torch_geometric_temporal.nn.attention import AAGCN


class AGCN_temporal(torch.nn.Module):
    def __init__(self, edge_index, max_number_of_node_features, number_of_graph_nodes,
                 number_of_static_graphs, number_of_classes, edge_weight):
        super(AGCN_temporal, self).__init__()
        self.agcn = AAGCN(in_channels=max_number_of_node_features, out_channels=32,
                          edge_index=edge_index, num_nodes=number_of_graph_nodes)
        self.linear = torch.nn.Linear(number_of_static_graphs, number_of_classes)
        self.drop_out = torch.nn.Dropout(0.75)

    def forward(self, x, edge_index, edge_weight):
        x = x.permute(2, 0, 1).unsqueeze(0).contiguous()  # Batch, F_in, T_in, N_nodes
        x = self.agcn(x)
        x = F.relu(x)
        x = x.mean(3).mean(1)  # average over nodes, then channels -> (Batch, T_in)
        x = self.drop_out(x)
        x = self.linear(x)

        return x.flatten()

Sorry for my ignorance but right now I have no clue what batch size is used.

Thanks, Best, Chris

alexriedel1 commented 2 years ago

Hi @krzysztoffiok, you can pass a batch to the model. If you do, remove unsqueeze(0). That call is only there to add an extra dimension to the "non-batch" input so that it has the same shape as a "batch" input.

it's just important that the input to the model has the dimensions (Batch, F_in, T_in, N_nodes).
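The difference can be sketched with plain tensors. The shapes follow the (15, 17, 12) sequences discussed earlier; the batch size of 4 and the variable names are arbitrary choices for illustration.

```python
import torch

T, N, F_in = 15, 17, 12

# A single sequence stored as (T, N, F): unsqueeze(0) fakes a batch of size 1.
single = torch.randn(T, N, F_in)
single_input = single.permute(2, 0, 1).unsqueeze(0).contiguous()

# Several sequences stacked as (B, T, N, F): permute only, no unsqueeze.
sequences = [torch.randn(T, N, F_in) for _ in range(4)]
batch = torch.stack(sequences)
batch_input = batch.permute(0, 3, 1, 2).contiguous()
```

Both results have the layout (Batch, F_in, T_in, N_nodes). Note that `torch.stack` requires all sequences to have the same shape, and that the final `x.flatten()` in the posted model would presumably need to go as well, since it merges the batch and class dimensions.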

Note that it's not that easy for all the models; for some you have to perform the diagonal batching trick: https://github.com/benedekrozemberczki/pytorch_geometric_temporal/issues/84

dannierxiao commented 2 years ago

Hi @krzysztoffiok, thank you for posting this question! The solution has been most helpful for my learning task for a different application.

@alexriedel1, thank you for your most helpful assistance on this problem. At the start, you mentioned the proposed solution does not consider edge information. I was wondering how the model could be amended to consider edge information? Specifically, not just weighted edges (as seems the case with the current model that passes snapshot.edge_attr[0]), but edge attributes with multiple features (e.g. 3 edge features in my case)? Thank you!