Shen-Lab / GraphCL

[NeurIPS 2020] "Graph Contrastive Learning with Augmentations" by Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, Yang Shen
MIT License

Questions about the Unsupervised_TU experiments #17

Closed ha-lins closed 3 years ago

ha-lins commented 3 years ago

Hi @yyou1996,

Thanks for your efforts in this project. I have some questions as follows:

  1. I notice that the training & evaluation procedure of GraphCL differs slightly from InfoGraph, which evaluates every epoch. GraphCL doesn't save the model during training and only evaluates every 10 epochs. Could I evaluate every epoch and report, as the final test result, the test accuracy at the epoch with the highest validation accuracy? Or could I simply pick the highest test accuracy, though testing that many times during training seems questionable.

  2. I tried evaluating every epoch on IMDB-B and the training curve looked a bit odd to me. The validation accuracy didn't improve with more training epochs, which probably means training didn't benefit representation learning much, in my opinion. Could you please give some analysis or explanation?

    Epoch 1, Loss 420.73120760917664
    acc_val, acc = 0.72 0.709
    Epoch 2, Loss 407.65890550613403
    acc_val, acc = 0.701 0.696
    Epoch 3, Loss 395.5375609397888
    acc_val, acc = 0.673 0.679
    Epoch 4, Loss 383.75150847435
    acc_val, acc = 0.707 0.7020000000000001
    Epoch 5, Loss 371.76993465423584
    acc_val, acc = 0.6759999999999999 0.688
    Epoch 6, Loss 361.50709533691406
    acc_val, acc = 0.7020000000000001 0.701
    Epoch 7, Loss 352.6328740119934
    acc_val, acc = 0.707 0.705
    Epoch 8, Loss 340.92082047462463
    acc_val, acc = 0.72 0.711
    Epoch 9, Loss 334.9960980415344
    acc_val, acc = 0.694 0.688
    Epoch 10, Loss 325.21264362335205
    acc_val, acc = 0.7 0.7070000000000001
    Epoch 11, Loss 318.04585337638855
    acc_val, acc = 0.7089999999999999 0.7230000000000001
    Epoch 12, Loss 310.82839179039
    acc_val, acc = 0.7070000000000001 0.722
    Epoch 13, Loss 304.04966139793396
    acc_val, acc = 0.72 0.7250000000000001
    Epoch 14, Loss 299.41626381874084
    acc_val, acc = 0.744 0.715
    Epoch 15, Loss 293.7600963115692
    acc_val, acc = 0.6910000000000001 0.7220000000000001
    Epoch 16, Loss 288.6614990234375
    acc_val, acc = 0.703 0.7270000000000001
    Epoch 17, Loss 285.12760519981384
    acc_val, acc = 0.717 0.726
    Epoch 18, Loss 280.9172787666321
    acc_val, acc = 0.736 0.7289999999999999
    Epoch 19, Loss 277.5624203681946
    acc_val, acc = 0.71 0.715
    Epoch 20, Loss 274.18093502521515
    acc_val, acc = 0.7150000000000001 0.7289999999999999

    Btw, the final result for this seed should be 0.7289 (epoch 18) because of the highest val_acc (0.736), right? Thanks in advance!

yyou1996 commented 3 years ago

Hello @ha-lins,

  1. Yes, per-epoch evaluation is doable, using validation performance to select the test accuracy. I evaluate every 10 epochs because evaluation is quite slow for some datasets; evaluating every epoch is expected to give somewhat better performance.

  2. I do observe this for some small datasets, e.g. IMDB-B as you show. For larger datasets it is better: validation performance does increase, but only in the first several epochs (the same holds for supervised training on TU datasets; please refer to the GIN paper and its repo). My impression is that these are not very challenging tasks for GNN training yet, especially with small datasets and shallow networks.

Your selection is correct: pick the epoch with the highest val acc and report the corresponding test acc.
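
For instance, a minimal sketch of that selection rule in Python (hypothetical variable names; it assumes per-epoch validation and test accuracies have already been collected):

    # Hypothetical per-epoch results (illustrative values only).
    val_accs = [0.720, 0.701, 0.744, 0.736]
    test_accs = [0.709, 0.696, 0.715, 0.729]

    # Report the test accuracy at the epoch with the highest validation accuracy.
    best_epoch = max(range(len(val_accs)), key=lambda i: val_accs[i])
    final_test_acc = test_accs[best_epoch]
    print(f'best epoch {best_epoch + 1}: val {val_accs[best_epoch]:.3f}, test {final_test_acc:.3f}')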

ha-lins commented 3 years ago

Thanks for your reply! It's really helpful.

ha-lins commented 3 years ago

Hi @yyou1996,

I have a question about the SVM classifier. I feel that using k-fold cross-validation may not be reasonable, since it uses the labels of the training folds and thus violates the unsupervised setting. I know this setting follows InfoGraph, but it could still be unreasonable, right?

To check this, I tried evaluating without pre-training (in other words, using a randomly initialized GNN encoder), and the results are only slightly lower than (or even better than) GraphCL's.

|         | MUTAG | IMDB-BINARY | IMDB-MULTI | PTC_MR | NCI1 | PROTEINS | DD |
| ------- | ----- | ----------- | ---------- | ------ | ---- | -------- | -- |
| GraphCL | 86.8±1.3 | 71.14±0.4 | 48.4±0.8 | 58.4±1.7 | 77.9±0.4 | 74.4±0.5 | 78.6±0.4 |
| SVM     | 88.9±2.7 | 70.1±0.3 | 46.2±0.6 | 60.5±1.3 | 70.8±1.9 | 72.8±0.4 | 74.2±0.9 |

Since the SVM has seen the data distribution through the training folds, it can predict easily at test time. Given how strong the SVM itself already is, I think the improvement from the contrastive learning method could be somewhat marginal, and this evaluation method should be improved. Could you give some comments or suggest a solution? Thanks!

yyou1996 commented 3 years ago

Hi @ha-lins,

Thanks for your question. I think this is the common setting for unsupervised representation learning; please take a look at the SimCLR paper http://proceedings.mlr.press/v119/chen20j.html. The unsupervised procedure is specifically for "learning the representations", and the learned representations are then evaluated with labels (see Figure 1 in SimCLR for how they define the unsupervised setting).
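
For concreteness, here is a minimal sketch of that kind of evaluation (not the exact script in this repo; emb and y are placeholders standing in for the frozen encoder's graph embeddings and the downstream labels):

    import numpy as np
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.svm import LinearSVC

    # Placeholders: embeddings from the frozen, pre-trained encoder and graph labels.
    emb = np.random.randn(200, 32)
    y = np.random.randint(0, 2, size=200)

    # Labels are used only here, to fit and score a linear classifier (SVM)
    # on top of the fixed representations, SimCLR-style.
    clf = LinearSVC(C=1.0, max_iter=10000)
    scores = cross_val_score(clf, emb, y, cv=StratifiedKFold(n_splits=10, shuffle=True))
    print(f'10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}')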

Regarding the performance you show, I would argue that the two exceptions where GraphCL performs worse are only on quite small datasets (the two smallest ones, MUTAG with 188 graphs and PTC_MR with 344 graphs; please check https://chrsmrrs.github.io/datasets/docs/datasets/). We believe the power of self-supervision stands out in the regime of large-scale unlabelled data with few labels, as the performance you show on the other datasets suggests. Nevertheless, I think it is interesting to observe that in the few-data regime GraphCL can even deteriorate performance.

ha-lins commented 3 years ago

Thanks for your helpful comments! Now I understand the rationale behind the evaluation protocol.

ha-lins commented 3 years ago

Hi, @yyou1996

Could you please explain the purpose of these lines (lines 211-225) of unsupervised_TU/gsimclr.py? I'm confused about them. Thanks!

            if args.aug == 'dnodes' or args.aug == 'subgraph' or args.aug == 'random2' or args.aug == 'random3' or args.aug == 'random4':
                # node_num_aug, _ = data_aug.x.size()
                edge_idx = data_aug.edge_index.numpy()
                _, edge_num = edge_idx.shape
                idx_not_missing = [n for n in range(node_num) if (n in edge_idx[0] or n in edge_idx[1])]
                node_num_aug = len(idx_not_missing)
                data_aug.x = data_aug.x[idx_not_missing]

                data_aug.batch = data.batch[idx_not_missing]
                idx_dict = {idx_not_missing[n]:n for n in range(node_num_aug)}
                edge_idx = [[idx_dict[edge_idx[0, n]], idx_dict[edge_idx[1, n]]] for n in range(edge_num) if not edge_idx[0, n] == edge_idx[1, n]]
                data_aug.edge_index = torch.tensor(edge_idx).transpose_(0, 1)
yyou1996 commented 3 years ago

Hi @ha-lins,

Thanks for the question. This is a post-processing step after the augmentation dataloader: in the unsupervised_TU repo, if I delete nodes in the dataloader's get() function, the remaining nodes are not re-indexed (e.g. if the original graph has 5 nodes and I delete the 2nd and 3rd, the output graph has node indices 1, 4, 5, whereas I actually want 1, 2, 3 after re-indexing). That re-indexing is what this step does.

unsupervised_TU is the early version of the implementation and needs this post-processing step; the other repos do not need it.
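
To illustrate the re-indexing outside the repo, here is a standalone sketch (a toy example, not the code above): after node dropping, only the nodes that still appear in edge_index are kept, they are remapped to a contiguous range starting from 0, and the edges are rewritten with the new indices:

    import torch

    # Toy example: a 5-node graph where nodes 1 and 2 were dropped by the augmentation,
    # so only edges among nodes {0, 3, 4} survive.
    x = torch.randn(5, 8)                      # original node features
    edge_index = torch.tensor([[0, 3, 4],
                               [3, 4, 0]])     # surviving edges, still using old indices

    # Keep only nodes that still appear in edge_index and build an old -> new index map.
    kept = sorted(set(edge_index.flatten().tolist()))     # [0, 3, 4]
    remap = {old: new for new, old in enumerate(kept)}    # {0: 0, 3: 1, 4: 2}

    x = x[kept]                                            # features of surviving nodes only
    new_edges = [[remap[s], remap[d]] for s, d in edge_index.t().tolist()]
    edge_index = torch.tensor(new_edges).t()

    print(edge_index)  # tensor([[0, 1, 2], [1, 2, 0]])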