Starlien95 / GraphPrompt

GraphPrompt: Unifying Pre-Training and Downstream Tasks for Graph Neural Networks

About the reproduction results #9

Closed. SwaggyZhang closed this issue 11 months ago.

SwaggyZhang commented 11 months ago

Hello, and thank you for open-sourcing and uploading the latest code for the paper. While studying the code and trying to reproduce the results, I found that prompt_fewshot.py does not perform very well (judged by the average acc over 10-fold cross-validation): on the default dataset, the 10-fold cross-validation average acc is around 23.5, while the paper reports an acc of 31.84, which is a considerable gap. My parameters are attached below. (I only changed train_num_per_class and shot_num in pre_train.py to 5.)

My run parameters for pre_train.py are as follows:

train_config = {
    "max_npv": 8,  # max_number_pattern_vertices: 8, 16, 32
    "max_npe": 8,  # max_number_pattern_edges: 8, 16, 32
    "max_npvl": 8,  # max_number_pattern_vertex_labels: 8, 16, 32
    "max_npel": 8,  # max_number_pattern_edge_labels: 8, 16, 32

    "max_ngv": 126,  # max_number_graph_vertices: 64, 512,4096
    "max_nge": 298,  # max_number_graph_edges: 256, 2048, 16384
    "max_ngvl": 7,  # max_number_graph_vertex_labels: 16, 64, 256
    "max_ngel": 2,  # max_number_graph_edge_labels: 16, 64, 256

    "base": 2,

    "gpu_id": 0,
    "num_workers": 0,

    "epochs": 400,
    "batch_size": 1024,
    "update_every": 1,  # actual batch_sizer = batch_size * update_every
    "print_every": 100,
    "init_emb": "Equivariant",  # None, Orthogonal, Normal, Equivariant
    "share_emb": True,  # sharing embedding requires the same vector length
    "share_arch": True,  # sharing architectures
    "dropout": 0.2,
    "dropatt": 0.2,

    "reg_loss": "MSE",  # MAE, MSEl
    "bp_loss": "MSE",  # MAE, MSE
    "bp_loss_slp": "anneal_cosine$1.0$0.01",  # 0, 0.01, logistic$1.0$0.01, linear$1.0$0.01, cosine$1.0$0.01,
    # cyclical_logistic$1.0$0.01, cyclical_linear$1.0$0.01, cyclical_cosine$1.0$0.01
    # anneal_logistic$1.0$0.01, anneal_linear$1.0$0.01, anneal_cosine$1.0$0.01
    "lr": 0.1,
    "weight_decay": 0.00001,
    "max_grad_norm": 8,

    "model": "GIN",  # CNN, RNN, TXL, RGCN, RGIN, RSIN

    "predict_net": "SumPredictNet",  # MeanPredictNet, SumPredictNet, MaxPredictNet,
    # MeanAttnPredictNet, SumAttnPredictNet, MaxAttnPredictNet,
    # MeanMemAttnPredictNet, SumMemAttnPredictNet, MaxMemAttnPredictNet,
    # DIAMNet
    # "predict_net_add_enc": True,
    # "predict_net_add_degree": True,
    "predict_net_add_enc": True,
    "predict_net_add_degree": True,

    "predict_net_hidden_dim": 128,
    "predict_net_num_heads": 4,
    "predict_net_mem_len": 4,
    "predict_net_mem_init": "mean",
    # mean, sum, max, attn, circular_mean, circular_sum, circular_max, circular_attn, lstm
    "predict_net_recurrent_steps": 3,

    "emb_dim": 128,
    "activation_function": "leaky_relu",  # sigmoid, softmax, tanh, relu, leaky_relu, prelu, gelu

    "filter_net": "MaxGatedFilterNet",  # None, MaxGatedFilterNet
    "txl_graph_num_layers": 3,
    "txl_pattern_num_layers": 3,
    "txl_d_model": 128,
    "txl_d_inner": 128,
    "txl_n_head": 4,
    "txl_d_head": 4,
    "txl_pre_lnorm": True,
    "txl_tgt_len": 64,
    "txl_ext_len": 0,  # useless in current settings
    "txl_mem_len": 64,
    "txl_clamp_len": -1,  # max positional embedding index
    "txl_attn_type": 0,  # 0 for Dai et al, 1 for Shaw et al, 2 for Vaswani et al, 3 for Al Rfou et al.
    "txl_same_len": False,

    "gcn_num_bases": 8,
    "gcn_regularizer": "bdd",  # basis, bdd
    "gcn_graph_num_layers": 3,
    "gcn_hidden_dim": 32,
    "gcn_ignore_norm": False,  # ignorm=True -> RGCN-SUM

    "dataset": "ENZYMES",  # {ENZYMES, BZR, COX2, PROTEINS}
    "graph_dir": "../data/ENZYMES/ENZYMESPreTrain",
    "save_data_dir": "../data/ENZYMESPreTrain", 
    "save_model_dir": "../dumps/ENZYMESPreTrain/GCN",
    "save_pretrain_model_dir": "../dumps/MUTAGPreTrain/GCN",
    "graphslabel_dir":"../data/ENZYMES/ENZYMES_graph_labels.txt",
    "downstream_graph_dir": "../data/debug/graphs",
    "downstream_save_data_dir": "../data/debug",
    "downstream_save_model_dir": "../dumps/debug",
    "downstream_graphslabel_dir":"../data/debug/graphs",
    "train_num_per_class": 5,
    "shot_num": 5,
    "temperature": 1,
    "graph_finetuning_input_dim": 8,
    "graph_finetuning_output_dim": 2,
    "graph_label_num": 6,
    "seed": 0,
    "dropout": 0.5,
    "node_feature_dim": 18,
    "pretrain_hop_num": 1
}
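
For reference, the "bp_loss_slp" string appears to pack a schedule name and two endpoints separated by "$". A minimal sketch of how "anneal_cosine$1.0$0.01" might be interpreted as a loss-weight schedule (the parsing and the cosine formula below are assumptions for illustration, not the repo's actual implementation):

import math

def parse_slp(slp: str, epoch: int, total_epochs: int) -> float:
    # Assumed reading of "anneal_cosine$1.0$0.01": anneal a loss weight
    # from 1.0 down to 0.01 along a cosine curve over training.
    name, start, end = slp.split("$")
    start, end = float(start), float(end)
    if name == "anneal_cosine":
        progress = epoch / max(1, total_epochs - 1)
        return end + 0.5 * (start - end) * (1 + math.cos(math.pi * progress))
    raise ValueError(f"unsupported schedule: {name}")

# parse_slp("anneal_cosine$1.0$0.01", 0, 400)   -> 1.0
# parse_slp("anneal_cosine$1.0$0.01", 399, 400) -> 0.01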

The parameters in prompt_fewshot.py are as follows:

train_config = {
    "max_npv": 620,  # max_number_pattern_vertices: 8, 16, 32
    "max_npe": 2098,  # max_number_pattern_edges: 8, 16, 32
    "max_npvl": 2,  # max_number_pattern_vertex_labels: 8, 16, 32
    "max_npel": 2,  # max_number_pattern_edge_labels: 8, 16, 32

    "max_ngv": 126,  # max_number_graph_vertices: 64, 512,4096
    "max_nge": 298,  # max_number_graph_edges: 256, 2048, 16384
    "max_ngvl": 7,  # max_number_graph_vertex_labels: 16, 64, 256
    "max_ngel": 2,  # max_number_graph_edge_labels: 16, 64, 256

    "base": 2,

    "gpu_id": -1,
    "num_workers": 12,

    "epochs": 100,
    "batch_size": 512,
    "update_every": 1,  # actual batch_sizer = batch_size * update_every
    "print_every": 100,
    "init_emb": "Equivariant",  # None, Orthogonal, Normal, Equivariant
    "share_emb": True,  # sharing embedding requires the same vector length
    "share_arch": True,  # sharing architectures
    "dropout": 0,
    "dropatt": 0.2,

    "reg_loss": "NLL",  # MAE, MSEl
    "bp_loss": "NLL",  # MAE, MSE
    "bp_loss_slp": "anneal_cosine$1.0$0.01",  # 0, 0.01, logistic$1.0$0.01, linear$1.0$0.01, cosine$1.0$0.01,
    # cyclical_logistic$1.0$0.01, cyclical_linear$1.0$0.01, cyclical_cosine$1.0$0.01
    # anneal_logistic$1.0$0.01, anneal_linear$1.0$0.01, anneal_cosine$1.0$0.01
    "lr": 0.01,
    "weight_decay": 0.00001,
    "max_grad_norm": 8,

    "pretrain_model": "GIN",

    "emb_dim": 128,
    "activation_function": "leaky_relu",  # sigmoid, softmax, tanh, relu, leaky_relu, prelu, gelu

    "filter_net": "MaxGatedFilterNet",  # None, MaxGatedFilterNet
    "predict_net": "SumPredictNet",  # MeanPredictNet, SumPredictNet, MaxPredictNet,
    "predict_net_add_enc": True,
    "predict_net_add_degree": True,

    "txl_graph_num_layers": 3,
    "txl_pattern_num_layers": 3,
    "txl_d_model": 128,
    "txl_d_inner": 128,
    "txl_n_head": 4,
    "txl_d_head": 4,
    "txl_pre_lnorm": True,
    "txl_tgt_len": 64,
    "txl_ext_len": 0,  # useless in current settings
    "txl_mem_len": 64,
    "txl_clamp_len": -1,  # max positional embedding index
    "txl_attn_type": 0,  # 0 for Dai et al, 1 for Shaw et al, 2 for Vaswani et al, 3 for Al Rfou et al.
    "txl_same_len": False,

    "gcn_num_bases": 8,
    "gcn_regularizer": "bdd",  # basis, bdd
    "gcn_graph_num_layers": 3,
    "gcn_hidden_dim": 32,
    "gcn_ignore_norm": False,  # ignorm=True -> RGCN-SUM
    "dataset":"ENZYMES", # {ENZYMES, COX2, BZR, PROTEINS}
    "graph_dir": "../data/ENZYMES/raw",
    "save_data_dir": "../data/ENZYMESPreTrain",
    "save_model_dir": "../dumps/debug",
    "save_pretrain_model_dir": "../dumps/ENZYMESPreTrain/GIN",
    "graphslabel_dir":"../data/ENZYMES/ENZYMES_graph_labels.txt",
    "downstream_graph_dir": "../data/debug/graphs",
    "downstream_save_data_dir": "../data/debug",
    "downstream_save_model_dir": "./dumps/ENZYMESGraphClassification/Prompt/GCN-FEATURE-WEIGHTED-SUM/5train5val100task",
    "downstream_graphslabel_dir":"../data/debug/graphs",
    "temperature": 0.01,
    "graph_finetuning_input_dim": 8,
    "graph_finetuning_output_dim": 2,
    "graph_label_num":6,
    "seed": 0,
    "update_pretrain": False,
    "dropout": 0.5,
    "gcn_output_dim": 8,

    "prompt": "FEATURE-WEIGHTED-SUM",
    "prompt_output_dim": 2,
    "scalar": 1e3,

    "dataset_seed": 0,
    "train_shotnum": 5,
    "val_shotnum": 5,
    "few_shot_tasknum": 10,  # default:

    "save_fewshot_dir": "../data/ENZYMESGraphClassification/fewshot",

    "downstream_dropout": 0,
    "node_feature_dim": 18,
    "train_label_num": 6,
    "val_label_num": 6,
    "test_label_num": 6
}
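
For context, train_shotnum, val_shotnum, and few_shot_tasknum suggest the downstream evaluation repeatedly samples small labeled splits rather than using fixed folds. A minimal sketch of such per-class sampling (the function and the data layout are hypothetical, not the repo's actual loader):

import random

def sample_fewshot_task(indices_by_class, train_shot=5, val_shot=5, seed=0):
    # Hypothetical sampler: per class, draw train_shot support graphs and
    # val_shot validation graphs without overlap; the remainder forms the
    # evaluation pool for this task.
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, idxs in indices_by_class.items():
        pool = idxs[:]
        rng.shuffle(pool)
        train += [(i, label) for i in pool[:train_shot]]
        val += [(i, label) for i in pool[train_shot:train_shot + val_shot]]
        test += [(i, label) for i in pool[train_shot + val_shot:]]
    return train, val, test

# One task per seed, repeated few_shot_tasknum times:
# tasks = [sample_fewshot_task(by_class, 5, 5, seed=t) for t in range(10)]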

The GPU is an RTX 3090.

Starlien95 commented 11 months ago

There may be the following problems in your reproduction:

1. pre_train.py pre-trains on the entire dataset; changing the shot number should be done on the downstream side.
2. The paper does not use cross-validation to obtain the final results.
3. The hyperparameters need to be adjusted appropriately for your machine and experimental environment.
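
To illustrate point 2: averaging over 10 fixed cross-validation folds and averaging over repeatedly sampled few-shot tasks are different protocols and generally give different numbers. A minimal sketch of the task-averaged summary (names are illustrative; the per-task accuracies are whatever the downstream runs produce):

import statistics

def summarize(per_task_acc):
    # Mean and standard deviation over sampled few-shot tasks, rather than
    # a single mean over fixed cross-validation folds.
    return statistics.mean(per_task_acc), statistics.stdev(per_task_acc)

# e.g. with few_shot_tasknum = 10 sampled tasks:
# mean_acc, std_acc = summarize([0.30, 0.35, 0.28, 0.33, 0.31,
#                                0.29, 0.34, 0.32, 0.30, 0.36])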

SwaggyZhang commented 11 months ago

Thank you for taking the time to answer so patiently! Regarding cross-validation, I simply used the acc computation at line 465 of prompt_fewshot.py, without any custom changes. Is that the result you report, or do you rely on a different one?

Starlien95 commented 11 months ago

Line 465 is data-loading code; the code that computes acc is at line 359. The problem you ran into is likely caused by the other two points; in addition, a poorly pre-trained model will severely degrade downstream task performance.
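
For reference, a graph-classification accuracy computation of the kind pointed to at line 359 typically looks like the following (a generic sketch, not the repo's exact code):

import torch

def accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    # Fraction of graphs whose argmax class matches the ground-truth label.
    preds = logits.argmax(dim=-1)
    return (preds == labels).float().mean().item()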

SwaggyZhang commented 11 months ago

Thank you for the patient reply! I see what you mean, but my earlier wording may not have been clear.

[screenshot: prompt_fewshot.py, lines 643-650]

You marked a 10-fold setup at lines 643-645 of prompt_fewshot.py; have I misunderstood something? In your earlier answer you said the paper's acc was not obtained via this 10-fold acc, so did you use the acc_mean at line 650 instead? Thanks again for your reply!