Closed Hurry-LJ closed 1 year ago
We use AUC instead of ACC to evaluate the model.
Yeah, I know about the AUC evaluation, but I found some issues during training. Specifically, I added some code to `test()` in gnn.py after line 78 to monitor training and check the counts of the 0 class and the 1 class:

```python
import pandas as pd

# Predicted class per node vs. the ground-truth label
res_y = y_pred[node_id].argmax(axis=-1)
y1 = res_y.numpy().tolist()
y2 = data.y[node_id].numpy().tolist()

df = pd.DataFrame({'y1': y1, 'y2': y2})
print(key)
print(df['y1'].value_counts())  # distribution of predicted labels
print(df['y2'].value_counts())  # distribution of true labels
```

Issue: the output is consistently skewed (nearly every sample is predicted as class 0), so it does not look accidental.
Thanks very much for your response
For an unbalanced dataset, we don't use 0.5 as the threshold. That's why we need AUC, which is the probability that the model ranks a random positive example higher than a random negative example. In the code above, even though every sample is assigned a high probability of being class 0, the true class-1 samples still receive a lower probability of class 0 than the true class-0 samples, so the ranking (and hence the AUC) remains informative.
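To make this concrete, here is a minimal sketch (with synthetic data, not taken from this repo) of why accuracy can look excellent on an imbalanced binary problem while AUC reveals whether the model actually ranks positives above negatives. The 99:1 class ratio and the score ranges are assumptions for illustration:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = np.array([0] * 990 + [1] * 10)  # 99:1 imbalance, like a fraud-style dataset

# Hypothetical model: every sample gets a low class-1 probability,
# but true positives still score slightly higher than true negatives.
scores = np.concatenate([rng.uniform(0.0, 0.2, 990),
                         rng.uniform(0.1, 0.3, 10)])

# A 0.5 threshold predicts class 0 for everyone.
y_pred = (scores > 0.5).astype(int)

print(accuracy_score(y_true, y_pred))  # high accuracy, but the model never finds a positive
print(roc_auc_score(y_true, scores))   # above 0.5: the ranking still carries signal
```

With a 0.5 threshold the accuracy is 0.99 simply because 99% of the labels are 0, while the AUC is computed from the scores alone and shows the model separates the classes better than chance.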
Hello. The positive and negative samples in this binary classification task differ by orders of magnitude, which ultimately makes the model "give up" and blindly classify everyone as a normal user. Taking the train set as an example, 0:1 = 847042:10857; predicting every 1 as 0 still yields an accuracy of roughly 83/84. Thanks for the answer.