deeplearning-wisc / cider

PyTorch implementation of CIDER (How to exploit hyperspherical embeddings for out-of-distribution detection), ICLR 2023

Questions about model trained on CIFAR10 #6

Open v-1024 opened 1 year ago

v-1024 commented 1 year ago

Hi author,

Thank you for your outstanding work! I recently tried to reproduce it. I trained for 500 epochs on the CIFAR10 dataset, using the script 'eval_ckpt_cifar10.sh' that you provided, but I ran into some problems during testing.

I used the KNN score and the Mahalanobis score as OOD scores, and the metrics are as follows:

knn:

           FPR95  AUROC   AUPR
SVHN       4.86   99.23   99.23
places365  25.2   95.36   95.74
iSUN       21.38  96.41   97.39
dtd        17.04  97.29   98.51
LSUN       4.12   99.27   99.34
AVG        14.52  97.51   98.04

Mahalanobis:

           FPR95  AUROC   AUPR
SVHN       96.21  65.56   70.06
places365  94.14  59.71   61.74
iSUN       92.36  57.75   63.97
dtd        96.61  44.15   59.86
LSUN       91.0   63.12   59.14
AVG        94.06  58.06   62.95

As shown, with the Mahalanobis score the FPR95 of the model is close to 100, and the results differ considerably from those given in Appendix D and Table 6. I am very confused by this result. To find the cause, I first tested the model's ID classification accuracy on the CIFAR10 test set, and the result was surprising: the accuracy is only 5.41%. This is the main part of my accuracy test code:

import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from tqdm import tqdm

def accuracy(predictions, labels):
    """Return (number of correct predictions, batch size)."""
    # softmax is monotonic, so argmax over the logits gives the same labels
    pred = predictions.argmax(dim=1)
    rights = pred.eq(labels.view_as(pred)).sum()
    return rights, len(labels)

# load CIFAR10
normalize = transforms.Normalize(mean=[x/255.0 for x in [125.3, 123.0, 113.9]],
                                 std=[x/255.0 for x in [63.0, 62.1, 66.7]])
transform_test = transforms.Compose([transforms.ToTensor(), normalize])
test_loader = torch.utils.data.DataLoader(
            datasets.CIFAR10(args.id_loc, train=False, transform=transform_test),
            batch_size=args.batch_size, shuffle=False)

# load model parameters (set_model and args come from the repository's scripts)
device = torch.device('cuda:{}'.format(args.gpu) if torch.cuda.is_available() else 'cpu')
pretrained_dict = torch.load(args.ckpt, map_location='cpu')['state_dict']
net = set_model(args)
net.load_state_dict(pretrained_dict)
net.to(device)  # inputs are moved to `device` below, so the model must be there too
net.eval()

val_rights = []
with torch.no_grad():
    for (data, target) in tqdm(test_loader):
        data = data.to(device)
        target = target.to(device)

        penultimate = net.encoder(data).squeeze()
        features = F.normalize(penultimate, dim=1)
        out = net.fc(features)
        # calculate acc
        v_right = accuracy(out, target)
        val_rights.append(v_right)

# total correct predictions and total samples over all batches
val_r = (sum([tup[0] for tup in val_rights]), sum([tup[1] for tup in val_rights]))

print('acc: {:.2f}%'.format(100. * val_r[0].item() / val_r[1]))

While debugging the program above, I found that the model's predicted labels were concentrated in classes 4 and 3, which is clearly abnormal. I don't know the cause of this result and hope you can help.

Thank you!

alvinmingsf commented 1 year ago

Thanks for bringing this up! How did you evaluate the ID accuracy? Is it obtained by a linear probe, as in SupCon (https://github.com/HobbitLong/SupContrast/blob/master/main_linear.py)? For the Mahalanobis score, is the covariance matrix ill-conditioned? If you can provide a checkpoint, I can help take a look.
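
For readers unfamiliar with the term, a linear probe trains a fresh linear classifier on frozen encoder features and reports its test accuracy, rather than reusing whatever classification layer happens to be in the checkpoint. The following is a minimal sketch under stated assumptions: random clustered tensors stand in for the outputs of net.encoder so the example runs on its own, and the function name, optimizer, and hyperparameters are illustrative, not those of main_linear.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_probe(train_x, train_y, test_x, test_y, n_cls, epochs=100, lr=0.1):
    """Train a linear classifier on frozen features and return test accuracy."""
    # Features are treated as constants: no gradient reaches the encoder.
    train_x = F.normalize(train_x.detach(), dim=1)
    test_x = F.normalize(test_x.detach(), dim=1)
    probe = nn.Linear(train_x.shape[1], n_cls)
    opt = torch.optim.SGD(probe.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(probe(train_x), train_y)
        loss.backward()
        opt.step()
    with torch.no_grad():
        pred = probe(test_x).argmax(dim=1)
    return (pred == test_y).float().mean().item()

# Synthetic stand-in for encoder outputs (assumption): one Gaussian blob per class.
torch.manual_seed(0)
n_cls, dim = 3, 16
centers = torch.randn(n_cls, dim) * 5

def make_split(n_per_class):
    y = torch.arange(n_cls).repeat_interleave(n_per_class)
    x = centers[y] + torch.randn(len(y), dim)
    return x, y

train_x, train_y = make_split(100)
test_x, test_y = make_split(50)
acc = linear_probe(train_x, train_y, test_x, test_y, n_cls)
print(f"probe accuracy: {acc:.2%}")
```

With a real checkpoint, train_x and test_x would be the encoder features of the CIFAR-10 training and test images. If a classification layer in the checkpoint was never optimized by the training objective (worth checking in the training script), its predictions would not be expected to reflect the quality of the embeddings.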

v-1024 commented 1 year ago

Hi author, this is the checkpoint from training for 500 epochs on CIFAR10. Thank you for taking the time to help me check it.

Thank you!

------------------ Original message ------------------ From: "deeplearning-wisc/cider"; Sent: Thursday, April 6, 2023, 12:55 PM; Subject: Re: [deeplearning-wisc/cider] Questions about model trained on CIFAR10 (Issue #6)


Large attachment sent from QQ Mail

checkpoint_500.pth.tar (87.86M, expires 2023-05-06 13:48). Download page: http://mail.qq.com/cgi-bin/ftnExs_download?t=exs_ftn_download&k=286537343f53e7c3c0240901436557161b11020050565d5c4e5000075348035a56571a0450030014515c015707040101540006046573655a0b00545f150a0c57173a0204554b154d0b4b4355176558&code=ce74eee9

v-1024 commented 1 year ago

Thank you for your reply!

To test the ID accuracy, I will rerun the evaluation following the code you linked. For the Mahalanobis score, my local torch version does not support torch.cov(), so I compute the covariance matrix as follows:

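def cov_matrix(x):
    """
    Compute the covariance matrix of a given tensor x.
    Same convention as torch.cov: rows are variables, columns are observations,
    so an (N, D) feature matrix has to be passed as x.t() to get a (D, D) result.
    """
    x_mean = torch.mean(x, dim=1, keepdim=True)
    x_centered = x - x_mean
    cov = torch.matmul(x_centered, x_centered.t()) / (x.shape[1] - 1)
    return cov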
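
Two things may be worth checking here, sketched below under stated assumptions. First, the function above averages over dim=1, so it expects variables in rows; features stored as (N, D) would yield an N x N matrix unless transposed. Second, the author's question about conditioning can be answered by looking at the eigenvalue ratio of the covariance. The helper names, the shrinkage step, and alpha=0.1 are hypothetical illustrations of a generic regularizer, not something taken from the CIDER code, and the synthetic features only mimic a near-duplicate dimension.

```python
import torch

def feature_covariance(features):
    """Covariance of an (N, D) feature matrix, returned as (D, D)."""
    centered = features - features.mean(dim=0, keepdim=True)
    return centered.t() @ centered / (features.shape[0] - 1)

def condition_number(cov):
    """Ratio of largest to smallest eigenvalue of a symmetric matrix."""
    eig = torch.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    return (eig[-1] / eig[0].clamp_min(1e-12)).item()

def shrink(cov, alpha=0.1):
    """Blend cov with a scaled identity; alpha is an illustrative setting."""
    scale = cov.diagonal().mean()
    eye = torch.eye(cov.shape[0], dtype=cov.dtype)
    return (1 - alpha) * cov + alpha * scale * eye

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of each row of x to mean."""
    diff = x - mean
    solved = torch.linalg.solve(cov, diff.t())  # avoids an explicit inverse
    return (diff * solved.t()).sum(dim=1)

# Synthetic features (assumption): the last dimension nearly copies the first,
# which makes the covariance close to singular.
torch.manual_seed(0)
feats = torch.randn(200, 8, dtype=torch.float64)
feats[:, 7] = feats[:, 0] + 1e-4 * torch.randn(200, dtype=torch.float64)

cov = feature_covariance(feats)
cov_reg = shrink(cov)
cond_raw = condition_number(cov)
cond_reg = condition_number(cov_reg)
dists = mahalanobis_sq(feats[:5], feats.mean(dim=0), cov_reg)
print(cov.shape, f"{cond_raw:.1e}", f"{cond_reg:.1e}")
```

A very large condition number means inverting the matrix amplifies numerical noise, which can distort Mahalanobis distances. Whether that is what happened with the checkpoint in this issue is not established by the thread.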

In addition, I am happy to provide the checkpoint; it has been sent to your email. Thank you for taking the time to help me check it.

Thank you!

emannix commented 1 year ago

I'm also struggling to reproduce the results in the paper for CIFAR-10 with this code base. I'm getting a similar AUROC (96.89) but a larger FPR95 (19.43). Was there quite a bit of noise in the FPR95 results for CIFAR-10?
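
For context on why FPR95 can move more than AUROC between runs: FPR95 is the fraction of OOD samples accepted at the single threshold that retains 95% of ID samples, whereas AUROC averages over all thresholds. The sketch below is a plain-Python illustration under the assumption that a higher score means more in-distribution; the function name and the toy scores are made up for the example and are not taken from the repository.

```python
import math

def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """FPR on OOD samples at the threshold that keeps `tpr` of ID samples.

    Assumes higher score = more in-distribution.
    """
    ranked = sorted(id_scores, reverse=True)
    # Lowest threshold such that at least `tpr` of ID scores are >= it.
    k = math.ceil(tpr * len(ranked))
    threshold = ranked[k - 1]
    false_pos = sum(s >= threshold for s in ood_scores)
    return false_pos / len(ood_scores)

# Toy scores: 20 ID samples, 4 OOD samples.
id_scores = [float(i) for i in range(1, 21)]
ood_scores = [0.5, 1.5, 2.5, 3.5]
fpr = fpr_at_tpr(id_scores, ood_scores)
print(f"FPR95: {fpr:.2f}")
```

Because the metric is read off at one threshold, small shifts in the scores near that threshold can change it noticeably. This is a general property of the metric and says nothing by itself about the numbers reported in the paper.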