Verified-Intelligence / auto_LiRPA

auto_LiRPA: An Automatic Linear Relaxation based Perturbation Analysis Library for Neural Networks and General Computational Graphs
https://arxiv.org/pdf/2002.12920

Failed to verify cifar_dm-large_2_255 from CROWN-IBP #2

Closed · mzweilin closed this issue 4 years ago

mzweilin commented 4 years ago

Hi there,

I was trying to reproduce Table 2 in https://openreview.net/pdf?id=Skxuk1rFwB

I was able to reproduce the results reported for CROWN-IBP on three models, but failed on this one:

$ mkdir -p model_weights
$ wget -O model_weights/models_crown-ibp_dm-large.tar.gz https://download.huan-zhang.com/models/crown-ibp/models_crown-ibp_dm-large.tar.gz
$ tar xvf ./model_weights/models_crown-ibp_dm-large.tar.gz -C ./model_weights

$ python examples/vision/cifar_training.py --model cnn_7layer --load model_weights/models_crown-ibp_dm-large/cifar_dm-large_2_255/IBP_large_best.pth --verify --eps .00784313725490196078
resume from eps=0.007843137255
[ 1:  79]: eps=0.007843137255 CE=0.8538 Err=0.2918 Loss=8.0311 Robust_CE=8.0311 Verified_Err=0.9978 Time=0.0234

Did I do anything wrong here? Or is there a bug?

huanzhang12 commented 4 years ago

This is not a bug. cifar_training.py uses IBP by default to verify all models. However, the CIFAR eps=2/255 model needs to be verified with CROWN-IBP (IBP bounds for all intermediate neurons, a CROWN backward bound for the last layer). You will need to slightly modify cifar_training.py to verify this model.

You need to change https://github.com/KaidiXu/auto_LiRPA/blob/f4492caea9d7f1e6bcee52e70dbcda6b747f43da/examples/vision/cifar_training.py#L313 and replace 'IBP' with 'CROWN-IBP'. You will probably also need to make minor changes in a few other places; the sketch below shows what the change amounts to.
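
A minimal sketch using the auto_LiRPA API (`net`, `data`, and `eps` are placeholders, not the actual variable names in cifar_training.py; depending on the library version the method string may be spelled 'IBP+backward'):

```python
import torch
from auto_LiRPA import BoundedModule, BoundedTensor
from auto_LiRPA.perturbations import PerturbationLpNorm

model = BoundedModule(net, torch.empty_like(data))    # wrap the trained network
ptb = PerturbationLpNorm(norm=float("inf"), eps=eps)  # L_inf ball of radius eps
x = BoundedTensor(data, ptb)

# 'IBP' gives pure interval bounds; 'CROWN-IBP' keeps IBP for all intermediate
# layers and applies a CROWN backward pass only on the final layer.
lb, ub = model.compute_bounds(x=(x,), method="CROWN-IBP")
```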

huanzhang12 commented 4 years ago

I forgot to mention that you also need to set the factor in CROWN-IBP to 1.0: https://github.com/KaidiXu/auto_LiRPA/blob/f4492caea9d7f1e6bcee52e70dbcda6b747f43da/examples/vision/cifar_training.py#L101 A quick, dirty fix is to hard-code factor=1.0 there.
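
For context, the factor (the β schedule from the CROWN-IBP paper) mixes the CROWN-IBP and IBP lower bounds; the snippet below only illustrates that mixing and is not the exact code at the linked line:

```python
# During training the factor ramps from 1 to 0; for verifying the CIFAR
# 2/255 model it should stay at 1.0 so the pure CROWN-IBP bound is used.
ilb, _ = model.compute_bounds(x=(x,), method="IBP")        # pure IBP lower bound
clb, _ = model.compute_bounds(x=(x,), method="CROWN-IBP")  # CROWN-IBP lower bound
factor = 1.0
lb = factor * clb + (1 - factor) * ilb  # factor = 1.0 -> pure CROWN-IBP bound
```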

mzweilin commented 4 years ago

Hi @huanzhang12, thanks for your prompt reply. The two fixes do lower the verified error from 99.78% to 45.50%, but the verified error in the other cases increases. Is that expected? Should we try both configurations and report the lower error in practice?

| Dataset | Eps | Err (clean) | Verified Err (IBP) | Verified Err (CROWN-IBP, factor=1.0) |
|---|---|---|---|---|
| MNIST | 0.1 | 1.05% | 2.30% | 3.70% |
| MNIST | 0.2 | 1.80% | 3.80% | 13.19% |
| MNIST | 0.3 | 1.80% | 6.68% | 29.53% |
| MNIST | 0.4 | 1.80% | 12.46% | 57.34% |
| CIFAR-10 | 2/255 | 29.18% | 99.78% | 45.50% |
| CIFAR-10 | 8/255 | 54.60% | 67.11% | 79.80% |

huanzhang12 commented 4 years ago

@mzweilin This is normal and expected. Except for the CIFAR 2/255 case, the other models are eventually trained with the pure IBP bound (the factor becomes 0 by the end of training), so they should be verified using IBP. For each model, we should report the lowest verified error, but usually the lowest error is obtained with the bound type that matches the training method.
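
In practice that means computing the verified error under both bound types and reporting the smaller one, e.g. with a hypothetical helper `evaluate_verified_err` standing in for the verification loop of cifar_training.py:

```python
# `evaluate_verified_err` is a hypothetical wrapper around the verification
# loop (robust margin check over the eps-ball) for a given bound method.
errs = {m: evaluate_verified_err(model, test_loader, eps, method=m)
        for m in ("IBP", "CROWN-IBP")}
best = min(errs, key=errs.get)
print(f"report {best}: verified error {errs[best]:.2%}")
```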

mzweilin commented 4 years ago

> @mzweilin This is normal and expected. Except for the CIFAR 2/255 case, the other models are eventually trained with the pure IBP bound (the factor becomes 0 by the end of training), so they should be verified using IBP. For each model, we should report the lowest verified error, but usually the lowest error is obtained with the bound type that matches the training method.

Thanks for the clarification. I found the corresponding configuration differences in https://github.com/huanzhang12/CROWN-IBP/tree/master/config.

I'm closing this issue.