eth-sri / colt

Convex Layerwise Adversarial Training (COLT)
Apache License 2.0

confusion about network structure for svhn dataset #3

Open selous123 opened 4 years ago

selous123 commented 4 years ago

Thanks for your promising work and your open-source code.

We are going to follow your work, but we are confused about the network architecture you describe for the SVHN dataset:

For this experiment, we used convolutional
network with 2 convolutional layers of kernel size 4 and stride 1 with 32 and 64 filters respectively.
These convolutional layers are followed by a fully connected layer with 100 neurons. 

Does the feature-map size just change like this?

## output_size = (input_size - kernel_size + 2 * padding_size) / stride + 1
bx3x32x32 => bx32x29x29 => bx64x26x26 => bx100
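The sizes above can be checked with a small sketch of the standard convolution output-size formula. This assumes no padding and the originally described kernel 4 / stride 1, since the paper text does not state the padding:

```python
# Sketch: conv output size = (input - kernel + 2*padding) // stride + 1.
# Assumes padding 0 with the originally described kernel 4, stride 1.

def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (floor division, as in PyTorch)."""
    return (size - kernel + 2 * padding) // stride + 1

h = 32                # SVHN input is 3x32x32
h = conv_out(h, 4)    # after conv1 (32 filters): 29
h = conv_out(h, 4)    # after conv2 (64 filters): 26
flat = 64 * h * h     # features entering the fully connected layer
print(h, flat)        # 26 43264, i.e. ~4.3M weights for a 100-unit layer
```

With these sizes the flattened input to the 100-neuron layer is 43264 features, which is where the large fully connected layer comes from.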

The fully connected layer then has a huge number of parameters (we get OOM on a GTX 1080 Ti).

Having no downsampling (max pooling or strided convolutions) also seems strange for a network design. Are there any other key points that I have misunderstood?

Waiting for your reply, thank you in advance.

mbalunovic commented 4 years ago

Hi, there is indeed a typo there, thanks for spotting it. The first kernel size is 5 and the second is 4, while both strides are 2, not 1. The architecture is the same as the one for CIFAR-10 8/255 (called ConvMed in the code), just with a different number of filters and linear layer size: https://github.com/eth-sri/colt/blob/master/code/networks.py#L58
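A quick sanity check of the corrected sizes, assuming the paddings printed by the ConvMed module later in this thread (padding 2 for the first convolution, padding 1 for the second):

```python
# Corrected SVHN ConvMed architecture:
# conv1: kernel 5, stride 2, padding 2; conv2: kernel 4, stride 2, padding 1.

def conv_out(size, kernel, stride, padding):
    return (size - kernel + 2 * padding) // stride + 1

h = conv_out(32, 5, 2, 2)  # after conv1: 16
h = conv_out(h, 4, 2, 1)   # after conv2: 8
print(h, 64 * h * h)       # 8 4096 -- matches Linear(in_features=4096, ...)
```

The flattened size of 4096 is far smaller than with stride 1, which also explains the OOM difference.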

I will correct this in the paper. Let me know if I can help further.

selous123 commented 4 years ago

Thanks for your reply and your promising work.

We want to follow your work and reproduce your results.

We have done so on the MNIST and CIFAR datasets. Unfortunately, it does not work as smoothly on the SVHN dataset.

We will follow your advice and train the network on SVHN tonight.

Do you have any other suggestions, or any plan to share your pretrained model for the SVHN dataset?

Waiting for your reply.

selous123 commented 4 years ago

Hi! @mbalunovic

We have trained models on the SVHN dataset.

Training script:

python code/main.py \
       --train-mode train \
       --dataset svhn \
       --exp-name colt \
       --net convmed_flat_2_2_100 \
       --train-batch 100 --test-batch 100 \
       --train-eps 0.01 \
       --start-eps-factor 0.01 --eps-factor 1.2 \
       --layers -2 -1 2 4  \
       --train-att-n-steps 40 --train-att-step-size 0.035 --test-att-n-steps 40 --test-att-step-size 0.035 \
       --opt adam --lr 0.0001 --lr-step 20 --lr-factor 0.5 --lr-layer-dec 0.75 \
       --mix --mix-epochs 60 --n-epochs 200 \
       --l1-reg 0.00005 --relu-stable 0.005 --relu-stable-factor 1.5 \
       --test-freq 50

and test with the following script:

python code/verify.py \
     --dataset svhn \
     --net convmed_flat_2_2_100 \
     --load_model models_new/svhn/colt/1/convmed_flat_2_2_100_0.01000/1599055902/-1/net_200.pt \
     --test_eps 0.01 \
     --attack_restarts 20 --test_att_n_steps 40 --test_att_step_size 0.035 \
     --start_idx 0 --end_idx 1000 \
     --num_iters 100 \
     --layer_idx 4\
     --refine_lidx 3 \
     --milp_timeout 1000 \
     --max_binary 30 \
     --test_batch 20 \
     --fail_break

Note that we only test the model saved after 200 epochs.

Result: the verified accuracy is quite low.

Verify test_idx = 0
tot_tests: 1, verified: 0.00000 [0/1], nat_ok: 0.00000 [0/1], latent_ok: 0.00000 [0/1], pgd_ok: 0.00000 [0/1]
=====================================
Verify test_idx = 1
loss before refine:  tensor(79.0891876221, device='cuda:0')
Using license file /home/lrh/gurobi.lic
Academic license - for non-commercial use only
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:31<00:00,  2.04it/s]
loss after refine:  tensor(79.0313720703, device='cuda:0')
Unstable ReLU:  65  binary:  30
MILP:  2 0 -9.8906763337909 -39.36412376941467 9.096289157867432
tot_tests: 2, verified: 0.00000 [0/2], nat_ok: 0.50000 [1/2], latent_ok: 0.00000 [0/2], pgd_ok: 0.50000 [1/2]
=====================================
Verify test_idx = 2
tot_tests: 3, verified: 0.00000 [0/3], nat_ok: 0.66667 [2/3], latent_ok: 0.00000 [0/3], pgd_ok: 0.33333 [1/3]
=====================================
Verify test_idx = 3
tot_tests: 4, verified: 0.00000 [0/4], nat_ok: 0.50000 [2/4], latent_ok: 0.00000 [0/4], pgd_ok: 0.25000 [1/4]
=====================================
Verify test_idx = 4
loss before refine:  tensor(126.4415054321, device='cuda:0')
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:44<00:00,  1.45it/s]
loss after refine:  tensor(126.4348373413, device='cuda:0')
Unstable ReLU:  76  binary:  30
MILP:  6 0 -11.62508708577596 -72.27775782767232 7.978797912597656
tot_tests: 5, verified: 0.00000 [0/5], nat_ok: 0.60000 [3/5], latent_ok: 0.00000 [0/5], pgd_ok: 0.40000 [2/5]
=====================================
Verify test_idx = 5
tot_tests: 6, verified: 0.00000 [0/6], nat_ok: 0.50000 [3/6], latent_ok: 0.00000 [0/6], pgd_ok: 0.33333 [2/6]
=====================================
Verify test_idx = 6
tot_tests: 7, verified: 0.00000 [0/7], nat_ok: 0.57143 [4/7], latent_ok: 0.00000 [0/7], pgd_ok: 0.28571 [2/7]
=====================================
Verify test_idx = 7
loss before refine:  tensor(47.8477478027, device='cuda:0')
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:49<00:00,  1.29it/s]
loss after refine:  tensor(46.6920013428, device='cuda:0')
Unstable ReLU:  49  binary:  30
MILP:  1 0 3.378185561774174 0.007527297093868679 37.329697132110596
Unstable ReLU:  48  binary:  30
MILP:  1 2 -1.9840151576084928 -13.715990919653848 7.628278017044067
tot_tests: 8, verified: 0.00000 [0/8], nat_ok: 0.62500 [5/8], latent_ok: 0.00000 [0/8], pgd_ok: 0.37500 [3/8]
=====================================
Verify test_idx = 8
loss before refine:  tensor(67.1712951660, device='cuda:0')
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:25<00:00,  2.54it/s]
loss after refine:  tensor(67.1148147583, device='cuda:0')
Unstable ReLU:  63  binary:  30
MILP:  1 0 3.486628995720145 0.1733241566778132 42.78998398780823
Unstable ReLU:  64  binary:  30
MILP:  1 2 -15.22443211099868 -40.247538190735966 6.821177005767822
tot_tests: 9, verified: 0.00000 [0/9], nat_ok: 0.66667 [6/9], latent_ok: 0.00000 [0/9], pgd_ok: 0.44444 [4/9]
=====================================

Have I made any mistakes? Do you have any suggestions for the scripts?

mbalunovic commented 3 years ago

Hi, I will try to upload the model and scripts for SVHN by the next week. I think the reason that you are not verifying much is that you are using --layer_idx 4, while you should use --layer_idx 3. Could you try changing this argument and running again?

selous123 commented 3 years ago

Hi, I have changed layer_idx from 4 to 3:

python code/verify.py \
     --dataset svhn \
     --net convmed_flat_2_2_100 \
     --load_model models_new/svhn/colt/1/convmed_flat_2_2_100_0.01000/1599055902/-1/net_200.pt \
     --test_eps 0.01 \
     --attack_restarts 20 --test_att_n_steps 40 --test_att_step_size 0.035 \
     --start_idx 0 --end_idx 1000 \
     --num_iters 100 \
     --layer_idx 3\
     --refine_lidx 3 \
     --milp_timeout 1000 \
     --max_binary 30 \
     --test_batch 20 \
     --fail_break

But the results are still not what we expected.

ConvMed(
  (blocks): Sequential(
    (layers): ModuleList(
      (0): Normalization()
      (1): Conv2d(
        (conv): Conv2d(3, 32, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2))
      )
      (2): ReLU()
      (3): Conv2d(
        (conv): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
      )
      (4): ReLU()
      (5): Flatten()
      (6): Linear(
        (linear): Linear(in_features=4096, out_features=100, bias=True)
      )
      (7): ReLU()
      (8): Linear(
        (linear): Linear(in_features=100, out_features=10, bias=True)
      )
    )
  )
)
Using downloaded and verified file: ./data/train_32x32.mat
Using downloaded and verified file: ./data/test_32x32.mat
Verify test_idx = 0
tot_tests: 1, verified: 0.00000 [0/1], nat_ok: 0.00000 [0/1], latent_ok: 0.00000 [0/1], pgd_ok: 0.00000 [0/1]
=====================================
Verify test_idx = 1
loss before refine:  tensor(79.0893249512, device='cuda:0')
Using license file /home/lrh/gurobi.lic
Academic license - for non-commercial use only
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:25<00:00,  2.50it/s]
loss after refine:  tensor(79.0312347412, device='cuda:0')
Unstable ReLU:  801  binary:  30
Unstable ReLU:  65  binary:  30
MILP:  2 0 -4.288788787952699 -23.018137411826586 98.24894690513611
tot_tests: 2, verified: 0.00000 [0/2], nat_ok: 0.50000 [1/2], latent_ok: 0.50000 [1/2], pgd_ok: 0.50000 [1/2]
=====================================
Verify test_idx = 2
tot_tests: 3, verified: 0.00000 [0/3], nat_ok: 0.66667 [2/3], latent_ok: 0.33333 [1/3], pgd_ok: 0.33333 [1/3]
=====================================
Verify test_idx = 3
tot_tests: 4, verified: 0.00000 [0/4], nat_ok: 0.50000 [2/4], latent_ok: 0.25000 [1/4], pgd_ok: 0.25000 [1/4]
=====================================
Verify test_idx = 4
loss before refine:  tensor(126.4411926270, device='cuda:0')
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:33<00:00,  1.89it/s]
loss after refine:  tensor(126.4344406128, device='cuda:0')
Unstable ReLU:  1169  binary:  30
Unstable ReLU:  76  binary:  30
MILP:  6 0 -5.8332019333592555 -72.31838026403184 28.852242946624756
tot_tests: 5, verified: 0.00000 [0/5], nat_ok: 0.60000 [3/5], latent_ok: 0.40000 [2/5], pgd_ok: 0.40000 [2/5]
=====================================
Verify test_idx = 5
tot_tests: 6, verified: 0.00000 [0/6], nat_ok: 0.50000 [3/6], latent_ok: 0.33333 [2/6], pgd_ok: 0.33333 [2/6]
=====================================
Verify test_idx = 6
tot_tests: 7, verified: 0.00000 [0/7], nat_ok: 0.57143 [4/7], latent_ok: 0.28571 [2/7], pgd_ok: 0.28571 [2/7]
=====================================
Verify test_idx = 7
loss before refine:  tensor(47.8477172852, device='cuda:0')
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:48<00:00,  1.31it/s]
loss after refine:  tensor(46.6947708130, device='cuda:0')
label already verified:  0
Unstable ReLU:  1088  binary:  30
Unstable ReLU:  48  binary:  30

Maybe some errors occurred during the training process.

Looking forward to your sharing the pretrained model and scripts.

selous123 commented 3 years ago

Hi @mbalunovic

I also tested your pretrained model on CIFAR-10 with eps = 2/255 on the full test set, which took almost a week on an NVIDIA 1080 Ti.

Verification script:

python code/verify.py \
     --dataset cifar10 \
     --net convmedbig_flat_2_2_4_250 \
     --load_model xxx \
     --test_eps 0.00784313725 \
     --attack_restarts 20 --test_att_n_steps 100 --test_att_step_size 0.015 \
     --start_idx 0 --end_idx 10000 \
     --num_iters 100 \
     --layer_idx 6\
     --refine_lidx 3 \
     --milp_timeout 1000 \
     --max_binary 30 \
     --test_batch 20 \
     --fail_break

with the following result:

=====================================
Verify test_idx = 9996
loss before refine:  tensor(2.8644526005, device='cuda:0')
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 256/256 [00:30<00:00,  8.27it/s]
loss after refine:  tensor(2.8527946472, device='cuda:0')
adv_idx=0 verified without MILP
adv_idx=1 verified without MILP
adv_idx=2 verified without MILP
adv_idx=4 verified without MILP
Unstable ReLU:  75  binary:  30
MILP:  3 5 0.27997091209964964 0.02114075281830044 16.27850103378296
Unstable ReLU:  75  binary:  30
MILP:  3 6 0.2261094196545308 0.017413887299067815 17.594099044799805
adv_idx=7 verified without MILP
adv_idx=8 verified without MILP
adv_idx=9 verified without MILP
tot_tests: 9997, verified: 0.57897 [5788/9997], nat_ok: 0.78414 [7839/9997], latent_ok: 0.60188 [6017/9997], pgd_ok: 0.67950 [6793/9997]
=====================================
Verify test_idx = 9997
tot_tests: 9998, verified: 0.57902 [5789/9998], nat_ok: 0.78416 [7840/9998], latent_ok: 0.60192 [6018/9998], pgd_ok: 0.67954 [6794/9998]
=====================================
Verify test_idx = 9998
tot_tests: 9999, verified: 0.57906 [5790/9999], nat_ok: 0.78418 [7841/9999], latent_ok: 0.60196 [6019/9999], pgd_ok: 0.67957 [6795/9999]
=====================================
Verify test_idx = 9999
tot_tests: 10000, verified: 0.57910 [5791/10000], nat_ok: 0.78420 [7842/10000], latent_ok: 0.60200 [6020/10000], pgd_ok: 0.67960 [6796/10000]
=====================================

I noticed that the verified accuracy is 57.91%, not the 60.2% reported in the original paper (60.2% instead matches the latent_ok accuracy above).

mbalunovic commented 3 years ago

Hi, please follow the directions in the README to reproduce the results from the paper. In particular, this means running the following command:

$ ./scripts/certify_cifar10_2_255

Note that the script consists of 3 separate verification commands, while you were running only one.