eth-sri / eran

ETH Robustness Analyzer for Deep Neural Networks
Apache License 2.0

Some questions about provided models #79

Closed dgl-prc closed 3 years ago

dgl-prc commented 3 years ago

Hi,
Thanks for providing a number of pretrained neural networks. It would be helpful if you could also provide the following information:

PS: I added the following code

        if label == predicted_label:
            acc_cnt += 1
            continue

after https://github.com/eth-sri/eran/blob/f26f857abc7d0ec215da9f589866afea971326c8/tf_verify/__main__.py#L1136-L1140

mnmueller commented 3 years ago

Hello @dgl-prc,

This accuracy does indeed seem incorrect. Could you provide more details on which of the 3 ConvSmall and 9 ConvMed networks you are interested in?

I recommend computing the accuracy with the gpupoly domain in standard robustness certification mode, e.g.,

python3 . --netname ../nets/convSmallRELU__Point.onnx --dataset cifar10 --domain gpupoly --epsilon 0 --num_test 10000

I just checked all 3 ConvSmall networks this way (the runtime is just a couple of seconds for all 10000 samples) and got reasonable accuracies (in this case 60.78%). I also tried running with --spatial, which you seem to be doing (just adding a continue after the already existing counter), and recovered the same accuracies, although the runtime is higher since gpupoly is not available there. https://github.com/eth-sri/eran/blob/f26f857abc7d0ec215da9f589866afea971326c8/tf_verify/__main__.py#L1148

Regarding your second question: We will move to supporting only the .onnx format (which has already been supported for a while) and hence will not provide scripts to generate model definitions in other formats. If you have a model in PyTorch, it can be exported to the onnx format as follows:

torch.onnx.export(model, dummy_input, "model_save_path.onnx", verbose=True, input_names=["input"], output_names=["output"])

PS: If you do not already have a .csv for the full cifar10 test set, it can be generated as follows:

import csv
import torch
import torchvision

# ToTensor() yields CHW tensors scaled to [0, 1].
cifar_ds = torchvision.datasets.CIFAR10(root="../data/datasets/cifar10", train=False, download=True, transform=torchvision.transforms.ToTensor())
with open('cifar10_test_full.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=',',quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for x_i, y_i in cifar_ds:
        # One row per image: label first, then pixels permuted to HWC and rescaled to 0..255.
        csvwriter.writerow([y_i] + [f"{torch.round(x_ii*255).item():.0f}" for x_ii in x_i.permute(1,2,0).flatten()])
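As an illustration of the row layout the script above writes (label first, then one integer per pixel value), here is a minimal standard-library sketch. The 2x1 image and its label are made-up values, and how ERAN itself parses the file is not shown in this thread, so the read-back step is only an assumption about a typical consumer.

```python
import csv
import io

# Hypothetical 2x1 RGB image, already scaled to 0..255 and flattened pixel by pixel.
label = 3
pixels = [255, 0, 0, 0, 128, 255]

# Write one row with the same csv.writer settings as the script above.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
writer.writerow([label] + [f"{p:.0f}" for p in pixels])

# Read the row back: the first field is the label, the rest are pixel values.
buffer.seek(0)
row = next(csv.reader(buffer, delimiter=',', quotechar='|'))
read_label = int(row[0])
read_pixels = [int(v) for v in row[1:]]

print(read_label, read_pixels)
```

A full CIFAR10 row written this way has 1 + 32*32*3 = 3073 fields.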

Cheers, Mark

dgl-prc commented 3 years ago

Thank you for the detailed response.

Initially, I did the test with a PyTorch model. The command was:

python . --netname ../data/cifar10/convMedGRELU__Point.pyt --dataset cifar10 --domain deeppoly --spatial --t-norm inf --delta 0.3

I just did the test again with the onnx model, but still cannot get the correct results. The command used was:

python . --netname ../data/cifar10/onnx/convSmallRELU__Point.onnx --dataset cifar10 --domain deeppoly --epsilon 0 --num_test 10000

There is a minor issue in the code regarding the normalization. Here are the details.

  1. When building the ERAN model in the following code, the config.mean and config.std will be assigned a constant value:

https://github.com/eth-sri/eran/blob/f26f857abc7d0ec215da9f589866afea971326c8/tf_verify/__main__.py#L489

https://github.com/eth-sri/eran/blob/f26f857abc7d0ec215da9f589866afea971326c8/tf_verify/onnx_translator.py#L441-L449

Thus, the variable means will be a constant (0.1307).

https://github.com/eth-sri/eran/blob/f26f857abc7d0ec215da9f589866afea971326c8/tf_verify/__main__.py#L507-L509

  2. However, the constant value of means will cause an exception when normalizing:

https://github.com/eth-sri/eran/blob/f26f857abc7d0ec215da9f589866afea971326c8/tf_verify/__main__.py#L119-L128

The exception:

  File "./__main__.py", line 1338, in <module>
    normalize(specLB, means, stds, dataset)
  File "./__main__.py", line 125, in normalize
    tmp[count] = (image[count] - means[1])/stds[1]
IndexError: index 1 is out of bounds for axis 0 with size 1

To fix this bug, I commented out the following code:

https://github.com/eth-sri/eran/blob/f26f857abc7d0ec215da9f589866afea971326c8/tf_verify/__main__.py#L507-L509

However, the accuracy was still about 15%. To rule out a corrupted cifar10 test set, I generated a new test set with the following code, which you kindly provided, but the accuracy was even worse, about 10%.

cifar_ds = torchvision.datasets.CIFAR10(root="./cifar10", train=False, download=True, transform=torchvision.transforms.ToTensor())
with open('cifar10_test_full.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=',',quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for img in cifar_ds:
        x_i, y_i = img[0], img[1]
        csvwriter.writerow([y_i] + [f"{torch.round(x_ii*255).item():.0f}" for x_ii in x_i.flatten()])

PS:

mnmueller commented 3 years ago

Hello @dgl-prc,

For both networks you referenced, constant = self.add_resources(node)[0].reshape(-1) evaluates to a 1d np.array with 3 entries for both mean and std when I evaluate them (in fact, the reshape command will even turn a 0d array into a 1d one). This is important for extracting the correct normalization parameters expected by your network. Simply commenting out lines 507 to 509 will lead to default parameters being used, which might differ from the ones expected by the network, most likely causing the compromised performance you observe.
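The reshape behaviour and the reported IndexError can be reproduced in isolation with NumPy. The sketch below is not ERAN's code: normalize_rgb is a hypothetical helper that only mimics per-channel indexing of the kind seen in the traceback, and the 3-entry mean and std values are placeholders rather than the parameters stored in the networks.

```python
import numpy as np

# reshape(-1) turns a 0d array (a single scalar) into a 1d array with one entry.
scalar_mean = np.array(0.1307).reshape(-1)
print(scalar_mean.shape)  # (1,)

# Placeholder per-channel parameters for a 3-channel network (not the real values).
means = np.array([0.5, 0.5, 0.5]).reshape(-1)
stds = np.array([0.2, 0.2, 0.2]).reshape(-1)

def normalize_rgb(image, means, stds):
    """Hypothetical helper: normalize a flat, channel-interleaved image."""
    out = np.empty(len(image))
    for i in range(len(image)):
        c = i % 3  # channel index of this interleaved value
        out[i] = (image[i] - means[c]) / stds[c]
    return out

pixel = np.array([0.5, 0.7, 0.9])  # one RGB pixel
normalized = normalize_rgb(pixel, means, stds)

# With a single-entry mean, indexing channel 1 fails as in the traceback.
try:
    normalize_rgb(pixel, scalar_mean, scalar_mean)
    raised = False
except IndexError:
    raised = True
```

Under these assumptions, a single-entry mean points to parameters that do not belong to a 3-channel network, which is consistent with the suggestion to re-check the local code rather than bypass the extraction.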

0.1307 is a typical mean for the MNIST dataset, while both networks you referenced above are CIFAR10 networks. Can you validate that your local code is indeed identical to the current version on the master branch (ideally by cloning the repo again)?

I noticed a permutation was missing in the script I posted above to generate the .csv set (I have updated it now). Sorry about that.
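To see why the permutation matters: ToTensor() returns images in CHW order (channels, height, width), so flattening directly writes all values of one channel before the next, whereas permute(1,2,0) interleaves the channel values pixel by pixel. A small NumPy sketch on a made-up 3x2x2 array (transpose plays the role of permute here):

```python
import numpy as np

# Tiny 3-channel "image" in CHW layout; value = channel*4 + row*2 + column.
chw = np.arange(3 * 2 * 2).reshape(3, 2, 2)

# Flattening CHW directly: channel 0 entirely, then channel 1, then channel 2.
flat_chw = chw.flatten()

# Moving channels last (HWC) first: the three channel values of each pixel are adjacent.
flat_hwc = chw.transpose(1, 2, 0).flatten()

print(flat_chw.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(flat_hwc.tolist())  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```

Both rows contain the same values, so a file written in the wrong order loads without errors but scrambles every image, which would fit the near-chance accuracy of about 10% reported above.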

I just downloaded the network files again and retested them, and I get reasonable accuracies and no exceptions.

Cheers, Mark

dgl-prc commented 3 years ago

Thank you so much! I just cloned the repo again and generated the test set with the new script. Then I retested two models and the results are reasonable and no exceptions are raised.