eth-sri / eran

ETH Robustness Analyzer for Deep Neural Networks
Apache License 2.0
323 stars 103 forks source link

ACAS Xu Network 5,3 + Property 2: ERAN prints invalid counterexample. #88

Closed cherrywoods closed 3 years ago

cherrywoods commented 3 years ago

Dear ERAN developers, When I try to verify whether the ACAS Xu network 5,3 satisfies property 2 using

python3 . --dataset acasxu --domain deeppoly --complete True --netname ../data/acasxu/nets/ACASXU_run2a_5_3_batch_2000.onnx --spec 2

ERAN concludes that property 2 is violated and also prints a counterexample:

MILP adex model status: 2, Model solution count: 1, Final solve time: 0.354
property violated at  [0.6798577687, -0.006991718370542967, -0.07116168099995945, 0.4617607250888071, -0.47536072704630505] output_score [0.022617856469960455, 0.022617856469996443, -0.018460240917913703, 0.022170666419433646, -0.016787727236396707]
 46%|███████████████████████████▉                                | 46/99 [01:55<17:49, 20.17s/it]
MILP model status: 2, Obj val/bound for constraint (1, 0, 0): 0.0095/0.0095, Final solve time: 0.255
100%|██████████████████████████████████████████████████████| 99/99 [216:01:35<00:00, 7855.51s/it]
AcasXu property 2 Verified unsound (with adex) for Box 0 out of 1
777695.4908115864 seconds
Total time needed: 777695.4908769131 seconds

However, when looking closer at the printed counterexample, this does not appear to actually violate property 2:

output score 0:     0.0 2 2 6 1 7 8 5 6 4 6 9 9 6 0 4 5 5
output score 1:     0.0 2 2 6 1 7 8 5 6 4 6 9 9 9 6 4 4 3
digits match?       = = = = = = = = = = = = = = <

output 1 is actually larger than output 0, so property 2 is fulfilled since output 0 can not be maximal. However, the mismatch is only in the 14th decimal place. Is this just related to printing or is it actually a spurious result from Gurobi?

Best regards, David

cherrywoods commented 3 years ago

I forgot to say: I ran this on a Ubuntu 18.04 virtual machine with the latest version of ERAN and it's dependencies.

mnmueller commented 3 years ago

Hello @cherrywoods,

Thanks for pointing this out. This probably has to do with how we evaluate samples to check, whether an actual counterexample was found: We use the same floating-point-sound evaluation as for certification, but pass the to be evaluated sample as both the upper- and the lower-bound of the input. If then, using this floating-point-safe evaluation, the constraints do not hold, we say we found a counterexample. We then print the lower bound (which is identical to the upper bound except for floating-point effects) on the network output. However, here it might be more appropriate to check if the negation of the constraints hold under floating-point-sound evaluation.

I will look into if this is actually the cause and then provide an update.

Cheers, Mark

mnmueller commented 3 years ago

I updated the check for counterexamples, which should have resolved this issue. Please let me know if you still observe this behaviour with your settings/hardware.

Cheers, Mark

cherrywoods commented 3 years ago

Hello Mark,

The change did also fix this on my hardware. Thank you very much!