fra31 / auto-attack

Code relative to "Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks"
https://arxiv.org/abs/2003.01690
MIT License

Support FP16 in pytorch #51

Closed. CNOCycle closed this issue 3 years ago

CNOCycle commented 3 years ago

Hi contributors,

Will auto-attack support FP16 (or mixed precision) [1] in PyTorch?

In TF2, FP16 is configured at the beginning of the main function with a single call: tf.keras.mixed_precision.set_global_policy('mixed_float16')

The benefit of FP16 is that it decreases the elapsed time significantly without hurting the attack algorithm's performance.

The following is the log output of my experimental implementation on a V100:

# FP32 version
apgd-ce - 1/17 - 159 out of 500 successfully perturbed
apgd-ce - 2/17 - 146 out of 500 successfully perturbed
apgd-ce - 3/17 - 154 out of 500 successfully perturbed
apgd-ce - 4/17 - 142 out of 500 successfully perturbed
apgd-ce - 5/17 - 155 out of 500 successfully perturbed
apgd-ce - 6/17 - 156 out of 500 successfully perturbed
apgd-ce - 7/17 - 157 out of 500 successfully perturbed
apgd-ce - 8/17 - 148 out of 500 successfully perturbed
apgd-ce - 9/17 - 153 out of 500 successfully perturbed
apgd-ce - 10/17 - 161 out of 500 successfully perturbed
apgd-ce - 11/17 - 166 out of 500 successfully perturbed
apgd-ce - 12/17 - 141 out of 500 successfully perturbed
apgd-ce - 13/17 - 158 out of 500 successfully perturbed
apgd-ce - 14/17 - 152 out of 500 successfully perturbed
apgd-ce - 15/17 - 155 out of 500 successfully perturbed
apgd-ce - 16/17 - 151 out of 500 successfully perturbed
apgd-ce - 17/17 - 125 out of 412 successfully perturbed
robust accuracy after APGD-CE: 58.33% (total time 1835.3 s)
apgd-t - 1/12 - 27 out of 500 successfully perturbed
apgd-t - 2/12 - 24 out of 500 successfully perturbed
apgd-t - 3/12 - 23 out of 500 successfully perturbed
apgd-t - 4/12 - 18 out of 500 successfully perturbed
apgd-t - 5/12 - 23 out of 500 successfully perturbed
apgd-t - 6/12 - 16 out of 500 successfully perturbed
apgd-t - 7/12 - 24 out of 500 successfully perturbed
apgd-t - 8/12 - 23 out of 500 successfully perturbed
apgd-t - 9/12 - 28 out of 500 successfully perturbed
apgd-t - 10/12 - 22 out of 500 successfully perturbed
apgd-t - 11/12 - 27 out of 500 successfully perturbed
apgd-t - 12/12 - 22 out of 333 successfully perturbed
robust accuracy after APGD-T: 55.56% (total time 12733.1 s)
# FP16 version
apgd-ce - 1/17 - 159 out of 500 successfully perturbed
apgd-ce - 2/17 - 147 out of 500 successfully perturbed
apgd-ce - 3/17 - 154 out of 500 successfully perturbed
apgd-ce - 4/17 - 141 out of 500 successfully perturbed
apgd-ce - 5/17 - 155 out of 500 successfully perturbed
apgd-ce - 6/17 - 156 out of 500 successfully perturbed
apgd-ce - 7/17 - 158 out of 500 successfully perturbed
apgd-ce - 8/17 - 147 out of 500 successfully perturbed
apgd-ce - 9/17 - 156 out of 500 successfully perturbed
apgd-ce - 10/17 - 160 out of 500 successfully perturbed
apgd-ce - 11/17 - 164 out of 500 successfully perturbed
apgd-ce - 12/17 - 139 out of 500 successfully perturbed
apgd-ce - 13/17 - 158 out of 500 successfully perturbed
apgd-ce - 14/17 - 152 out of 500 successfully perturbed
apgd-ce - 15/17 - 155 out of 500 successfully perturbed
apgd-ce - 16/17 - 151 out of 500 successfully perturbed
apgd-ce - 17/17 - 126 out of 412 successfully perturbed
robust accuracy after APGD-CE: 58.34% (total time 751.5 s)
apgd-t - 1/12 - 28 out of 500 successfully perturbed
apgd-t - 2/12 - 22 out of 500 successfully perturbed
apgd-t - 3/12 - 24 out of 500 successfully perturbed
apgd-t - 4/12 - 16 out of 500 successfully perturbed
apgd-t - 5/12 - 20 out of 500 successfully perturbed
apgd-t - 6/12 - 15 out of 500 successfully perturbed
apgd-t - 7/12 - 23 out of 500 successfully perturbed
apgd-t - 8/12 - 25 out of 500 successfully perturbed
apgd-t - 9/12 - 29 out of 500 successfully perturbed
apgd-t - 10/12 - 21 out of 500 successfully perturbed
apgd-t - 11/12 - 26 out of 500 successfully perturbed
apgd-t - 12/12 - 21 out of 334 successfully perturbed
robust accuracy after APGD-T: 55.64% (total time 5264.7 s)

As shown in the logs, the speed-up is substantial (1835.3 s -> 751.5 s for APGD-CE).

However, FP16 requires a newer PyTorch version and recent CUDA hardware. Additionally, the source code would need to be modified accordingly.
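
For illustration only, here is a rough sketch of the kind of change I have in mind, following the AMP examples in [1]. The step below is a generic PGD-style step with placeholder names (model, x, y, eps, alpha), not the actual auto-attack code:

import torch
import torch.nn.functional as F

def mixed_precision_attack_step(model, x, y, eps, alpha):
    # Hypothetical single PGD-style step; all names here are placeholders.
    delta = torch.zeros_like(x, requires_grad=True)
    # autocast wraps only the forward pass and the loss computation.
    with torch.cuda.amp.autocast():
        logits = model(x + delta)
        loss = F.cross_entropy(logits, y)
    # The backward pass runs outside the autocast region.
    loss.backward()
    with torch.no_grad():
        step = alpha * delta.grad.sign()
        delta = (delta + step).clamp_(-eps, eps)
        x_adv = (x + delta).clamp_(0.0, 1.0)
    return x_adv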

Are there plans to support FP16 on the master branch in the future?

[1] https://pytorch.org/docs/stable/notes/amp_examples.html

fra31 commented 3 years ago

Hi,

this is an interesting point. I agree that using half precision would significantly decrease the runtime. However, I have some concerns about it. First, adversarial perturbations computed in half precision by methods that produce points close to the decision boundary (e.g. FAB) might not lead to misclassification when the model is used in single precision (assuming it was trained in that precision), so this would require particular care in the implementation of the attack. Second, I'm aware of cases where running the attacks in half precision leads to higher robust accuracy, as in the logs you reported: while here the discrepancy is not too large, for other models it might be worse.

I think having attacks in half precision could be helpful for testing models in scenarios where the input is restricted to FP16 (this can currently be done, in a more expensive way, by casting the input to the desired format at the beginning of the forward pass), but I'm not sure this is a popular setup at the moment.
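
If it helps, one way to read that suggestion is a small wrapper like the following (hypothetical, not part of the repository): the input is cast to FP16 and back to FP32 at the start of the forward pass, so FP16-restricted inputs are simulated while the model itself still runs in single precision (hence no speed-up).

import torch

class FP16InputWrapper(torch.nn.Module):
    # Hypothetical wrapper: quantize the input to FP16 at the start of the
    # forward pass, then cast back to FP32 so the FP32 model runs as usual.
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        return self.model(x.half().float())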

CNOCycle commented 3 years ago

Thanks for your response. I fully agree with your concerns. In TF2, the official documentation suggests that the model output before the final softmax layer should be converted to FP32 so that the normalized probabilities can be represented accurately. However, I'm not sure whether the gradient computation during the backward pass is performed in FP32 or FP16. I also noticed that the robust accuracy of the FP16 version of AA is higher than that of the FP32 version by about 0.05% on average. This gap is acceptable to me, but I'll investigate it example by example. I agree that an FP16 version of AA is not ready until the gap is fully eliminated.
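
For reference, the pattern that the TF2 mixed-precision guide describes looks roughly like this (the layer sizes are placeholders):

import tensorflow as tf

tf.keras.mixed_precision.set_global_policy('mixed_float16')

# Hidden layers compute in float16; the final softmax is kept in float32 so
# the normalized probabilities stay numerically stable.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Activation('softmax', dtype='float32'),
])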

CNOCycle commented 3 years ago

According to the PyTorch documentation [1], mixed precision is restricted to regions wrapped with autocast(), and the official guidance is that:

autocast should wrap only the forward pass(es) of your network, including the loss computation(s). Backward passes under autocast are not recommended. Backward ops run in the same type that autocast used for corresponding forward ops.

Therefore, we can ensure that the backward ops are computed in FP32. The only remaining concern is whether the forward pass is precise enough.

After my investigation, the robust accuracy of the FP16 version is still higher than that of the FP32 version by about 0.05% on average, so I conclude that AA is not ready for mixed precision.

[1] https://pytorch.org/docs/stable/amp.html#autocasting

fra31 commented 3 years ago

Hi,

thanks for the update! I agree that for the moment using mixed precision might be problematic.