Hi,
this is an interesting point. I agree that using half precision would significantly decrease the runtime. However, I have some concerns about it: first, adversarial perturbations computed in half precision with methods that produce points close to the decision boundary (e.g. FAB) might not lead to misclassification when the model is then used in single precision (assuming the model was trained in single precision), and this would require particular care in the implementation of the attack. Second, I'm aware of cases where running the attacks in half precision leads to higher robust accuracy, as in the logs you reported: while here the discrepancy is not too large, for other models it might be worse.
I think having attacks in half precision could be helpful to test models in scenarios where the input is restricted to FP16 (this can currently be done, at greater cost, by casting the input of the model to the desired format at the beginning of the forward pass, as sketched below), but I'm not sure this is a popular setup at the moment.
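For illustration, a minimal sketch of such a wrapper (the class name and the AutoAttack call are only illustrative; the attack itself is unchanged):

```python
import torch
import torch.nn as nn

class FP16InputWrapper(nn.Module):
    """Hypothetical wrapper: rounds the input through FP16 so the model only
    sees values representable in half precision, while the model itself
    keeps running in single precision."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        # Cast to FP16 and back before the usual FP32 forward pass.
        x = x.half().float()
        return self.model(x)

# Usage (illustrative): evaluate a model under FP16-restricted inputs.
# from autoattack import AutoAttack
# adversary = AutoAttack(FP16InputWrapper(model), norm='Linf', eps=8/255)
```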
Thanks for your response. I totally agree with your concerns. In TF2, the official documentation suggests that the model output before the final softmax layer should be converted to FP32, so that the normalized probabilities can be represented accurately in FP32. But I'm not sure whether the gradient computation in the backward phase is done in FP32 or FP16. I also noticed that the accuracy of the FP16 version of AA is higher than that of the FP32 version by about 0.05% on average. This gap is acceptable to me, but I'll investigate it example by example. I agree that the FP16 version of AA is not ready until the gap is fully eliminated.
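For reference, a minimal sketch of the TF2 setup I mean (the architecture is just a placeholder), with the global policy set to mixed_float16 and the final softmax kept in float32 as the guide recommends:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Compute in FP16 where possible, keep variables in FP32.
tf.keras.mixed_precision.set_global_policy('mixed_float16')

# Placeholder architecture; the relevant part is the last layer.
model = tf.keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(64, 3, activation='relu'),
    layers.Flatten(),
    layers.Dense(10),
    # Keep the softmax output in float32 for numerical stability.
    layers.Activation('softmax', dtype='float32'),
])
```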
According to the PyTorch documentation [1], mixed precision is restricted to regions wrapped with autocast(),
and the official guidance is that:
autocast should wrap only the forward pass(es) of your network, including the loss computation(s). Backward passes under autocast are not recommended. Backward ops run in the same type that autocast used for corresponding forward ops.
Therefore, the backward pass stays outside of autocast, and the gradient returned for the FP32 input is still an FP32 tensor. The only remaining concern is whether the forward pass is precise enough.
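For concreteness, a minimal sketch of the pattern I have in mind, using a simple FGSM-style step as a stand-in for AA's actual update rule (the model, loss, and step size are placeholders), with only the forward pass and the loss under autocast:

```python
import torch
import torch.nn.functional as F

def attack_step(model, x, y, eps):
    # Illustrative single gradient step (not AA's actual update rule).
    x_adv = x.clone().detach().requires_grad_(True)

    # Only the forward pass and the loss computation are wrapped in autocast.
    with torch.cuda.amp.autocast():
        logits = model(x_adv)
        loss = F.cross_entropy(logits, y)

    # Backward pass outside autocast; each backward op runs in the dtype
    # that autocast chose for the corresponding forward op.
    loss.backward()

    with torch.no_grad():
        # Assuming inputs in [0, 1].
        x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0)
    return x_adv.detach()
```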
After my investigation, the robust accuracy of the FP16 version is still higher than that of the FP32 version by about 0.05% on average, so I conclude that AA is not ready for mixed precision.
Hi,
thanks for the update! I agree that for the moment using mixed precision might be problematic.
Hi contributors,
Will auto-attack support FP16 (or mixed precision) [1] in PyTorch?
In TF2, FP16 is configured at the beginning of the main function with a single flag:
tf.keras.mixed_precision.set_global_policy('mixed_float16')
The benefit of FP16 is that it decreases the elapsed time significantly without degrading the attack algorithm's performance.
The following is the output log of my experimental implementation on a V100:
As shown in the log, the runtime improves dramatically (1835.3 s -> 751.5 s).
However, FP16 requires a newer PyTorch version and suitable CUDA hardware. Additionally, the source code needs to be modified accordingly.
I'm not sure whether FP16 will be supported on the master branch in the future?

[1] https://pytorch.org/docs/stable/notes/amp_examples.html