fahadshamshad / Clip2Protect

[CVPR 2023] Official repository of paper titled "CLIP2Protect: Protecting Facial Privacy using Text-Guided Makeup via Adversarial Latent Search".
https://fahadshamshad.github.io/Clip2Protect/

Some questions about the calculation of the protection success rate (PSR) and the dodging attack. #8

Open cjyryc opened 11 months ago

cjyryc commented 11 months ago

Your work is excellent, but I have a few questions:

  1. In the adversarial loss for dodging attacks, why is the cosine distance between the generated identity and the target identity used as the first term? As far as I know, a dodging attack does not need a target identity for guidance, so there is no need to minimize the cosine distance between the generated identity and the target identity (the first term of the adversarial loss). It is enough for the distance between the generated identity and the original identity to be sufficiently large (the second term of the adversarial loss).
  2. When calculating the protection success rate (PSR), the function "black_box" computes the cosine similarity between the generated (protected) portrait and the target identity. The function "quan" then counts an attack as successful when that cosine similarity is greater than the system threshold τ. My understanding is that protection is successful when the cosine similarity is greater than (1-τ), or equivalently when the cosine distance is less than τ.
  3. If the setup in the first point stands, that is, the dodging attack uses both adversarial loss terms, then for the protected image to be identified as the target identity, the cosine distance between the protected image and the target identity should be smaller than the cosine distance between the protected image and the original identity. In other words, the optimization should stop only once the adversarial loss is less than 0. However, you stop the optimization after only 50 iterations. I think that is too early, and the protection effect does not seem very good.

I would be very grateful if you could reply.

fahadshamshad commented 11 months ago

Thanks for your questions.

  1. You're right that for dodging attacks the focus is on maximizing the distance between the original and generated identities. Our method, following "Towards Face Encryption by Generating Adversarial Identity Masks" (ICCV 2021), is flexible enough to cover both impersonation and dodging, but you can set the first term to zero if your goal is solely dodging.
  2. In our approach, PSR is viewed from the user's perspective, where the aim is to successfully impersonate a different identity, as outlined in "Towards Face Encryption by Generating Adversarial Identity Masks." So a cosine similarity between the generated (protected) portrait and the target identity that exceeds the system threshold τ is counted as a successful protection, and more such cases mean a higher PSR.
  3. We have observed that 50 iterations are generally sufficient for this purpose, especially since we start from a latent code corresponding to the original image rather than generating from scratch. This is in line with the ICCV 2021 paper, where a similar optimization was effective for a noise-based attack in 20 iterations. Our results show reasonable performance within these constraints.

Please let us know if anything is unclear.
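For readers following the thread, the two-term loss and the PSR count discussed in this reply can be sketched roughly as follows. This is a minimal illustration and not the repository's code: the function names, the `target_weight` parameter, and the use of plain Python lists as embeddings are all assumptions made for the example. It assumes the loss has the form "distance to target minus distance to original", as described in the question.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance, taken here as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


def adversarial_loss(protected, original, target, target_weight=1.0):
    """Hypothetical two-term loss to be minimized.

    First term pulls the protected embedding toward the target identity;
    second term pushes it away from the original identity.
    target_weight=0.0 drops the first term (pure dodging).
    """
    pull = cosine_distance(protected, target)
    push = cosine_distance(protected, original)
    return target_weight * pull - push


def protection_success_rate(similarities_to_target, tau):
    """Percentage of protected images whose similarity to the target exceeds tau.

    Here tau is treated as a similarity threshold (an assumption).
    """
    hits = sum(1 for s in similarities_to_target if s > tau)
    return 100.0 * hits / len(similarities_to_target)
```

With `target_weight=0.0` only the second term remains, which corresponds to the pure dodging setting mentioned in point 1.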

cjyryc commented 11 months ago

I am very grateful for your quick and detailed response. However, I still have some questions:

  1. On question 2, I am still confused. According to your paper, when the cosine distance between the protected identity and the target identity is less than or equal to τ, the two can be considered the same identity and the protection is deemed successful (if I have understood correctly). As you wrote in your paper:

    "if cosine distance ≤ τ, they are the same identity."

However, cosine distance and cosine similarity add up to 1, which means that for protection to succeed (with high PSR), the cosine similarity should be greater than (1-τ).

  2. On question 3, 50 iterations do give a fairly good generation result, but when testing the protection rate, we found that the cosine similarity between the protected image and the original image is still greater than the cosine similarity between the protected image and the target identity. I think that in a face verification scenario there is still a risk of privacy leakage.
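Both points can be checked numerically. The sketch below assumes cosine distance is defined as 1 minus cosine similarity, as stated above, so a distance threshold τ corresponds to a similarity threshold of 1-τ. The function names, the dictionary keys, and the example vectors are hypothetical and are not taken from the repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_identity_by_distance(similarity, tau_dist):
    """Verification rule stated on distance: distance <= tau_dist."""
    return (1.0 - similarity) <= tau_dist


def same_identity_by_similarity(similarity, tau_sim):
    """Verification rule stated on similarity: similarity >= tau_sim."""
    return similarity >= tau_sim


def diagnose(protected, original, target, tau_dist):
    """Report how a protected embedding relates to the original and target.

    tau_dist is a cosine-distance threshold; it is converted to the
    equivalent similarity threshold 1 - tau_dist before comparing.
    """
    tau_sim = 1.0 - tau_dist
    sim_orig = cosine_similarity(protected, original)
    sim_tgt = cosine_similarity(protected, target)
    return {
        "matches_target": same_identity_by_similarity(sim_tgt, tau_sim),
        "still_matches_original": same_identity_by_similarity(sim_orig, tau_sim),
        "closer_to_target": sim_tgt > sim_orig,
    }


# Example: a protected embedding that has barely moved from the original.
report = diagnose([1.0, 0.2], [1.0, 0.0], [0.0, 1.0], tau_dist=0.5)
```

In this example `still_matches_original` is true and `closer_to_target` is false, which is the leakage case described in point 2: the verifier would still link the protected image to the original identity.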

If I have misunderstood something, please tell me. Thank you!