Yuki-yt3 opened 4 months ago
This function performs target-speaker selection; we implement it following the least-likely-class idea from adversarial attacks. It is not part of the paper's core code, so please implement it yourself.
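A minimal sketch of such a selection (not the authors' code; all names are hypothetical and speaker embeddings are assumed to be numpy vectors): by analogy with the least-likely class, pick the enrolled speaker whose embedding is least similar to the source speaker's.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def least_likely_speaker(src_emb: np.ndarray, speaker_embedding_dict: dict) -> str:
    """Return the enrolled speaker whose embedding is least similar to
    the source embedding -- the 'least likely class' analogue."""
    return min(
        speaker_embedding_dict,
        key=lambda name: cosine_similarity(src_emb, speaker_embedding_dict[name]),
    )
```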
Thanks for your code. I'd like to ask about some details of the experiment:
(1) First, how do I start the project? Which parameters and files should I consider if I use your `data_utils.py` and `attack_utils.py` to implement an `attack.py`, like the repository layout in AttackVC? For instance, what does the parameter `init_c` in the function `mask_wav_emb_attack` look like — is it the alpha parameter in Algorithm 1?
(2) Second, I'd also like to ask about `from data_utils import wav2mel_tensor, Transform`. From the code, I guess that `Transform` is related to the power spectral density (PSD), given its output `psd_transformer`. Is the implementation similar to the code in `generate_masking_threshold.py`? Also, what exactly goes into `**kwargs`? Since `psd_transformer` serves as an input to `deal_mask_wav_emb_attack_1`, I need to understand what `kwargs` contains.
(3) Last but not least, could you please explain the details of the binary search algorithm indicated in line 89 of Algorithm 1? It appears that the function `deal_mask_wav_emb_attack_2` implements a weighted loss instead of the masking-threshold loss in Equation 7. However, I am not entirely clear on how the parameter `c` (perhaps called `alpha` in the paper) is updated in the binary search. Additionally, what are the meanings of the variables `attack_flag` and `false_c`, respectively? Do they handle the scenarios of no attack or no masking threshold?
Many thanks again for open-sourcing the code, and looking forward to your reply!
(1) `init_c = 1.0`; `c` is alpha.
(2) The implementation of the `Transform` class has been committed; use `model, config, attr, device = load_model(model_dir)`, and `**kwargs` is `**config["preprocess"]`.
(3) Please refer to Equation 8. You can implement the alphas however you like; it only affects efficiency. `attack_flag` indicates whether the defense succeeded; `false_c` records the value of `alpha` when the defense fails.
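A hedged sketch of one way a binary search over `c` could be organized, consistent with the description above (this is not the authors' exact code; `run_attack` is a hypothetical callback that runs the optimization at a given `c` and returns whether the defense succeeded):

```python
def binary_search_c(run_attack, init_c=1.0, lo=0.0, hi=16.0, steps=8):
    """Binary-search the loss weight c: if the defense succeeds at c
    (attack_flag is True), record it and try a smaller c; if it fails,
    record the failing value in false_c and move c upward."""
    best_c = None   # smallest c found so far at which the defense succeeds
    false_c = None  # most recent c at which the defense failed
    c = init_c
    for _ in range(steps):
        attack_flag = run_attack(c)   # True -> defense succeeded at this c
        if attack_flag:
            best_c = c
            hi = c                    # success: search for a smaller c
        else:
            false_c = c
            lo = c                    # failure: search for a larger c
        c = (lo + hi) / 2.0
    return best_c, false_c
```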
This algorithm is no longer maintained; please implement it as you wish.
Thanks for the code and parameters you provided. I ran the code with the following steps:
(1) I commented out `from asr_model.find_most_unlikely_speaker import speaker_name` and instead predefined my own list variable `speaker_name`, containing multiple speakers, plus a dictionary `speaker_embedding_dict` mapping those speakers to their representations;
(2) Using the default parameters in AttackVC and the `kwargs = config["preprocess"]` you offered, I obtained three key variables, `wav`, `theta_xs`, and `psd_max`, from the original input wave `vc_tgt`, which provides the speaker information;
(3) After converting these key variables to `Tensor`s, I called the function `mask_wav_emb_attack` and got the output `final_adv_inp`, a tensor shaped just like the `wav` input.
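For reference, the tensor-to-wav step can be sketched as follows. In my run I used `soundfile`; this self-contained version uses only numpy and the stdlib `wave` module, and assumes the waveform is a mono float array in [-1, 1] (the `final_adv_inp` call in the comment is illustrative).

```python
import wave
import numpy as np

def save_wav(path: str, samples: np.ndarray, sample_rate: int = 16000) -> None:
    """Clip a float waveform to [-1, 1], convert to 16-bit PCM,
    and write it as a mono wav file."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)        # mono
        f.setsampwidth(2)        # 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())

# e.g. save_wav("adv.wav", final_adv_inp.detach().cpu().numpy().squeeze(), 16000)
```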
After these steps, however, when I converted `final_adv_inp` to numpy and wrote it out with the `soundfile` library, I got a noisy wav, no matter whether I used the 16 kHz or the 24 kHz sample rate from the configs provided in AttackVC. This differs from AttackVC, where the `adv_inp` produced by that algorithm sounds like the `ori_input` wave. Therefore, I'd like to ask for help based on two hypotheses:
(1) The losses in the logs are as follows, and I'm afraid `loss_emb_l2` is too high:

```
step 1 : 1500 || loss_emb_l2 : 5.346188545227051 || loss_th : 39.75281083223775
attack_step_1 || attack_flag : True || eps : 0.1 || step : 1500
step 2 : 700 || loss_emb_l2 : 4.801605701446533 || loss_th : 37.85937347082802
attack_step_2 || attack_flag : True || c : 8.0 || step : 750
attack_step || eps : 0.1 attack_step_2 || c : 8.0
```
(2) I've also seen the class `InversePreEmphasis` in https://github.com/LJY-M/Voice-Guard/blob/main/data_utils.py#L37. Is there any need to apply inverse pre-emphasis to `adv_inp`, given that the parameter `vc_tgt` is pre-emphasized in https://github.com/LJY-M/Voice-Guard/blob/main/attack_utils.py#L46, even though this has no influence on the `vc_tgt` input to `deal_mask_wav_emb_attack_1` and `deal_mask_wav_emb_attack_2`?
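To make the pre-emphasis question concrete: pre-emphasis is the FIR filter y[n] = x[n] − a·x[n−1], and an `InversePreEmphasis` class presumably undoes it with the IIR filter y[n] = x[n] + a·y[n−1]. A numpy round-trip sketch (the coefficient 0.97 is a common default and an assumption here, not taken from the repo):

```python
import numpy as np

def pre_emphasis(x: np.ndarray, coef: float = 0.97) -> np.ndarray:
    """FIR pre-emphasis: y[n] = x[n] - coef * x[n-1]."""
    y = np.copy(x)
    y[1:] = x[1:] - coef * x[:-1]
    return y

def inverse_pre_emphasis(x: np.ndarray, coef: float = 0.97) -> np.ndarray:
    """IIR inverse: y[n] = x[n] + coef * y[n-1]; undoes pre_emphasis."""
    y = np.zeros_like(x)
    y[0] = x[0]
    for n in range(1, len(x)):
        y[n] = x[n] + coef * y[n - 1]
    return y
```

If the optimization runs in the pre-emphasized domain, a waveform saved without this inverse step would carry extra high-frequency energy, which might contribute to the noise described above.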
Looking forward to your reply, many thanks!
Thank you very much for sharing the code for this work! However, for `from data_utils import wav2mel_tensor, Transform` in `attack_utils.py`, I get the error that the reference `Transform` cannot be found in `data_utils.py`, and `from asr_model.find_most_unlikely_speaker import speaker_name` gives an unresolved reference `asr_model`. Could you please provide more code details? Thank you very much!