Buicongbang04 opened 1 month ago
Hello, thank you for your interest in our work.
We have updated our code and added more details. After downloading the required checkpoints into the 'checkpoints' directory, you can use the following commands for inference:
cd multi_target_lip2speech && bash scripts/lrs3/inference.sh && cd ..
cd multi_input_vocoder && bash scripts/lrs3/inference.sh && cd ..
or
cd multi_target_lip2speech && bash scripts/lrs3/inference_avhubert.sh && cd ..
cd multi_input_vocoder && bash scripts/lrs3/inference_aug.sh && cd ..
The results can be found in the 'results/lrs3' directory. If you encounter any issues, please let us know where the error occurred.
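The two alternative command pairs above can also be run from a single script. The following is a minimal Python sketch, assuming it is launched from the repository root. The variant labels `default` and `avhubert` and the helpers `build_steps` and `run_pipeline` are hypothetical and not part of the repo. Passing `cwd=` to `subprocess.run` replaces the `cd ... && cd ..` pattern.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

# The two script pairings listed above; the dictionary keys are
# hypothetical labels chosen for illustration only.
PIPELINES = {
    "default": [
        ("multi_target_lip2speech", "scripts/lrs3/inference.sh"),
        ("multi_input_vocoder", "scripts/lrs3/inference.sh"),
    ],
    "avhubert": [
        ("multi_target_lip2speech", "scripts/lrs3/inference_avhubert.sh"),
        ("multi_input_vocoder", "scripts/lrs3/inference_aug.sh"),
    ],
}


def build_steps(repo_root: Path, variant: str = "default") -> List[Tuple[Path, List[str]]]:
    """Return (working directory, command) pairs in the order they should run."""
    if variant not in PIPELINES:
        raise ValueError(f"unknown variant {variant!r}; choose from {sorted(PIPELINES)}")
    return [(repo_root / subdir, ["bash", script]) for subdir, script in PIPELINES[variant]]


def run_pipeline(repo_root: Path, variant: str = "default") -> None:
    # The vocoder stage presumably consumes the lip2speech outputs,
    # so stop at the first failing stage instead of continuing.
    for cwd, cmd in build_steps(repo_root, variant):
        print(f"[{cwd.name}] {' '.join(cmd)}")
        subprocess.run(cmd, cwd=cwd, check=True)  # raises CalledProcessError on failure
```

For example, `run_pipeline(Path("."), "avhubert")` runs the second pair of commands. Because of `check=True`, a failing stage raises immediately, which makes it easier to report where the error occurred.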
@choijeongsoo Thanks for your great work here. Sorry to bother you, but I have a question: could you show me how to preprocess my custom data so that it runs with your model?
I'm sorry for the late reply.
For inference, you need a lip-region video and a speaker embedding extracted from a sample speech utterance. For the speaker embedding, see model_speaker_encoder.py and encoder.pt in https://github.com/choijeongsoo/av2av/tree/main/unit2av.
We plan to provide a complete pipeline for generating output from arbitrary videos and sample speech, but I think it will be difficult for the time being.
Do you mean the speaker embedding looks like this? I followed the avhubert repo, which generated a lip-region video for me and extracted features from that video as a tensor, as shown in the picture. Is this the content of the .unt file? Also, what does the dict.unt.txt file contain? I hope to receive your response soon. Thanks!
The speaker embedding is a [1 x 256] vector for one utterance. If I remember correctly, it is passed through a ReLU activation, pooled into a single vector, and then L2-normalized.
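To illustrate that description, here is a minimal Python sketch of the ReLU, pooling, and L2-normalization steps using only the standard library. It assumes the encoder has already produced several raw 256-dim outputs for one utterance and that "pooled" means a mean over them; both are assumptions, and `pool_speaker_embedding` is a hypothetical helper. For real embeddings, use model_speaker_encoder.py and encoder.pt from the unit2av directory.

```python
import math
import random
from typing import List

EMBED_DIM = 256  # embedding size stated in the thread


def pool_speaker_embedding(partials: List[List[float]]) -> List[List[float]]:
    """Collapse several raw encoder outputs into one [1 x EMBED_DIM] embedding.

    Assumed order, following the description above: ReLU -> mean pooling -> L2 norm.
    """
    if not partials:
        raise ValueError("need at least one raw embedding")
    # ReLU: clamp negative activations to zero.
    activated = [[max(0.0, x) for x in vec] for vec in partials]
    # Mean pooling (an assumption) across the raw embeddings -> one vector.
    n = len(activated)
    pooled = [sum(col) / n for col in zip(*activated)]
    # L2 normalization; an all-zero vector cannot be normalized.
    norm = math.sqrt(sum(x * x for x in pooled))
    if norm == 0.0:
        raise ValueError("pooled embedding is all zeros")
    # Wrap in an outer list to get the [1 x EMBED_DIM] shape.
    return [[x / norm for x in pooled]]


if __name__ == "__main__":
    random.seed(0)
    # Random stand-ins for real encoder outputs (illustration only).
    fake = [[random.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(5)]
    emb = pool_speaker_embedding(fake)
    print(len(emb), len(emb[0]))  # 1 256
```

The result has unit length and no negative entries, which is a quick sanity check for an embedding you extracted yourself.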
The .unt file for speech units is analogous to the .wrd file for subwords in the avhubert repo. We used a dictionary size of 200, so the dict.unt.txt file contains 200 lines, one for each speech unit (0, 1, ..., 199).
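A minimal Python sketch of both files follows. It assumes dict.unt.txt uses the fairseq-style `<token> <count>` line format that avhubert dictionaries follow (the count of 1 is a placeholder), and that a .unt file, like a .wrd file, holds one line per utterance with space-separated unit IDs. `write_unit_dict` and `units_to_line` are hypothetical helpers, not part of the repo.

```python
import tempfile
from pathlib import Path
from typing import Iterable

NUM_UNITS = 200  # dictionary size stated in the thread


def write_unit_dict(path: Path, num_units: int = NUM_UNITS, count: int = 1) -> None:
    # Assumed fairseq-style "<token> <count>" lines; the count is a placeholder.
    lines = [f"{unit} {count}" for unit in range(num_units)]
    path.write_text("\n".join(lines) + "\n")


def units_to_line(units: Iterable[int], num_units: int = NUM_UNITS) -> str:
    # One utterance -> one space-separated line of a .unt file (assumed layout).
    units = list(units)
    for u in units:
        if not 0 <= u < num_units:
            raise ValueError(f"unit {u} outside [0, {num_units})")
    return " ".join(str(u) for u in units)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = Path(d) / "dict.unt.txt"
        write_unit_dict(p)
        print(len(p.read_text().splitlines()))  # 200
    print(units_to_line([12, 12, 87, 199]))  # 12 12 87 199
```

The range check in `units_to_line` catches unit IDs that would fall outside the 200-entry dictionary before they reach training or inference.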
Hi sir, could you describe the steps to run this repo in more detail? I followed your instructions, but it did not work. Do I need to fix the paths in the config file and the .sh scripts? And do I need to download the data and checkpoints?
I hope you can answer soon.