Elsaam2y / DINet_optimized

An optimized pipeline for DINet that reduces inference latency by up to 60% 🚀. Kudos to the authors of the original repo for this amazing work.

How to fine-tune for a specific person? #13

Closed. davidmartinrius closed this issue 1 year ago.

davidmartinrius commented 1 year ago

Hello!

Could you please explain how to fine-tune a DINet checkpoint for a specific target? I know the process may be similar to training, but I don't know how to approach fine-tuning on a small dataset. My dataset consists of 5 videos, each only about 10-30 seconds long. I don't know whether this will be enough to fine-tune the provided checkpoint or whether I will need longer videos.

Also, I noticed that in step "6. Extracting deepspeech features from all audios and saving features..." you still use DeepSpeech. Is that right?

What would be the recipe for fine-tuning?

Thank you!

Elsaam2y commented 1 year ago

Hi,

No, 5 videos of 10-30 s each would be too little. In this case, it is better to add your videos to the dataset and retrain the model. To speed up the process, you can start from the saved checkpoints and train only the final fine stage.
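To put "too little" in numbers, here is a back-of-the-envelope frame count. It assumes the footage is processed at 25 fps, which is an assumption on my part; the thread does not state a frame rate. Five clips of 10-30 s amount to only 50-150 s of a single speaker.

```python
# Assumption: 25 fps, a common rate for talking-head video; adjust FPS for your footage.
FPS = 25


def total_frames(durations_s, fps=FPS):
    """Count video frames across clips, given each clip's duration in seconds."""
    return sum(round(d * fps) for d in durations_s)


# Five clips at the short and long end of the 10-30 s range mentioned in the thread.
low = total_frames([10] * 5)
high = total_frames([30] * 5)
print(low, high)  # 1250 3750
```

A few thousand frames of one person is a small training set compared with a multi-speaker corpus such as HDTF, which is why adding the clips to the full dataset is suggested instead of fine-tuning on them alone.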

Also I noticed when you are in the step "6. Extracting deepspeech features from all audios and saving features..." you still use deepspeech. Is that right?

Yes, that's right. This is mainly to avoid retraining the model on a different audio feature extractor, which could cost quality. During inference we use the wav2vec model and map the extracted features to the ones DeepSpeech would produce, which speeds up inference significantly.
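The thread does not say how the wav2vec-to-DeepSpeech mapping is implemented. As a rough illustration only, the sketch below assumes a linear map fitted by least squares on paired feature frames extracted from the same audio, plus linear interpolation to align frame rates. The feature sizes (768 for wav2vec 2.0 base, 29 for the DeepSpeech features), the 50 Hz and 25 Hz rates, and the helper names `resample_time`, `fit_linear_map` and `apply_map` are hypothetical; the actual repo may use a learned neural network or a different alignment.

```python
import numpy as np

# Hypothetical sizes, not taken from the repo: wav2vec 2.0 base hidden size
# and an assumed 29-d DeepSpeech-style feature per frame.
W2V_DIM, DS_DIM = 768, 29


def resample_time(feats: np.ndarray, src_hz: float, dst_hz: float) -> np.ndarray:
    """Linearly interpolate a (T, D) feature sequence from src_hz to dst_hz frames per second."""
    n_src = feats.shape[0]
    n_dst = int(round(n_src / src_hz * dst_hz))
    t_src = np.arange(n_src) / src_hz
    t_dst = np.arange(n_dst) / dst_hz
    # np.interp is 1-D only, so interpolate each feature dimension separately.
    return np.stack(
        [np.interp(t_dst, t_src, feats[:, d]) for d in range(feats.shape[1])], axis=1
    )


def _with_bias(x: np.ndarray) -> np.ndarray:
    """Append a constant column so the fitted map includes a bias term."""
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_linear_map(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares fit of W such that [src, 1] @ W approximates dst (paired, time-aligned frames)."""
    W, *_ = np.linalg.lstsq(_with_bias(src), dst, rcond=None)
    return W


def apply_map(src: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Map wav2vec-style frames into the DeepSpeech-style feature space."""
    return _with_bias(src) @ W
```

The design choice here is to keep the generator untouched and only translate its audio input, which matches the motivation given above: the model still sees features in the space it was trained on, while the slower extractor is skipped at inference time. How closely a linear map can approximate the real DeepSpeech features is an open question this sketch does not answer.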

Please let me know if you face any issues. Thanks.

davidmartinrius commented 1 year ago

OK, so for now I am going to retrain the final stage of the model with the HDTF dataset plus my videos.

Thank you

9bitss commented 1 year ago

@davidmartinrius how did the fine-tuning go?

davidmartinrius commented 1 year ago

Hi @9bitss, it simply didn't happen. Until there is a clear explanation of how to train SyncNet, I am not willing to do it. I have already seen quite a few people say they wasted many hours training it without satisfactory results.

9bitss commented 1 year ago

Hi @davidmartinrius, same here: I spent a lot of money on A100 GPUs training on HDTF plus my custom dataset, and the results were not good. The only thing that seems promising is training on my own dataset starting from the latest checkpoint. It does a good job of reducing the inpainting issues, but in that case the lip movements are not as good as with the original model.