ashawkey / RAD-NeRF

Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition

Inference after training on 1.5 minutes of custom video looks way worse than test clip #70

Open · iboyles opened 1 year ago

iboyles commented 1 year ago

Do you think I should have trained on a longer video? I'm attaching the results rendered with the checkpoint from the torso training (which looked the best, though still bad). I trained on a video of myself talking that is about 1 minute long. I also attached the video I used for training, but muted the audio to keep my data private.

The question is whether it needs more training, a longer video, or a better-quality video. I'm not sure; this is my first time training this model, but I followed the three training stages in the instructions. The recommended epoch counts were 99, 25, then 99 for the torso. The second time, training on a longer video, it was only 37, 46, and 37 epochs. Does anyone know how many epochs it should take to train the model? How can I change the number of epochs, and why is it different on different runs?
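On the epoch question: my guess (an assumption, I haven't confirmed it in `main.py`) is that RAD-NeRF follows the torch-ngp trainer convention, where training runs for a fixed iteration budget (`--iters`) and the epoch count is derived from the number of training frames. A minimal sketch of that arithmetic:

```python
import math

# Hedged sketch, assuming the torch-ngp convention:
# the trainer runs a fixed number of iterations (--iters) and
# derives epochs as ceil(iters / num_training_frames).
def derived_epochs(iters: int, num_frames: int) -> int:
    """Approximate epoch count for a fixed iteration budget."""
    return math.ceil(iters / num_frames)

# ~1 minute at 25 fps is roughly 1500-2000 frames -> ~100 epochs
print(derived_epochs(200_000, 2_000))  # 100
# a longer clip with ~3x the frames -> proportionally fewer epochs
print(derived_epochs(200_000, 5_400))  # 38
```

Under that assumption, 99 vs. 37 epochs would just mean the second video has roughly 2.7x as many frames, and raising `--iters` (rather than any epoch flag) would be the way to train longer.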

Inference:

https://github.com/ashawkey/RAD-NeRF/assets/89850986/e15fdcc9-8f31-4db1-94ec-70903b98b234

Training data:

https://github.com/ashawkey/RAD-NeRF/assets/89850986/738f2cf9-1e31-4573-8084-aece151895e5

iboyles commented 1 year ago

Test results before adding audio: how do I get inference to look like the results after torso training but before adding audio? They seem better than the actual inference.

https://github.com/ashawkey/RAD-NeRF/assets/89850986/4f25c568-7f26-480e-8a56-c1caccc65720

Inference:
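For anyone comparing the two modes: my understanding (based on the README's obama example, so treat the exact flags as assumptions) is that the test split is rendered with the held-out ground-truth poses and the original audio features, while custom-audio inference swaps in features via `--aud`, which can look noticeably worse when the new audio differs from the training distribution. A sketch of both invocations, with `<ID>` as a placeholder for the dataset folder:

```bash
# Render the held-out test split (ground-truth poses + original audio features).
python main.py data/<ID>/ --workspace trial_<ID>_torso/ -O --torso --test

# Drive the same model with custom audio features instead (--aud points at
# features extracted the same way as during preprocessing).
python main.py data/<ID>/ --workspace trial_<ID>_torso/ -O --torso --test --aud data/<custom_audio>.npy
```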