ashawkey / RAD-NeRF

Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition

Inference after training on 1.5 minutes of custom video looks way worse than test clip #70

Open · iboyles opened 1 year ago

iboyles commented 1 year ago

Do you think I should have trained on a longer video? I'm attaching the results rendered with the checkpoint from the torso training (which looked the best, though still bad). I trained on a video of myself talking that is about 1 minute long. I also attached the video I used for training, but muted the audio to keep my data private.

The question is whether it needs more training, a longer video, or a better-quality video. I'm not sure; this is my first time training this model, but I followed the three training stages in the instructions. The recommended epoch counts were 99, 25, then 99 for the torso. The second time, training on a longer video, it was only 37, 46, and 37 epochs. Does anyone know how many epochs it should take to train the model? How can I change the number of epochs, and why is it different on different runs?
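On the epoch question: my guess (an assumption, I haven't confirmed it in `main.py`) is that RAD-NeRF follows the torch-ngp trainer convention, where training runs for a fixed iteration budget (`--iters`) and the epoch count is derived from the number of training frames. A minimal sketch of that arithmetic:

```python
import math

# Hedged sketch, assuming the torch-ngp convention:
# the trainer runs a fixed number of iterations (--iters) and
# derives epochs as ceil(iters / num_training_frames).
def derived_epochs(iters: int, num_frames: int) -> int:
    """Approximate epoch count for a fixed iteration budget."""
    return math.ceil(iters / num_frames)

# ~1 minute at 25 fps is roughly 1500-2000 frames -> ~100 epochs
print(derived_epochs(200_000, 2_000))  # 100
# a longer clip with ~3x the frames -> proportionally fewer epochs
print(derived_epochs(200_000, 5_400))  # 38
```

Under that assumption, 99 vs. 37 epochs would just mean the second video has roughly 2.7x as many frames, and raising `--iters` (rather than any epoch flag) would be the way to train longer.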

Inference:

https://github.com/ashawkey/RAD-NeRF/assets/89850986/e15fdcc9-8f31-4db1-94ec-70903b98b234

Training data:

https://github.com/ashawkey/RAD-NeRF/assets/89850986/738f2cf9-1e31-4573-8084-aece151895e5

iboyles commented 1 year ago

Test results before adding audio: how do I get inference to look like the results after torso training but before adding audio? They seem better than the actual inference.

https://github.com/ashawkey/RAD-NeRF/assets/89850986/4f25c568-7f26-480e-8a56-c1caccc65720

Inference:
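For anyone comparing the two modes: my understanding (based on the README's obama example, so treat the exact flags as assumptions) is that the test split is rendered with the held-out ground-truth poses and the original audio features, while custom-audio inference swaps in features via `--aud`, which can look noticeably worse when the new audio differs from the training distribution. A sketch of both invocations, with `<ID>` as a placeholder for the dataset folder:

```bash
# Render the held-out test split (ground-truth poses + original audio features).
python main.py data/<ID>/ --workspace trial_<ID>_torso/ -O --torso --test

# Drive the same model with custom audio features instead (--aud points at
# features extracted the same way as during preprocessing).
python main.py data/<ID>/ --workspace trial_<ID>_torso/ -O --torso --test --aud data/<custom_audio>.npy
```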