Open CodeCrusader66 opened 1 year ago
I have also been thinking about this. Perhaps, without editing the code, you could train the first few stages for only 1 epoch each (since each stage requires the previous stage's model), then train the final stage plus clip training and see whether that works; I am unfamiliar with the full inner workings of the training.
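Concretely, the schedule I mean would look something like this (a rough sketch; the script names, flags, and paths below are hypothetical placeholders, not DINet's actual CLI):

```python
import subprocess

# Hypothetical stage list: each stage warm-starts from the previous
# stage's checkpoint. Names and flags are placeholders for illustration.
stages = [
    ("train_frame_64.py",  1),   # early stages: 1 epoch each, just enough
    ("train_frame_128.py", 1),   # to produce a usable initialization
    ("train_frame_256.py", 1),
    ("train_clip_256.py",  20),  # final stage + clip training: full budget
]

prev_ckpt = None
for script, epochs in stages:
    cmd = ["python", script, "--epochs", str(epochs)]
    if prev_ckpt is not None:
        cmd += ["--pretrained", prev_ckpt]  # chain checkpoints between stages
    subprocess.run(cmd, check=True)
    prev_ckpt = f"checkpoints/{script.replace('.py', '')}_last.pth"
```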
Good morning! I have tried going straight to training step four, skipping the first three; the perception loss stays at 2~3, which is not good.
The first step, on single frames, is necessary. It helps the perception loss converge (to 0.2~0.3).
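For context, by perception loss I mean the usual VGG-feature loss; a minimal sketch, assuming torchvision (the layer cut and class name are illustrative, not DINet's exact implementation):

```python
import torch.nn as nn
import torchvision.models as models

class PerceptionLoss(nn.Module):
    """Perceptual loss: L1 distance between frozen-VGG19 features of the
    generated frame and the ground-truth frame."""
    def __init__(self, layer_idx=21):  # slice up to relu4_1; cut is illustrative
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
        self.slice = nn.Sequential(*vgg.features[:layer_idx]).eval()
        for p in self.slice.parameters():
            p.requires_grad = False  # VGG stays fixed; only the generator learns

    def forward(self, fake, real):
        # fake, real: (B, 3, H, W) tensors, ImageNet-normalized
        return nn.functional.l1_loss(self.slice(fake), self.slice(real))
```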
We think the key points common to talking-head methods are the sync loss and the design of the reference image. We are also trying to find a good way to solve lip sync in extreme cases, but it is hard to process my HDTF data as well as yours. (I'm a math major 😭)
The sync loss is hard to get to converge; we have designed a new sync loss for 3D talking heads and are trying to do something similar for 2D talking heads.
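For reference, the standard 2D formulation we compare against is the SyncNet/Wav2Lip-style cosine loss; a minimal sketch (this is the common version, not the new design mentioned above):

```python
import torch
import torch.nn.functional as F

def sync_loss(audio_emb, video_emb):
    """SyncNet/Wav2Lip-style sync loss: push the cosine similarity of
    paired audio and mouth-crop embeddings toward 1 for in-sync pairs."""
    sim = F.cosine_similarity(audio_emb, video_emb, dim=1)  # (B,)
    sim = sim.clamp(1e-7, 1.0 - 1e-7)                       # keep BCE finite
    return F.binary_cross_entropy(sim, torch.ones_like(sim))
```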
Thank you for your reply!
Hello, may I ask how long it took for the perception loss to converge (to 0.2-0.3)? Mine was stuck at 2~3 on my own dataset, yet my inference results are not bad at all.
Inference results not bad? If your perception loss didn't converge, the result cannot be good: the perception loss is the only effective loss, because it is the largest one.
I think you are printing the wrong perception loss (multiplied by 10?). Please check it.
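To illustrate how such a factor can sneak in: if the weighted loss term is logged instead of the raw one, the printed value is inflated by its weight (the weight value below is hypothetical):

```python
import torch

lamb_perception = 10.0           # hypothetical loss weight
perception = torch.tensor(0.25)  # stand-in for the raw loss value

weighted = lamb_perception * perception  # the term that enters the total loss
print(f"raw perception loss: {perception.item():.2f}")             # 0.25
print(f"weighted term (easy to log by mistake): {weighted.item():.2f}")  # 2.50
```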
In frame training, the perception loss is around 2-3; in clip training, it is around 0.2-0.3.
My perception loss is also stuck at about 1.5, even in the clip training phase; I think it's a dataset issue.
May I ask whether multi-stage training is necessary for DINet, or whether it is possible to train only the final stage to save training time? My understanding is that multi-stage training is primarily used to improve initialization, so in theory it should be possible to train only the final stage.
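If that reasoning holds, something like the following warm start might be a middle ground (a sketch under that assumption; the model, checkpoint path, and names are placeholders, not DINet's actual code):

```python
import torch
import torch.nn as nn

# Placeholder final-stage model; in practice this would be DINet's network.
final_model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())

# Load an earlier stage's checkpoint (assumed path) without requiring an
# exact key match, so extra final-stage layers stay randomly initialized.
state = torch.load("coarse_stage_last.pth", map_location="cpu")
missing, unexpected = final_model.load_state_dict(state, strict=False)
print("randomly initialized:", missing)  # params the coarse stage did not provide
```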