MRzzm / DINet

The source code of "DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video."

Is multi-stage training necessary? #48

Open CodeCrusader66 opened 1 year ago

CodeCrusader66 commented 1 year ago

May I ask whether multi-stage training is necessary for DINet, or whether it is possible to train only the final stage to save training time? My understanding is that multi-stage training mainly serves to improve initialization, so in theory it should be possible to train only the final stage.

Inferencer commented 1 year ago

I have also been thinking about this. Perhaps, without editing the code, you could train each of the first few stages for only 1 epoch (since each stage requires the previous stage's model), then run the final stage plus clip training and see whether it works. I am unfamiliar with the full inner workings of the training, though.
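If it helps to visualize that hand-off, here is a minimal sketch under the assumption that each stage simply warm-starts from the previous stage's checkpoint. `build_net` and `train_one_stage` are hypothetical stand-ins, not this repo's actual API (the real entry points, if I remember the repo layout correctly, are `train_DINet_frame.py` and `train_DINet_clip.py`):

```python
import torch
import torch.nn as nn

def build_net() -> nn.Module:
    # stand-in for DINet; the real model grows input resolution per stage
    return nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 3, 3, padding=1)
    )

def train_one_stage(net: nn.Module, epochs: int) -> nn.Module:
    # stand-in training loop; the real one uses perception/GAN/sync losses
    opt = torch.optim.Adam(net.parameters(), lr=1e-4)
    for _ in range(epochs):
        x = torch.randn(2, 3, 64, 64)             # stand-in batch
        loss = nn.functional.l1_loss(net(x), x)   # stand-in loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net

stages = [
    ("frame_lowres", 1),   # stage 1: 1 epoch only
    ("frame_midres", 1),   # stage 2: 1 epoch only
    ("frame_fullres", 1),  # stage 3: 1 epoch only
    ("clip", 20),          # stage 4: full budget (clip training + sync loss)
]

state = None
for name, epochs in stages:
    net = build_net()
    if state is not None:
        net.load_state_dict(state)  # each stage starts from the previous stage's model
    state = train_one_stage(net, epochs).state_dict()
    torch.save(state, f"{name}.pth")
```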

CodeCrusader66 commented 1 year ago

Good morning! I tried going straight to training step four and skipping the first three, and the perception loss stayed at 2~3. The results were not good.

The first step, training on single frames, is necessary; it helps the perception loss converge (to 0.2~0.3).

We think the key points shared by talking-head methods are the sync loss and the design of the reference images. We are also trying to find a good way to solve lip sync in extreme cases, but it is hard to process my HDTF data as well as yours. (I'm a math major 😭)

The sync loss is hard to get to converge. We have designed a new sync loss for 3D talking heads, and we are trying to do something similar for 2D talking heads.
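For anyone following along: the sync loss most 2D pipelines borrow is the SyncNet-style loss from Wav2Lip, which pushes the cosine similarity between an audio-window embedding and a mouth-crop embedding toward 1 via binary cross-entropy. A minimal sketch of that formulation (not necessarily DINet's exact loss; the encoders producing `audio_emb` and `face_emb` are assumed to exist upstream):

```python
import torch
import torch.nn.functional as F

def sync_loss(audio_emb: torch.Tensor, face_emb: torch.Tensor) -> torch.Tensor:
    """SyncNet/Wav2Lip-style sync loss.

    audio_emb, face_emb: (batch, dim) embeddings from the audio and
    mouth-crop encoders. Assumes the encoders end in ReLU so the cosine
    similarity lands in [0, 1]; the clamp guards the BCE either way.
    """
    sim = F.cosine_similarity(audio_emb, face_emb, dim=1)
    p = sim.clamp(1e-6, 1.0 - 1e-6)
    return F.binary_cross_entropy(p, torch.ones_like(p))

# toy usage: in-sync pairs should drive this loss toward 0
a = torch.rand(4, 512)              # stand-in audio embeddings (non-negative)
v = a + 0.01 * torch.rand(4, 512)   # nearly identical face embeddings
print(sync_loss(a, v))              # small, since cosine similarity ~ 1
```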

Thank you for your reply!!

1059692261 commented 1 year ago

Hello, may I ask how long it took to get the perception loss to converge (0.2~0.3)? Although I was stuck at 2~3 using my own dataset, my inference results are not bad at all.

CodeCrusader66 commented 1 year ago

The inference results are not bad? If your perception loss didn't converge, the results shouldn't be good: the perception loss is effectively the only loss that matters, because it is the largest.

I think you are printing the wrong perception loss (multiplied by 10?). Please check it.
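One way to sanity-check that: the number a VGG perceptual loss reports depends on which feature layer is compared and on any loss weight applied before logging, so a stray ×10 is easy to introduce. A minimal sketch of the usual formulation (the `layer` cut and `lam` weight here are illustrative, not DINet's actual values):

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class PerceptionLoss(nn.Module):
    """Standard VGG-19 perceptual loss: L1 distance between frozen VGG
    feature maps of the generated and ground-truth images. The logged
    value scales linearly with `lam`, so a weight of 10 instead of 1
    shifts every reported number by exactly 10x."""

    def __init__(self, layer: int = 20, lam: float = 1.0):
        super().__init__()
        self.vgg = vgg19(weights="IMAGENET1K_V1").features[:layer].eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)
        self.lam = lam

    def forward(self, fake: torch.Tensor, real: torch.Tensor) -> torch.Tensor:
        return self.lam * nn.functional.l1_loss(self.vgg(fake), self.vgg(real))

# the same image pair, logged with two different weights:
loss1 = PerceptionLoss(lam=1.0)
loss10 = PerceptionLoss(lam=10.0)
x, y = torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256)
print(loss1(x, y), loss10(x, y))  # second value is exactly 10x the first
```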

CodeCrusader66 commented 1 year ago

In frame training, the perception loss is around 2~3; in clip training, it is around 0.2~0.3.

huangxin168 commented 1 year ago

My perception loss is also stuck at about 1.5, even in the clip training phase. I think it's a dataset issue.