Open ymchen7 opened 3 months ago
It looks like there is nothing wrong with your visualization results or loss. The loss is calculated on the concatenated latents. The training details should all be in the paper; there are no other special tricks in the loss part.
Is it trainable? Could you please tell me how to train this model?
@ymchen7 @Zheng-Chong I trained with both variants — loss on only the person latents, and loss on the concatenated latents — and found that the results are similar. Because the garment is provided as a condition, it is very easy for the model to learn from it. The part that really makes a difference is the loss on the person latents.
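For anyone comparing the two variants, here is a minimal sketch of what I mean, assuming the person and garment latents are concatenated along the spatial (height) axis with the person part first; the function name and the `person_frac` parameter are made up for illustration, not from the CatVTON code:

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model_pred, noise_target, person_frac=0.5, person_only=False):
    """MSE loss over latents where person and garment are concatenated
    along the height dimension, person part first.

    model_pred / noise_target: (B, C, H, W) tensors over the concat latents.
    person_only: if True, restrict the loss to the person half only.
    """
    if person_only:
        h = int(model_pred.shape[2] * person_frac)  # rows belonging to the person image
        model_pred = model_pred[:, :, :h]
        noise_target = noise_target[:, :, :h]
    return F.mse_loss(model_pred.float(), noise_target.float())
```

Either call drops into a standard noise-prediction training step; the `person_only=True` variant just ignores the garment rows, which the model reconstructs almost for free anyway since the garment is given as a condition.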
That makes sense.
@ymchen7 Hello, can you tell me how I can train this model on my machine?
@ymchen7 How did you create the training pipeline, considering no details are given in the code?
@ymchen7 I also want to know. Can you please guide us on how to train this model on our machines?
Nice work on the design of such a simple VTON pipeline.
I have tried to train CatVTON on the VITON-HD dataset, but the result is a little blurry, as shown below. (38k iterations, batch size 8x32, 512x384 resolution input, only attention parameters trained.)
I'm wondering whether there is any specific setting or trick in the loss part — for example, how is the loss computed? (i.e., on the latents of the human image alone, or on the concatenated latents?)
I also noticed the training loss is relatively small at the beginning of training; is this normal?
```
Epoch 0, step 0, step_loss: 0.06322, data_time: 2.104, time: 4.421
Epoch 0, step 1, step_loss: 0.04681, data_time: 0.058, time: 2.126
Epoch 0, step 2, step_loss: 0.06814, data_time: 0.058, time: 2.124
Epoch 0, step 3, step_loss: 0.03120, data_time: 0.064, time: 2.139
Epoch 0, step 4, step_loss: 0.02966, data_time: 0.059, time: 2.132
Epoch 0, step 5, step_loss: 0.03977, data_time: 0.059, time: 2.132
Epoch 0, step 6, step_loss: 0.05645, data_time: 0.059, time: 2.133
```
I also have the same issue: the color is not correct for many garments, e.g. light red comes out as deep red. Have you found any way to solve it?