SamsungLabs / DINAR

Inference code for "DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars"

How to train DINAR? #12

Open mayank64ce opened 8 months ago

mayank64ce commented 7 months ago

Which parts are pretrained, which are fine-tuned, and which are trained from scratch?

david-svitov commented 7 months ago

Little secret: you need to train the renderer and the generator on this data: https://dolorousrtur.github.io/datasets/youted/index.html Then train inpainting on Texel or any similar data, as described in the article.

mayank64ce commented 7 months ago

I see. I am trying to write the training code for DINAR, so I traced back the functions called during inference (which I assume should be quite similar to the training pipeline). I found this (a rough sketch of the call order follows the list):

1. Encoder (style_gan_v2.py)
2. Generator (style_gan_v2.py)
3. Rasterizer (uv_rasterizer.py)
4. ColorsRasterizer (color_rasterizer.py)
5. VQModel (vqmodel.py)
6. Encoder (diffusion_model.py) input torch.Size([1, 21, 256, 256])
7. UnetModel (openaimodel.py) input [1,6,64,64] * 200
8. VectorQuantizer2 (quantize.py) input [1,6,64,64]
9. Decoder (diffusion_model.py) input [1,6,64,64]
10. Rasterizer (uv_rasterizer.py) this time with diffusion_ntexture
11. Renderer (pix2pix.py) with uv_mask + uv
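
Roughly, in code, the call order looks like this. The module handles, batch keys, and call signatures below are my guesses from reading the code, not the actual DINAR API:

```python
# Rough sketch of the inference forward pass traced above.
# Module handles, batch keys, and call signatures are guesses, not the real DINAR API.
def traced_forward(models, batch):
    # 1-2: encode the input photo into a style code, decode it into a neural texture
    style = models.encoder(batch["image"])              # Encoder (style_gan_v2.py)
    ntexture = models.generator(style)                  # Generator (style_gan_v2.py)

    # 3-4: rasterize UV coordinates / vertex colors for the target pose
    uv, uv_mask = models.rasterizer(batch["verts"])     # Rasterizer (uv_rasterizer.py)
    colors = models.colors_rasterizer(batch["verts"])   # ColorsRasterizer (color_rasterizer.py)

    # 5-9: diffusion inpainting of the neural texture in the VQ latent space
    z = models.vq_encoder(ntexture)                     # Encoder (diffusion_model.py), input [1, 21, 256, 256]
    z = models.unet_sample(z)                           # UnetModel (openaimodel.py), ~200 denoising calls on [1, 6, 64, 64]
    z = models.quantizer(z)                             # VectorQuantizer2 (quantize.py), [1, 6, 64, 64]
    diffusion_ntexture = models.vq_decoder(z)           # Decoder (diffusion_model.py), [1, 6, 64, 64]

    # 10: rasterize again, this time with diffusion_ntexture
    sampled = models.rasterizer(batch["verts"], diffusion_ntexture)

    # 11: neural renderer takes uv_mask + uv and produces the final RGB image
    rgb = models.renderer(sampled, uv_mask, uv)          # Renderer (pix2pix.py)
    return rgb
```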

I assumed I would need to calculate the losses at this point, but instead the code goes to the avatar_tune.py file and calculates the losses there. I don't clearly understand that part or where it is described in the paper.

Also, I see the file first_stage.py, but it is never used. From the comments, it looks like the first-stage code referred to in Section 3 of the paper. If it is what I think it is, how can I use it to train DINAR on a custom dataset?

david-svitov commented 7 months ago

avatar_tune.py is only needed for better color matching at the very last stage; don't pay any attention to it for now. Start by training Encoder + Generator + Renderer as in Fig. 2. They need to be trained end-to-end on the dataset I posted above, or a similar one. Once you manage to train a model that renders avatars from a known view, you can move on to inpainting.
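
Schematically, that first stage is an ordinary end-to-end reconstruction loop. A minimal sketch, assuming generic module handles, batch keys, and a placeholder loss (take the exact loss mix from the paper, not from here):

```python
# Minimal sketch of end-to-end training for the rendering stage (no inpainting block).
# Module handles, batch keys, and the loss are placeholders for illustration.
import itertools
import torch

def train_rendering_stage(encoder, generator, rasterizer, renderer,
                          dataloader, reconstruction_loss, num_epochs=10, lr=1e-4):
    params = itertools.chain(encoder.parameters(),
                             generator.parameters(),
                             renderer.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)

    for _ in range(num_epochs):
        for batch in dataloader:                       # frames with SMPL-X fits
            style = encoder(batch["image"])            # image -> style code
            ntexture = generator(style)                # style code -> neural texture
            uv, uv_mask = rasterizer(batch["verts"])   # UVs for the target pose
            rgb = renderer(ntexture, uv, uv_mask)      # neural rendering of the avatar

            # Reconstruct the ground-truth frame; substitute the paper's actual
            # combination of losses (L1 / perceptual / adversarial) here.
            loss = reconstruction_loss(rgb, batch["image"])

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```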

I would start by implementing the forward pass for rendering, without the inpainting block yet, and checking that the weights from the checkpoint are loaded into it and processed correctly. Then you can try starting training from the checkpoint and check that the model does not diverge.
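
For the checkpoint check, something along these lines; the path and the key prefixes are placeholders, not the real checkpoint layout:

```python
# Inspect the released checkpoint and load matching weights into each submodule,
# reporting anything that does not match. Path and prefixes are placeholders.
import torch
import torch.nn as nn

def load_submodule(state: dict, module: nn.Module, prefix: str) -> None:
    """Load the checkpoint tensors stored under `prefix` into `module`."""
    sub = {k[len(prefix) + 1:]: v
           for k, v in state.items() if k.startswith(prefix + ".")}
    missing, unexpected = module.load_state_dict(sub, strict=False)
    print(f"{prefix}: {len(sub)} tensors, {len(missing)} missing, {len(unexpected)} unexpected")

ckpt = torch.load("data/checkpoint.pth", map_location="cpu")   # placeholder path
state = ckpt["state_dict"] if "state_dict" in ckpt else ckpt

# See which submodules the checkpoint actually contains before wiring anything up.
print("top-level keys:", sorted({k.split(".")[0] for k in state}))

# e.g. load_submodule(state, encoder, "encoder")
#      load_submodule(state, generator, "generator")
#      load_submodule(state, renderer, "renderer")
```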

mayank64ce commented 7 months ago

You mean run the first_stage.py file with the checkpoint weights and remove the compress_branch, right? @david-svitov