Haian-Jin / Neural_Gaffer

Official code for "Neural Gaffer: Relighting Any Object via Diffusion"

Question about the encoder's architecture #2

Open pureexe opened 4 days ago

pureexe commented 4 days ago

Thank you for your great work! But I'm missing a detail on the encoder. Could you explain it a bit more?

  1. What network architecture is used to encode the input images, LDR maps, and HDR maps?
  2. Is this encoder some off-the-shelf network (maybe CLIP)?
  3. Is the encoder fine-tuned (weights updated) during training?
  4. Do these 3 encoders have different weights, or do they share the same weights?


bring728 commented 3 days ago

I'm curious about the same thing. The size of the RGB image and the environment map will be different, and the nature of the data will also likely differ. It seems that the paper used a pretrained encoder from Stable Diffusion. Did the pretrained encoder, trained on RGB images (the LAION dataset), work well on environment maps?

Haian-Jin commented 2 days ago

> Thank you for your great work! But I'm missing a detail on the encoder. Could you explain it a bit more?
>
>   1. What network architecture is used to encode the input images, LDR maps, and HDR maps?
>   2. Is this encoder some off-the-shelf network (maybe CLIP)?
>   3. Is the encoder fine-tuned (weights updated) during training?
>   4. Do these 3 encoders have different weights, or do they share the same weights?

All of your questions have been discussed in the paper. Please refer to the paper.

Haian-Jin commented 2 days ago

> I'm curious about the same thing. The size of the RGB image and the environment map will be different, and the nature of the data will also likely differ. It seems that the paper used a pretrained encoder from Stable Diffusion. Did the pretrained encoder, trained on RGB images (the LAION dataset), work well on environment maps?

The environment maps are resized to the same size as the input images. Our experiments showed that the pre-trained encoder works well on the preprocessed environment maps we use as input.
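
For anyone landing here later, here is a minimal sketch of what this answer seems to describe: resize the environment map to the input image resolution, preprocess it into LDR and HDR-style conditioning images, and push both through the frozen pretrained Stable Diffusion VAE encoder via `diffusers`. The checkpoint name, the 256x256 resolution, the Reinhard-style tone mapping, and the log normalization below are my assumptions, not something confirmed in this thread or verified against the authors' code.

```python
# Minimal sketch (not the authors' code). Assumptions are marked in comments.
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL

# Pretrained SD VAE; the exact checkpoint used by the paper is an assumption.
vae = AutoencoderKL.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="vae"
)
vae.requires_grad_(False)  # kept frozen, per the discussion above

def vae_encode(img01: torch.Tensor, size: int = 256) -> torch.Tensor:
    """img01: (B, 3, H, W) in [0, 1]. Resized to `size` (matching the input
    image resolution), mapped to [-1, 1], encoded to (B, 4, size//8, size//8)."""
    img01 = F.interpolate(img01, size=(size, size),
                          mode="bilinear", align_corners=False)
    x = img01 * 2.0 - 1.0  # range the SD VAE expects
    with torch.no_grad():
        return vae.encode(x).latent_dist.sample() * vae.config.scaling_factor

def ldr_branch(hdr: torch.Tensor) -> torch.Tensor:
    # Reinhard-style tone mapping to [0, 1] (assumption).
    return hdr / (1.0 + hdr)

def hdr_branch(hdr: torch.Tensor) -> torch.Tensor:
    # Log compression then per-image max normalization to [0, 1] (assumption).
    log_hdr = torch.log1p(hdr)
    return log_hdr / log_hdr.amax(dim=(1, 2, 3), keepdim=True).clamp(min=1e-8)

# Usage: both branches go through the same frozen encoder.
env = torch.rand(1, 3, 256, 512) * 10.0  # fake linear HDR panorama
z_ldr = vae_encode(ldr_branch(env))
z_hdr = vae_encode(hdr_branch(env))
```

The design point this illustrates is the one raised above: the VAE only sees fixed-size images in [-1, 1], so once the environment map is resized and its dynamic range compressed into that window, there is nothing architecturally special about it from the encoder's point of view.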

pureexe commented 2 days ago

>> Thank you for your great work! But I'm missing a detail on the encoder. Could you explain it a bit more?
>>
>>   1. What network architecture is used to encode the input images, LDR maps, and HDR maps?
>>   2. Is this encoder some off-the-shelf network (maybe CLIP)?
>>   3. Is the encoder fine-tuned (weights updated) during training?
>>   4. Do these 3 encoders have different weights, or do they share the same weights?

> All of your questions have been discussed in the paper. Please refer to the paper.

This is the only section in the paper that mentions the encoder, and I don't see how it elaborates on the encoder design. Am I missing something?

[screenshot of the relevant section from the paper]