csjliang / LPTN

Official implementation of the paper 'High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network' in CVPR 2021
Apache License 2.0

implementation differs from paper? #5

Closed: iperov closed this issue 3 years ago

iperov commented 3 years ago

I can't understand this block:

```python
def forward(self, x, pyr_original, fake_low):
    pyr_result = []
    # predict the mask from the network input
    mask = self.model(x)
    for i in range(self.num_high):
        # upsample the mask to the spatial size of the current high-freq level
        mask = nn.functional.interpolate(
            mask, size=(pyr_original[-2 - i].shape[2], pyr_original[-2 - i].shape[3]))
        # mask the original high-freq band, then add the band back in
        result_highfreq = torch.mul(pyr_original[-2 - i], mask) + pyr_original[-2 - i]
        ...
```

Here you multiply the mask from the first high-freq model with the original high-freq map,

but then add the original high-freq map back?

I don't see that in the paper.
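
If I read the code right (writing $h_l$ for the original high-freq band at level $l$ and $M_l$ for the upsampled mask; my notation, not the paper's), the loop computes

$$\hat{h}_l = h_l \odot M_l + h_l = h_l \odot (M_l + 1),$$

while the figure in the paper shows only the multiplication $\hat{h}_l = h_l \odot M_l$: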

(screenshot of the corresponding figure from the paper)

csjliang commented 3 years ago

Hi, thanks for your question. As stated in the README, we have slightly modified the training process and achieve much higher performance. Adding the original high-freq band benefits the initialization of the discriminator D: it avoids asking D to discriminate totally broken images early in training, and it introduces no extra parameters. The overall pipeline is unchanged; the progressive masking strategy still saves most of the computation on the high-freq components while incorporating deep information through the learned masks. Removing the mentioned residual connection gives 22.7 dB at 480p.
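
To make the two variants concrete, here is a minimal sketch of a single pyramid level (shapes are illustrative, not taken from the repo):

```python
import torch

# toy high-frequency band and predicted mask; shapes are illustrative
h = torch.randn(1, 3, 64, 64)  # original high-freq band from the Laplacian pyramid
m = torch.rand(1, 3, 64, 64)   # mask produced by the translation network

paper_variant = h * m          # pure masking, as drawn in the paper's figure
code_variant = h * m + h       # released code: masking plus the residual band

# the residual form is equivalent to shifting the mask by one
assert torch.allclose(code_variant, h * (m + 1.0))
```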

GuideWsp commented 3 years ago
  1. First difference: the paper uses conv-LeakyReLU-conv to refine the mask, but the code does not refine the mask, so the three high-freq branches can be computed in parallel. This differs a lot from the paper.
  2. Second difference: as @iperov said.
  3. The channel information is different: the mask in the paper has 1 channel. Is that a writing error?
csjliang commented 3 years ago
> 1. First difference: the paper uses conv-LeakyReLU-conv to refine the mask, but the code does not refine the mask, so the three high-freq branches can be computed in parallel. This differs a lot from the paper.
> 2. Second difference: as @iperov said.
> 3. The channel information is different: the mask in the paper has 1 channel. Is that a writing error?

Hi, sorry for the confusion. We have uploaded LPTN_paper_arch.py and train_FiveK_paper.yml, which directly implement the model illustrated in our paper. That model achieves 22.3 dB at 480p in this code. For your questions:

  1. We validated in our recent experiments that refining after the multiplication helps to avoid artifacts. It does not affect the inference time, as one can easily verify using test_speed.py. (See the sketch after this list.)
  2. As answered above, the residual connection on the high-freq components brings a slight performance gain.
  3. Our recent experiments found that a 3-channel mask brings better performance. As it does not affect the overall pipeline (it can be seen as a hyper-parameter), we recommend that follow-up works set the mask channels to 3.
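
For reference, here is a minimal sketch of such a per-level block (conv-LeakyReLU-conv refinement applied after the multiplication, plus the residual from the released code); the module and argument names are hypothetical, not from this repo:

```python
import torch
import torch.nn as nn

class RefinedMasking(nn.Module):
    """Illustrative per-level block: mask a high-freq band, then refine the
    result with conv-LeakyReLU-conv. Names are hypothetical, not from LPTN."""

    def __init__(self, channels=3):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, high_freq, mask):
        # the mask may have 1 channel (as in the paper's figure, broadcast
        # over RGB) or 3 channels (the setting recommended above)
        masked = high_freq * mask
        # refine after the multiplication, keeping the residual connection
        return self.refine(masked) + high_freq
```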