alievk / npbg

Neural Point-Based Graphics
MIT License

Feasibility of alternative cost functions? #9

Closed · yyeboah closed 3 years ago

yyeboah commented 3 years ago

First of all, this work is impressive. Thanks for sharing the code with the wider community.

My question pertains mainly to the cost function used to train the model. While the perceptual cost function alone has proven effective in achieving good results (both in the paper and in my own validation experiments), I wonder whether alternative cost functions, such as L1, have been investigated as well, or perhaps some scheme that combines multiple cost functions. What is the core motivation behind training with the VGG cost alone?

Apologies if this has been answered elsewhere, but I found no pertinent discussion in the paper or anywhere in the existing issues.

seva100 commented 3 years ago

@yyeboah, thank you for your interest in our work!

The VGG loss, like other perceptual losses, was motivated as a replacement for L1: it encourages the predicted image to be perceptually close to the ground-truth image. In contrast to VGG, L1 only drives convergence of the low-frequency component, which results in a blurry image. We've also tried a sum of VGG and L1 but did not notice any apparent change. Perhaps the quality can be increased if the perceptual loss is combined with other losses, such as an adversarial one, as suggested in many related works. That said, a GAN-based loss can be tricky to use, as the discriminator must see a large enough dataset of real images to avoid overfitting. We would appreciate it if the community enhanced our results by introducing loss functions that yield sharper renderings and fewer artifacts.
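For concreteness, here is a minimal sketch of this kind of loss in PyTorch. It is not the exact implementation from this repo; the VGG-19 layer indices and the `l1_weight` knob are illustrative assumptions:

```python
import torch.nn as nn
import torchvision.models as models

class VGGPerceptualLoss(nn.Module):
    """Sum of L1 distances between VGG-19 activations, optionally plus a pixel-wise L1 term."""

    def __init__(self, layer_ids=(3, 8, 17, 26), l1_weight=0.0):
        super().__init__()
        # Frozen ImageNet-pretrained VGG-19 feature extractor,
        # truncated after the deepest layer we compare at.
        vgg = models.vgg19(pretrained=True).features.eval()
        for p in vgg.parameters():
            p.requires_grad = False
        self.vgg = vgg[: max(layer_ids) + 1]
        self.layer_ids = set(layer_ids)  # assumed: relu1_2, relu2_2, relu3_4, relu4_4
        self.l1_weight = l1_weight       # 0.0 reproduces a "VGG only" objective
        self.l1 = nn.L1Loss()

    def forward(self, pred, target):
        # pred/target are assumed to be ImageNet-normalized (N, 3, H, W) tensors.
        loss = self.l1_weight * self.l1(pred, target)
        x, y = pred, target
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.layer_ids:
                loss = loss + self.l1(x, y)  # match intermediate activations
        return loss
```

With `l1_weight=0.0` this corresponds to the pure VGG objective; setting it above zero gives the VGG+L1 sum mentioned above.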

In this regard, the concurrent work of Huang et al. 2020 investigates the application of an adversarial loss to the highly related task of texture mapping (fitting a color texture to a mesh reconstructed from a set of photographs). In fact, the VGG baseline from that work is similar to our Texture+Mesh baseline in the paper (though their VGG results suffer from strange out-of-range artifacts, which ours do not).

yyeboah commented 3 years ago

@seva100, your prompt and detailed explanation is very much appreciated.

Indeed, the L1 loss and its counterpart L2 have both consistently proven ill-suited for image generation tasks, specifically w.r.t. capturing and encouraging the high-frequency components. I also agree that a GAN-style loss, as you have suggested, may be better suited to encouraging some additional crispness in the renderings, a claim further backed by the results reported by Huang et al. 2020.
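For reference, a hedged sketch of how such an adversarial term is commonly set up in image-to-image work (a PatchGAN-style discriminator with a hinge loss, added on top of the perceptual loss); these are common choices from related work, not something prescribed by npbg, and the 0.01 weight is an illustrative assumption:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator: outputs a grid of per-patch real/fake logits."""

    def __init__(self, in_ch=3, base=64):
        super().__init__()
        def block(ci, co, norm=True):
            layers = [nn.Conv2d(ci, co, kernel_size=4, stride=2, padding=1)]
            if norm:
                layers.append(nn.InstanceNorm2d(co))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers
        self.net = nn.Sequential(
            *block(in_ch, base, norm=False),
            *block(base, base * 2),
            *block(base * 2, base * 4),
            nn.Conv2d(base * 4, 1, kernel_size=4, padding=1),
        )

    def forward(self, x):
        return self.net(x)

def d_loss(disc, real, fake):
    # Hinge loss for the discriminator; fake is detached so only D updates here.
    return (torch.relu(1.0 - disc(real)).mean()
            + torch.relu(1.0 + disc(fake.detach())).mean())

def g_loss(disc, fake, perceptual):
    # Generator objective: the perceptual term plus a small adversarial term.
    return perceptual + 0.01 * (-disc(fake).mean())
```

As noted above, the discriminator only helps if it sees enough real images during training; with small capture sets it can overfit and destabilize training.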

I'll close this issue for now, in the hope of resuming the discussion later once I've had some luck figuring out a suitable discriminator.