andreas128 / SRFlow

Official SRFlow training code: Super-Resolution using Normalizing Flow in PyTorch

how to calculate loss #6

Closed LyWangPX closed 3 years ago

LyWangPX commented 3 years ago

Are we supposed to use the final training output to calculate the loss as in Eq. (3)? GLOW also implements (-loss / (log(2) * n_pixel)).mean(). This is not mentioned in SRFlow; do we need a similar processing step?
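For reference, the GLOW-style conversion the question refers to can be sketched like this; it just rescales a summed negative log-likelihood (in nats) into bits per dimension (the function name is illustrative, not from either repo):

```python
import math

def nll_to_bits_per_dim(nll, n_pixels):
    """Convert a summed negative log-likelihood (in nats) to bits per
    dimension, as in the GLOW reference implementation.

    nll      -- total NLL of one image, in nats (natural log base)
    n_pixels -- number of dimensions, e.g. C * H * W
    """
    return nll / (math.log(2) * n_pixels)

# An NLL of 3*32*32 nats over a 3x32x32 image is 1/ln(2) ~= 1.4427 bits/dim.
bpd = nll_to_bits_per_dim(3 * 32 * 32, 3 * 32 * 32)
```

This only changes the units of the reported loss; it does not change the gradients up to a constant factor.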

neonbjb commented 3 years ago

The author exports the NLL loss here: https://github.com/andreas128/SRFlow/blob/master/code/models/modules/SRFlowNet_arch.py#L96

I've had some success training this flow network by simply taking the mean of that output and performing gradient descent as described in the paper. The code does appear to have a few bugs that need manual fixing, though, that are likely compensated for in the author's training code. If I can successfully reproduce results, I'll post something here.
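A minimal sketch of that training step, assuming the network's forward pass returns a per-image NLL tensor as exported in SRFlowNet_arch.py (the keyword names here are assumptions, not the exact signature):

```python
import torch

def train_step(model, optimizer, lr_batch, hr_batch):
    """One gradient-descent step on the mean of the exported per-image NLL.

    Assumes model(gt=..., lr=..., reverse=False) returns (z, nll), where
    nll has one entry per image in the batch -- a sketch, not the exact
    SRFlow signature.
    """
    optimizer.zero_grad()
    _, nll = model(gt=hr_batch, lr=lr_batch, reverse=False)
    loss = nll.mean()  # reduce per-image NLL to a scalar
    loss.backward()
    optimizer.step()
    return loss.item()
```

With a standard optimizer (e.g. Adam) this is the whole training loop body; everything else is data loading and logging.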

machlea commented 3 years ago

> The author exports the NLL loss here: https://github.com/andreas128/SRFlow/blob/master/code/models/modules/SRFlowNet_arch.py#L96 […]

May I ask an easy question? Why isn't the output a tensor of size (1, 256), with each element being a probability?

LyWangPX commented 3 years ago

> May I ask an easy question? Why isn't the output a tensor of size (1, 256), with each element being a probability?

I am not 100% sure about your question, but if you sample a multivariate Gaussian, the overall probability is the product of the per-element probabilities, which becomes a sum in log space. So they collapse into a single scalar.
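Concretely, for a standard normal the joint log-density of a vector is the sum of the per-element log-densities, so you end up with one scalar rather than a vector of 256 probabilities. A sketch:

```python
import math
import random

def log_standard_normal(x):
    """Log-density of a scalar under N(0, 1)."""
    return -0.5 * (x * x + math.log(2 * math.pi))

# For an isotropic Gaussian the joint density is the product of per-element
# densities, so in log space the elements collapse into a single sum.
xs = [random.gauss(0, 1) for _ in range(256)]
joint_log_prob = sum(log_standard_normal(x) for x in xs)  # a single number
```

The same collapse happens inside the flow's NLL computation, which is why the exported loss is scalar-per-image rather than per-element.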

neonbjb commented 3 years ago

I promised to update on training code, here you go: https://github.com/neonbjb/DL-Art-School/tree/gan_lab/recipes/srflow

I've been having a lot of fun with this architecture. Thanks to the authors for the ideas and for open sourcing the models.

erqiangli commented 3 years ago

> I promised to update on training code, here you go: https://github.com/neonbjb/DL-Art-School/tree/gan_lab/recipes/srflow […]

Hi, James Betker! I have followed some of the valuable open-source work you have done before; thank you for sharing. You promised to update the SRFlow training code, but I could not find the train.py file through the link you sent. Could you please send me the train.py code? Looking forward to your reply!

machlea commented 3 years ago

> I am not 100% sure about your questions. But if you sample a multi variable gaussian the overall probability will be the product of each element, which will be a sum in log. […]

I've tried defining it as a sum of ln(x^2 + 1); x^2 + 1 makes each element >= 1, so the log is well defined and ln(x^2 + 1) >= 0. However, after only 30 or 40 epochs the loss suddenly becomes very large and PSNR drops to 6.
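In code, that modified objective would look roughly like this (a sketch of my reading of the description above; `z` stands for the network's latent output):

```python
import torch

def log_x2_plus_1_loss(z):
    """Sum of ln(z^2 + 1) over all elements.

    Since z^2 + 1 >= 1 elementwise, every term is >= 0 and the log is
    always defined. Sketch of the modification described above, not a
    loss from the SRFlow paper.
    """
    return torch.log(z * z + 1).sum()
```

Note that this is not a normalized log-density (it drops the log-determinant bookkeeping that makes the flow's NLL a valid likelihood), which may be part of why training destabilizes.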

yzcv commented 3 years ago

> Could you please send me the code of train.py? […]

Hi, you can find train.py here: https://github.com/neonbjb/DL-Art-School/blob/gan_lab/codes/train.py

burb0 commented 3 years ago

> I promised to update on training code, here you go: https://github.com/neonbjb/DL-Art-School/tree/gan_lab/recipes/srflow […]

Have you tried it for face SR? Could this work for face SR simply by changing the dataset?

neonbjb commented 3 years ago

I have not explicitly trained models for facial SR, but I have used the pretrained models "successfully" in my repo. I'll upload a weight-conversion script you can use to do the same if you care to; it will be found in recipes/srflow/convert_official_weights.py.

Before anyone spends a lot of GPU time training these models, though, I want to add some input from my experience working with them. I would love it if the authors could chime in and correct me where I am wrong.

Let's use faces as an example. I pulled a random face from the FFHQ dataset and downsampled it 8x. [Images: the original HQ face and the 8x-downsampled LQ version.]

I then went to the Jupyter notebook found in this repo and ran a few upsample tests with the CelebA_8x model. [Image: best upsampled result.] Note that it is missing a lot of high-frequency detail and has some artifacts, notably a repeating pattern of squares (which may only be visible if you download and zoom in). These are fairly typical of all SRFlow models I train, even with noise added to the HQ inputs as the authors suggest.

I then converted that same model into my repo and ran a script I have been using to play with these models. One thing I can do with this script is generate the "mean" face for any LR input (simple, really: you just feed Z=0 into the flow network). [Image: the Z=0 "mean" output.]
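Generating that "mean" output looks roughly like this. The keyword signature below mirrors what SRFlowNet_arch.py appears to accept for reverse-mode inference, but treat it as an assumption rather than the exact API:

```python
import torch

@torch.no_grad()
def mean_prediction(model, lr_batch):
    """Run the flow in reverse with a zero latent (temperature 0).

    Equivalent to asking for the model's "most likely" HQ image given the
    LQ input. Sketch only: keyword names (lr, z, eps_std, reverse) are
    assumed to match the SRFlow forward signature.
    """
    return model(lr=lr_batch, z=None, eps_std=0.0, reverse=True)
```

Sampling with eps_std > 0 instead draws Z from a scaled Gaussian, which is how the "temperature" knob in the authors' demo works.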

So what you are seeing here is what the model thinks the "most likely" HQ image is for the given LQ input. For reference: [Image: difference between the original HQ and the mean.]

Note that the mean is missing a lot of the high-frequency details. My original suspicion for why this happens is that the network encodes these details into the Z vector it trains on, i.e. the Z vector never really collapses into a true Gaussian, and instead holds on to structural information about the original image. To test this, I plotted the std(dim=1) and mean(dim=1) of the Z vectors (dim 1 is the channel/filter dimension). [Images: mean and std maps at three flow levels.]
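The diagnostic itself is simple. A sketch of the channel-wise reduction, assuming a latent tensor shaped (batch, channels, height, width):

```python
import torch

def z_channel_stats(z):
    """Reduce a latent tensor over the channel dimension (dim=1).

    Returns (mean_map, std_map), each shaped (batch, height, width).
    If Z truly matched N(0, I), both spatial maps would be featureless
    noise; visible image structure means Z still encodes content.
    """
    return z.mean(dim=1), z.std(dim=1)
```

Plotting the two maps with any image viewer (e.g. matplotlib's imshow) reproduces the kind of figures described above.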

In a well trained normalizing flow, these would be indistinguishable from noise. As you can see, they are not: the Z vector contains a ton of structural information about the underlying HQ image. This tells me that the network is unable to properly capture these high frequency details and map them to a believable function.

This is, in general, my experience with SRFlow. I presented one image above, but the same behavior shows up in pretty much every input I have tested. The best I can ever get out of the network is images with Z=0, which produces appealing, "smoothed" images that beat PSNR-based losses, but they miss all of the high-frequency details that a true SR algorithm should be creating. No amount of noise at the Z input produces these details: they are highly spatially correlated.

I think the idea behind SRFlow has real merit. It is likely that these networks are not being trained properly, or that a better architecture is needed to take advantage of that potential. I also think that projecting Z into a structural space might be harmful: the model can manipulate the Z statistics to "appear" close to a Gaussian while still preserving structural details within that "noise" - but that's just a hunch from a dumb computer programmer.

Oh - one last edit: if you are really interested in facial SR, I'd highly recommend checking out GLEAN. I have an implementation in my repo, and it is exceptionally good at producing high-frequency details and doing extreme SR, with the hard limitation that it can only do so on tightly focused datasets like faces/cats/etc.

martin-danelljan commented 3 years ago

Hi.

First, we finally got thumbs up from the project funding partner to publish the training code, so @andreas128 will push it up asap after the holidays.

@neonbjb thank you for your analysis and for sharing your experience. Here are some comments and answers to your observations:

neonbjb commented 3 years ago

Hi @martin-danelljan - excellent! I look forward to trying that code out!

I am using the cv2 bicubic downsampling kernel - the same one used by ESRGAN, since it seems like you are borrowing a lot of your code from that repository. For this demonstration, I used the pretrained SRFlow_CelebA_8X.pth weights provided by setup.sh. The first result in the above post uses only code from this repo. The other results use code from my repo, which is nearly identical and should not have functional changes. I admit it is possible there are some changes, but the similarity of the HQ results seems to indicate otherwise. I want to note that I had better results when using images from the CelebA dataset, but I specifically wanted to demonstrate how the model generalizes here.

Agree with all your other comments. My comment was not made with the intent of conveying that this is a dead research direction. On the contrary: I think it is extremely exciting, and I have nothing but admiration for what you have done. After spending the last half year playing around with these types of models, I am becoming convinced that the way forward for realistic SR will involve mapping HQ images to some latent space, and then transferring latent data encoding high-frequency details from those images to corresponding LQ images. I think the approach SRFlow takes offers some tantalizing ways to make this happen.

My comment was instead to urge folks to be realistic about what they would get out of several hundred hours of GPU training using my repo, and to convey that I believe the poor results are not because I implemented something wrong. While I specifically dug into your pretrained model here, these are the same results I am seeing for models trained on my own datasets.

martin-danelljan commented 3 years ago

Hi again. I fully understand, so don't worry :)

OK, so one other important thing regarding your experiment. When it comes to face models, they are trained for a certain resolution, and it is important that the same input/output resolution is used during inference. It's actually the same for StyleGAN and most other works. Basically, the network learns to use the absolute image location in order to decide where to generate eyes, nose, hair, etc. So if you further downscale your image to the resolution our pretrained network was trained for, I'm sure you will get a much better result. And in order to make it super-resolve larger images from e.g. FFHQ, it needs to be retrained for that. We haven't actually tried this yet.

Thanks again for your interest in our work. And happy new year :D

/Martin