Can not repeat the performance

cogugoat commented 1 year ago

Hi,

Thanks for providing this amazing project. I tried training step 1, which runs for 300k iterations, using half of the provided AVC-Train dataset. Every setting in the _train_animesr_step1_netBasicOPonly.yml file was kept exactly the same. However, the provided _pretrained_animesr_step1_netmodel.pth appears to perform much better than the model I trained (the 300k iterations are not finished yet, so I tried net_g_170000.pth). Would you mind shedding some light on the mistakes I might have made? Thank you.

ToTheBeginning commented 1 year ago

Hi, first please check that you are testing it correctly; you can refer to this similar issue.

If it's not a checkpoint loading problem, please provide more details so we can help you. For example, when you say "much better performance", are you referring to perceptual visual quality or quantitative comparisons? Could you provide some examples? Also, did you use 4 GPUs to train the model, and what happens with the full 300k iterations and the full dataset?

cogugoat commented 1 year ago

Thanks for the response. After checking the link, I found that I did hit the same strict = True issue when loading the model, so I guess no meaningful parameters were loaded. I have two concerns about the step 1 model I generated. First, the pth file I got from step 1 is 12M, which is twice the size of the provided pretrained_animesr_step1_net_model.pth. Second, I would like to know whether the provided opt files are specific to AnimeSR_v2. In other words, I followed the exact training steps without modifying the opt yml file, but the resulting training pth file cannot be loaded directly by scripts/inference_animesr_frames.py. Thanks.

ToTheBeginning commented 1 year ago

The training pth file stores two parameter groups (which is supported by BasicSR): one for the normal params and one for the exponential moving average (ema) params, so it is twice the size of the actual parameters. The released checkpoint only has the plain parameters. You can load the two files and compare their keys to see the difference.
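
As a rough illustration of the key comparison described above, here is a minimal sketch. It assumes the training checkpoint is a dict with top-level `params` and `params_ema` entries (the BasicSR convention) and that the released file is a bare state dict; `summarize_checkpoint` is a hypothetical helper name, not part of AnimeSR or BasicSR.

```python
def summarize_checkpoint(ckpt):
    """Return {group_name: number_of_entries} for an already-loaded checkpoint dict.

    Assumption: a training checkpoint wraps its weights under 'params' and/or
    'params_ema', while a released checkpoint is a plain state dict.
    """
    wrapper_keys = [k for k in ("params", "params_ema") if k in ckpt]
    if wrapper_keys:
        # Two groups of equal size explain a file roughly twice as large.
        return {k: len(ckpt[k]) for k in wrapper_keys}
    return {"<plain state_dict>": len(ckpt)}


# Usage sketch (requires PyTorch; the file name is taken from this thread):
# import torch
# print(summarize_checkpoint(torch.load("net_g_170000.pth", map_location="cpu")))
```

If the training file reports both groups and the released file reports a single plain state dict, that accounts for the size difference.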

cogugoat commented 1 year ago

Thank you, I solved the problem by loading with ['params'].
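
For anyone hitting the same error, the fix can be sketched roughly as follows. This is only an illustration under the assumption that the checkpoint may or may not be wrapped in a `params` entry; `unwrap_state_dict` and `model` are hypothetical names, and the actual inference script may be organized differently.

```python
def unwrap_state_dict(ckpt, param_key="params"):
    """Return the weights stored under `param_key`, or `ckpt` itself if it is
    already a plain state dict (assumed layout of the released checkpoint)."""
    inner = ckpt.get(param_key) if isinstance(ckpt, dict) else None
    if isinstance(inner, dict):
        return inner
    return ckpt


# Usage sketch (requires PyTorch; `model` is a hypothetical network instance):
# import torch
# ckpt = torch.load("net_g_170000.pth", map_location="cpu")
# model.load_state_dict(unwrap_state_dict(ckpt), strict=True)
```

Passing `param_key="params_ema"` would select the ema weights instead, if they are present.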

TencentARC / AnimeSR

Can not repeat the performance #12