Hi, which dataset did you use? We did not encounter NaN values when using examples/train.py with the Vimeo-90k dataset. Note that the paper reports results for the full model; CompressAI's cheng2020-anchor does not include the GMM / attention modules. Please refer to our results JSON files.
Sorry for the late reply. I used the OpenImages dataset for my experiments. With Minnen's model, as with Cheng's, I get NaN values at high quality (e.g. quality 8, the highest), but not at low qualities like 1 or 2. Could the dataset be the problem? At the low qualities I checked, I got the same values as in your results JSON files.
I solved the problem by using a stronger (smaller) gradient-clipping value. Thank you :)
Just for reference, what value of clip_max_norm worked for you?
I used 0.1 and it worked.
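For anyone finding this later, here is a minimal sketch of where that clipping sits in a training step, following the pattern of compressai's examples/train.py (the --clip_max_norm flag). The model and loss below are stand-ins for illustration, not the actual cheng2020 network:

```python
import torch
import torch.nn as nn

# Minimal sketch of the clipping pattern in compressai's examples/train.py
# (the --clip_max_norm flag). The model and loss here are stand-ins; the
# point is where clip_grad_norm_ sits relative to backward() and step().
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.Conv2d(8, 3, 3, padding=1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
clip_max_norm = 0.1  # the tighter value that resolved the NaNs above

x = torch.rand(4, 3, 64, 64)               # dummy training batch
loss = nn.functional.mse_loss(model(x), x)

optimizer.zero_grad()
loss.backward()
# Rescale the global gradient norm to at most clip_max_norm before the
# optimizer step, so an occasional huge gradient (more common at
# high-lambda operating points) cannot blow up the parameters and
# produce NaN losses.
torch.nn.utils.clip_grad_norm_(model.parameters(), clip_max_norm)
optimizer.step()
```

Tightening the norm from the default 1.0 down to 0.1 trades a little convergence speed for stability, which is usually worth it at the highest quality levels.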
I'm really thankful for your work! I have some questions about the training method. I trained cheng2020-anchor the way the paper describes: 50 epochs (about 2M steps, lambda quality = 1), with the scheduler applied. After 200 epochs I tested the model, but the PSNR was 27.999, which is lower than what the paper reports. So I then tried training the model for 100 epochs (in that case, lambda quality = 3), but after about 60 epochs the loss became NaN; I think training diverged. Could you tell me how to train Cheng's model?
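For later readers, "apply scheduler" here refers to the learning-rate schedule in examples/train.py, which uses ReduceLROnPlateau driven by the validation loss. A minimal sketch assuming that setup; the epoch count, model, and validate() helper below are placeholders, not the actual training script:

```python
import torch

# Sketch of the learning-rate schedule in compressai's examples/train.py:
# Adam at 1e-4, reduced by ReduceLROnPlateau (default factor 0.1) when the
# validation loss stops improving.
model = torch.nn.Linear(8, 8)  # stand-in for the cheng2020-anchor network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, "min")

def validate(model):
    # Placeholder: return the mean rate-distortion loss on a held-out
    # split (e.g. a slice of Vimeo-90k) with gradients disabled.
    with torch.no_grad():
        x = torch.rand(16, 8)
        return torch.nn.functional.mse_loss(model(x), x).item()

for epoch in range(200):
    # ... one epoch of training, with gradient clipping as in the
    # sketch above ...
    val_loss = validate(model)
    lr_scheduler.step(val_loss)  # drops the LR when val_loss plateaus
```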