Hi, which dataset did you use? We did not encounter NaN values when using examples/train.py with the Vimeo-90k dataset. Note that the paper reports results for the full model; CompressAI's cheng2020-anchor does not include the GMM / attention modules. Please refer to our results JSON files.
Sorry for the late reply. I used the OpenImages dataset for my experiments. With Minnen's model, as with Cheng's, I get NaN values at high quality (e.g. quality 8, the highest), but not at low qualities like 1 or 2. Could the dataset be the problem? At the low qualities I checked, I got the same values as in your results JSON files.
I solved the problem by using a stronger (smaller) gradient-clipping value. Thank you :)
Just for reference, what value of clip_max_norm worked for you?
I used 0.1 and it worked.
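For anyone finding this later, here is a minimal sketch of where that clipping sits in a training step, following the pattern of compressai's examples/train.py (the --clip_max_norm flag). The model and loss below are stand-ins for illustration, not the actual cheng2020 network:

```python
import torch
import torch.nn as nn

# Minimal sketch of the clipping pattern in compressai's examples/train.py
# (the --clip_max_norm flag). The model and loss here are stand-ins; the
# point is where clip_grad_norm_ sits relative to backward() and step().
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.Conv2d(8, 3, 3, padding=1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
clip_max_norm = 0.1  # the tighter value that resolved the NaNs above

x = torch.rand(4, 3, 64, 64)               # dummy training batch
loss = nn.functional.mse_loss(model(x), x)

optimizer.zero_grad()
loss.backward()
# Rescale the global gradient norm to at most clip_max_norm before the
# optimizer step, so an occasional huge gradient (more common at
# high-lambda operating points) cannot blow up the parameters and
# produce NaN losses.
torch.nn.utils.clip_grad_norm_(model.parameters(), clip_max_norm)
optimizer.step()
```

Tightening the norm from the default 1.0 down to 0.1 trades a little convergence speed for stability, which is usually worth it at the highest quality levels.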
I'm really thankful for your work! I have some questions about the training method. I trained cheng2020-anchor the way the paper describes: 50 epochs (about 2M steps, lambda quality = 1), with the scheduler applied. After 200 epochs I tested the model, but the PSNR was 27.999, which is lower than what the paper reports. So I then tried training the model for 100 epochs (in that case, lambda quality = 3), but after about 60 epochs the loss became NaN; I think training diverged. Could you tell me how to train Cheng's model?
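For later readers, "apply scheduler" here refers to the learning-rate schedule in examples/train.py, which uses ReduceLROnPlateau driven by the validation loss. A minimal sketch assuming that setup; the epoch count, model, and validate() helper below are placeholders, not the actual training script:

```python
import torch

# Sketch of the learning-rate schedule in compressai's examples/train.py:
# Adam at 1e-4, reduced by ReduceLROnPlateau (default factor 0.1) when the
# validation loss stops improving.
model = torch.nn.Linear(8, 8)  # stand-in for the cheng2020-anchor network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, "min")

def validate(model):
    # Placeholder: return the mean rate-distortion loss on a held-out
    # split (e.g. a slice of Vimeo-90k) with gradients disabled.
    with torch.no_grad():
        x = torch.rand(16, 8)
        return torch.nn.functional.mse_loss(model(x), x).item()

for epoch in range(200):
    # ... one epoch of training, with gradient clipping as in the
    # sketch above ...
    val_loss = validate(model)
    lr_scheduler.step(val_loss)  # drops the LR when val_loss plateaus
```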