Closed jsilbergDS closed 3 years ago
You should not apply dropout to the first or final encoder layer. Also, 0.2 is quite high; dropout that large is only needed when training for an extended period. For short fine-tuning runs, a small value of 0.1 is sufficient. Weight decay should also be 0.001 or lower when using dropout, to prevent over-regularization.
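For example, with the config-editing approach you used, a sketch like this would leave the first and last encoder blocks untouched (the 0.1 value and the index range are illustrative, not a fixed recommendation):

```python
import copy

cfg = copy.deepcopy(quartznet.cfg)
blocks = cfg['encoder']['jasper']
# Apply a small dropout to interior blocks only;
# the first and final encoder blocks keep dropout = 0.0.
for i in range(1, len(blocks) - 1):
    blocks[i]['dropout'] = 0.1
quartznet2 = quartznet.from_config_dict(cfg)
```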
Thanks! In that case, it probably makes sense for me to just leave dropout at 0 and use an increased weight decay over the 0.0001 default?
You could refer to the QuartzNet paper; generally we recommend weight decay in the 0.001 to 0.0001 range, tending toward the higher side.
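If it helps, here is a sketch of how that might look with NeMo's optimizer setup (the optimizer name and learning rate are placeholders for your own settings):

```python
from omegaconf import OmegaConf

# Illustrative values; tune lr for your run. weight_decay is set
# toward the higher end of the recommended 0.001-0.0001 range.
optim_cfg = OmegaConf.create({
    'name': 'novograd',
    'lr': 0.001,
    'weight_decay': 0.001,
})
quartznet.setup_optimization(optim_config=optim_cfg)
```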
Thank you!
Hello! Thank you again for the amazing work, I really appreciate it. I am fine-tuning a pre-trained QuartzNet model and wanted to ask what you'd recommend for regularization. I have updated the dropout from the pre-trained QuartzNet default of 0.0 to 0.2 using:
```python
import copy
from omegaconf import OmegaConf

cfg = copy.deepcopy(quartznet.cfg)
print(len(cfg['encoder']['jasper']))  # 18 encoder blocks
# Set dropout on every encoder block.
for i in range(0, 18):
    cfg['encoder']['jasper'][i]['dropout'] = 0.2
print(OmegaConf.to_yaml(cfg))
quartznet2 = quartznet.from_config_dict(cfg)
```
But this just seems to cause the loss to explode?
Thanks!