The current code for this model accumulates gradients during decoding, which invariably causes the process to crash with an out-of-memory (OOM) error. Since the gradients are never used to create adversarial noise at this step, it would make sense to wrap this block in `with torch.no_grad():`.
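As a minimal sketch of the suggested fix (the model and decode loop here are stand-ins, not the actual code under review), wrapping the inference pass in `torch.no_grad()` prevents PyTorch from building a computation graph, so memory stays flat across decoding steps:

```python
import torch
import torch.nn as nn

# Stand-in for the real decoder; the actual model is assumed, not shown here.
model = nn.Linear(8, 8)
x = torch.randn(1, 8)

# Without no_grad(), each forward pass would retain a graph and grow memory.
# Inside no_grad(), no graph is recorded, so repeated steps cost nothing extra.
with torch.no_grad():
    for _ in range(5):  # placeholder for the decoding loop
        x = model(x)

# The output carries no gradient history and no backward pass is possible.
print(x.requires_grad)
```

If gradients are needed elsewhere in the same function (e.g. a later adversarial-noise step), only the decoding block needs to sit inside the context manager; code after the `with` block behaves normally.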