Luciennnnnnn closed this issue 2 years ago.
Using torch.checkpoint will produce exactly the same results. use_checkpoint_attn (recommended if OOM) saves more memory than use_checkpoint_ffn.
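To illustrate the trade-off being discussed: gradient checkpointing discards intermediate activations during the forward pass and recomputes them during the backward pass, so it reduces memory at the cost of extra compute while producing identical gradients. Below is a minimal sketch using torch.utils.checkpoint; the Block class and its attention/FFN split are hypothetical stand-ins mirroring the two flags, not the repository's actual classes.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """Hypothetical transformer block with separately checkpointable
    attention and FFN stages, mimicking use_checkpoint_attn / use_checkpoint_ffn."""

    def __init__(self, dim, use_checkpoint_attn=False, use_checkpoint_ffn=False):
        super().__init__()
        self.use_checkpoint_attn = use_checkpoint_attn
        self.use_checkpoint_ffn = use_checkpoint_ffn
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def _attn_stage(self, x):
        out, _ = self.attn(x, x, x)
        return x + out

    def _ffn_stage(self, x):
        return x + self.ffn(x)

    def forward(self, x):
        # checkpoint() frees this stage's activations after the forward pass
        # and recomputes them in backward: same gradients, lower peak memory.
        if self.use_checkpoint_attn:
            x = checkpoint(self._attn_stage, x, use_reentrant=False)
        else:
            x = self._attn_stage(x)
        if self.use_checkpoint_ffn:
            x = checkpoint(self._ffn_stage, x, use_reentrant=False)
        else:
            x = self._ffn_stage(x)
        return x
```

The attention stage typically holds larger activation tensors (the attention maps) than the FFN, which is consistent with use_checkpoint_attn saving more memory; either flag leaves the computed gradients unchanged.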
Do I have to modify the channels or depth? If I keep the parameters unchanged and enable use_checkpoint_attn, how long will training take on two 3090s?
Reducing the channels or depth may lead to worse performance. I have no idea how long it would take on two 3090s.
If I only have 2 or 4 3090s and want to train a model for ×4 VSR, how can I set the training parameters effectively? That is, no OOM, no large performance drop, and moderate training time. For example, of the two checkpointing parameters for saving CUDA memory, use_checkpoint_attn and use_checkpoint_ffn, which one has the greatest influence on training time and memory consumption? Looking forward to your reply, thank you.