Closed RissyRan closed 3 weeks ago
Description
Set load_balance_loss_weight to 0.01 (per the paper, this value balances the load quickly without interfering with the training loss).
Test
Local test on a small model size:
capacity_factor=-1: baseline, lb_loss=0
capacity_factor=1: test
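
For context on what load_balance_loss_weight scales, here is a minimal sketch of an auxiliary load-balancing loss. It assumes the Switch-Transformer-style formulation (number of experts times the sum over experts of dispatch fraction times mean router probability) with top-1 routing; the paper referenced above is not named here, and the code in this PR may compute the loss differently. The function names `load_balance_loss` and `total_loss` are hypothetical and exist only for illustration.

```python
def load_balance_loss(router_probs, expert_index, num_experts):
    """Assumed formulation: num_experts * sum_i f_i * P_i (top-1 routing).

    router_probs: per-token router probabilities, shape [tokens][experts].
    expert_index: the expert each token was dispatched to.
    """
    num_tokens = len(router_probs)

    # f_i: fraction of tokens dispatched to expert i.
    fraction = [0.0] * num_experts
    for e in expert_index:
        fraction[e] += 1.0 / num_tokens

    # P_i: mean router probability assigned to expert i.
    mean_prob = [
        sum(p[i] for p in router_probs) / num_tokens
        for i in range(num_experts)
    ]

    return num_experts * sum(f * p for f, p in zip(fraction, mean_prob))


def total_loss(train_loss, lb_loss, load_balance_loss_weight=0.01):
    # The weight controls how strongly the auxiliary term pulls on training.
    return train_loss + load_balance_loss_weight * lb_loss


# Perfectly balanced routing over 2 experts.
balanced = load_balance_loss(
    [[0.5, 0.5]] * 4, [0, 1, 0, 1], num_experts=2
)

# Fully collapsed routing: every token goes to expert 0.
collapsed = load_balance_loss(
    [[1.0, 0.0]] * 4, [0, 0, 0, 0], num_experts=2
)
```

Under this formulation the loss is smallest when tokens are spread evenly across experts and grows as routing collapses onto one expert, so a small weight such as 0.01 nudges the router toward balance while keeping the auxiliary term a small fraction of the training loss.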