Open MohamedLotfyElrefai opened 4 years ago
My model is configured with:
-hs 60 -l 3 -a 3 -s 26 -b 5 -e 10 -w 4 --with_cuda True --log_freq 20 --on_memory False --lr 1e-3 --adam_weight_decay 0.0 --adam_beta1 0.9 --adam_beta2 0.999
It reports Total Parameters: 229,503,472, whereas **bert base**, which uses
-hs 768 -a 12 -l 12
has only 110M parameters. How can the much smaller configuration have more parameters?
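
One way to sanity-check these numbers is to count the parameters by hand. The sketch below assumes the standard BERT layout (learned token, position and segment embeddings, a feed-forward width of 4x the hidden size, and a pooler); the implementation used here may differ, for example in its position embeddings or output heads. The function names and the `vocab_size` argument are illustrative and not taken from the repository. Under this accounting the number of attention heads (`-a`) does not affect the count.

```python
def encoder_params(hidden, layers, ffn_mult=4):
    """Parameters in the Transformer encoder stack (assumed standard BERT layout)."""
    ffn = ffn_mult * hidden
    attention = 4 * (hidden * hidden + hidden)             # Q, K, V and output projections
    feed_forward = hidden * ffn + ffn + ffn * hidden + hidden
    layer_norms = 2 * 2 * hidden                           # two LayerNorms, weight + bias each
    return layers * (attention + feed_forward + layer_norms)


def bert_like_params(vocab_size, hidden, layers, max_positions=512, segments=2):
    """Encoder plus embeddings and pooler; no pre-training heads are counted."""
    embeddings = (
        vocab_size * hidden        # token embedding
        + max_positions * hidden   # learned position embedding (assumption)
        + segments * hidden        # segment embedding
        + 2 * hidden               # embedding LayerNorm
    )
    pooler = hidden * hidden + hidden
    return embeddings + encoder_params(hidden, layers) + pooler


# BERT base with its published vocabulary of 30,522 tokens
print(f"{bert_like_params(30522, 768, 12):,}")   # 109,482,240

# Encoder only, for -hs 60 -l 3
print(f"{encoder_params(60, 3):,}")              # 131,940
```

Under these assumptions the encoder for `-hs 60 -l 3` holds only about 132K parameters, so a total of 229,503,472 would have to come almost entirely from vocabulary-sized matrices such as the token embedding. Checking the size of the vocabulary file would be a reasonable next step.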