Digital-Defiance / nlp-metaformer

An ablation study on the transformer network for Natural Language Processing

experiment: long running small model (v2) #54

Closed RuiFilipeCampos closed 7 months ago

RuiFilipeCampos commented 7 months ago

Coming from:

The objective of this experiment is to determine the convergence value of a long-running run on a small model. In the case of #52, this value was 1.1.

Based on #48, I've determined that an LR scaling factor of 1.0 is likely the better choice for stability. I've also noticed that the loss of smaller models behaves more deterministically, so I'm not creating several runs for this one: I have no reason (yet) to believe the loss graphs would diverge.
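
For context, the `warmup_steps` and `lr_schedule_scaling` entries in the table below suggest the standard Transformer warmup schedule from Vaswani et al. (2017). A minimal sketch, assuming that form and treating `coordinates` as `d_model` (neither mapping is confirmed from the repo):

```python
def learning_rate(step: int, d_model: int = 200, warmup_steps: int = 4000,
                  scaling: float = 1.0) -> float:
    # Noam-style warmup: linear ramp for `warmup_steps` steps, then 1/sqrt decay.
    step = max(step, 1)  # step 0 would blow up the inverse square root
    return scaling * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# The LR peaks when step == warmup_steps (~1.1e-3 with these defaults).
```

With `scaling` at 1.0 the schedule is exactly the paper's; values below 1.0 would flatten the whole curve.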

| Configuration | Value |
| --- | --- |
| attention | metric |
| batch_size | 10 |
| beta_1 | 0.9 |
| beta_2 | 0.98 |
| bias | False |
| coordinates | 200 |
| epsilon | 1e-09 |
| l1_regularization | 0.0 |
| l2_regularization | 0.0 |
| lr_schedule_scaling | 1.0 |
| number_of_blocks | 1 |
| number_of_epochs | 1 |
| number_of_heads | 10 |
| number_of_parameters | 10,582,700 |
| number_of_slices | 50 |
| tokens | 50,263 |
| warmup_steps | 4000 |
| words | 624 |
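
For reference, a sketch of how the optimizer rows of the table would map onto PyTorch, reusing `learning_rate` from the sketch above (assumptions: the repo uses Adam in PyTorch; `model` here is a placeholder, not the actual 10.5M-parameter network):

```python
import torch

model = torch.nn.Linear(200, 200)  # placeholder module, not the repo's model
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1.0,              # base LR of 1.0 so LambdaLR's factor *is* the LR
    betas=(0.9, 0.98),   # beta_1, beta_2 from the table
    eps=1e-09,           # epsilon from the table
    weight_decay=0.0,    # l2_regularization from the table
)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: learning_rate(step)
)
# Call scheduler.step() once per optimizer step to advance the schedule.
```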
RuiFilipeCampos commented 7 months ago

Something is killing the runs:

```
Error: Process completed with exit code 137.
```

No idea what it is at the moment.
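
For what it's worth, exit code 137 is 128 + 9, i.e. the process was SIGKILLed; on CI runners that usually points at the kernel OOM killer. A minimal sketch for checking that theory by logging peak RSS during training (the call site is hypothetical):

```python
import resource

def log_peak_memory(step: int) -> None:
    # ru_maxrss is reported in kilobytes on Linux
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"step {step}: peak RSS {peak_kb / 1024:.1f} MiB", flush=True)

# Hypothetical call site inside the training loop:
# for step, batch in enumerate(loader):
#     loss = training_step(batch)
#     if step % 100 == 0:
#         log_peak_memory(step)
```

If the logged peak climbs steadily until the run dies, the kill is memory-related rather than a timeout or a runner eviction.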