experiment: evaluate training performance of small model

RuiFilipeCampos commented 5 months ago

(I'm also still testing the pipelines)

http://localhost/#/experiments/1/runs/69231cb1d27f4cd8ba732fc360239bb5

Configuration	Value
attention	metric
batch_size	1
beta_1	0.9
beta_2	0.98
bias	False
coordinates	100
epsilon	1e-09
l1_regularization	0.0
l2_regularization	0.0
lr_schedule_scaling	1.0
number_of_blocks	1
number_of_epochs	1
number_of_heads	10
number_of_parameters	5,190,850
number_of_slices	50
tokens	50,263
warmup_steps	4000
words	624
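The optimizer settings above (beta_1 0.9, beta_2 0.98, epsilon 1e-09, warmup_steps 4000) match the Adam configuration from "Attention Is All You Need". Two assumptions go into the sketch below, and the run config does not confirm either of them: lr_schedule_scaling multiplies that paper's inverse-square-root warmup schedule, and d_model corresponds to coordinates. Under those assumptions, the per-step learning rate would look roughly like this (noam_lr is a hypothetical helper name):

```python
import math


def noam_lr(step: int, d_model: int = 100, warmup_steps: int = 4000, scaling: float = 1.0) -> float:
    """Learning rate at a 1-indexed optimizer step.

    Inverse-square-root warmup schedule from "Attention Is All You Need".
    ASSUMPTION: d_model maps to `coordinates` and `scaling` maps to
    `lr_schedule_scaling`; this mapping is hypothetical.
    """
    step = max(step, 1)  # guard against 0 ** -0.5 on the very first call
    # Linear warmup for the first `warmup_steps` steps, then decay as step^-0.5.
    return scaling * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    for s in (1, 1000, 4000, 20000, 50000):
        print(s, noam_lr(s))
```

Under those assumptions the rate peaks at step 4000, where it equals 1/sqrt(100 * 4000) ≈ 1.58e-3, and then decays as step^-0.5. In PyTorch this could be wired up with torch.optim.lr_scheduler.LambdaLR. You would use a base lr of 1.0 and lr_lambda=lambda s: noam_lr(s + 1), and call scheduler.step() once per batch.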

[Plot: loss/train(step)]

RuiFilipeCampos commented 5 months ago

[Plot: loss/train(step)]

RuiFilipeCampos commented 5 months ago

2024-02-11T17:52:09.7160680Z 2024-02-11 17:52:09,715 [INFO] ---------- Step 34406 ---------
2024-02-11T17:52:09.7161925Z 2024-02-11 17:52:09,715 [INFO] Cleaning up memory...
2024-02-11T17:52:09.8586044Z 2024-02-11 17:52:09,858 [INFO] Called garbage collector.
2024-02-11T17:52:09.8645220Z 2024-02-11 17:52:09,864 [INFO] Emptied gpu cache.
2024-02-11T17:52:09.8646439Z 2024-02-11 17:52:09,864 [INFO] Fetching slice 22 from worker...
2024-02-11T17:52:11.0965984Z Killed

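Unsure why it got killed, but I'm guessing a memory leak. A bare "Killed" with no Python traceback usually means the process received SIGKILL, most often from the kernel's out-of-memory (OOM) killer.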
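One way to test the memory-leak guess is to sample the process's resident memory every few steps. You can then check whether it keeps growing despite the garbage-collector and cache-emptying calls shown in the log. The sketch below is Linux-only because it reads /proc/self/statm. The helper names are hypothetical and are not part of the existing pipeline:

```python
import os


def current_rss_mib() -> float:
    """Resident set size of this process in MiB (Linux only, via /proc)."""
    with open("/proc/self/statm") as f:
        # Fields are in pages: size, resident, shared, text, lib, data, dt.
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / 2**20


def rss_growth_per_step(samples: list[tuple[int, float]]) -> float:
    """Least-squares slope, in MiB per step, of (step, rss_mib) samples.

    A clearly positive slope after warm-up supports the leak hypothesis.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a slope")
    mean_x = sum(step for step, _ in samples) / n
    mean_y = sum(mib for _, mib in samples) / n
    cov = sum((step - mean_x) * (mib - mean_y) for step, mib in samples)
    var = sum((step - mean_x) ** 2 for step, _ in samples)
    if var == 0:
        raise ValueError("samples must come from at least two distinct steps")
    return cov / var


if __name__ == "__main__":
    # Hypothetical usage inside a training loop: record RSS every 100 steps.
    history = [(step, current_rss_mib()) for step in range(0, 300, 100)]
    print(f"RSS growth: {rss_growth_per_step(history):.4f} MiB/step")
```

It could also help to log current_rss_mib() just before and after each "Fetching slice ... from worker" step. That would show whether the kill coincides with a spike during slice loading rather than gradual growth. A slope near zero with a sharp jump at the fetch would point away from a leak.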

RuiFilipeCampos commented 5 months ago

[Plot attachment]

Digital-Defiance / nlp-metaformer, issue #52
