Open · taewhankim opened 12 months ago
Thanks for the great paper!
I am curious how the hidden_states value is obtained, and why its dimension is more than double what I expected: I thought it would be (b, 50, 768), not (b, 140, 768). Could you explain why?
Thanks!!!
gpt2.py, line 87: https://github.com/RitaRamo/smallcap/blob/513f4f795950328129014eb37f011d686ab6ed24/src/gpt2.py#L87C13-L87C13
The shape of the hidden_states matrix is batch_size × sequence_length × hidden_size, so 140 is the sequence length.
I'm not sure what you mean by how you get this value. Do you want it as an output of the model, or are you asking how the value is computed?
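For anyone else checking the shapes, here is a minimal sketch using a plain Hugging Face GPT-2 (not the modified gpt2.py from this repo); the example prompt string is just a placeholder:

```python
# Minimal sketch: inspecting hidden_states with a vanilla Hugging Face
# GPT-2. Every hidden state has shape
# (batch_size, sequence_length, hidden_size).
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

inputs = tokenizer("an example input sequence", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# outputs.hidden_states is a tuple: the embedding output plus one tensor
# per layer, each (batch_size, sequence_length, 768) for gpt2 (small).
print(outputs.hidden_states[-1].shape)
```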
Thanks for the reply! I asked because I didn't know the GPT-2 structure, but I have now figured it out. My bad, sorry.
But I have another question. When running the vanilla code, could you tell me why the loss value changes across training runs even though the seed is fixed?
As far as I know, Hugging Face's Trainer uses a fixed seed of 42 (and I even fixed the seeds separately in code). But in this project the loss value changes every time I run it, so the metric results change every time. Could you tell me why?
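For reference, this is roughly what I mean by fixing the seeds separately (a minimal sketch; the function name is my own):

```python
# Sketch of seeding the usual Python/NumPy/PyTorch RNG sources. The
# Trainer's own set_seed(42) already covers most of these. Note that even
# with all of this, some CUDA kernels remain nondeterministic unless
# torch.use_deterministic_algorithms(True) is also enabled.
import random

import numpy as np
import torch
from transformers import set_seed

def seed_everything(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # CPU RNG
    torch.cuda.manual_seed_all(seed)  # all GPU RNGs
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    set_seed(seed)                    # Hugging Face helper

seed_everything(42)
```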
Same code, different loss & lr results. As training progresses, the difference in the loss & lr values grows.
checkpoint-8856/trainer_state.json, case 1:

```json
{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 1.0,
  "global_step": 8856,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 1.0,
      "learning_rate": 9.00022583559169e-05,
      "loss": 2.397,
      "step": 8856
    }
  ],
  "max_steps": 88560,
  "num_train_epochs": 10,
  "total_flos": 0.0,
  "trial_name": null,
  "trial_params": null
}
```
case 2:

```json
{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 1.0,
  "global_step": 8856,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 1.0,
      "learning_rate": 9.000338753387534e-05,
      "loss": 2.4005,
      "step": 8856
    }
  ],
  "max_steps": 88560,
  "num_train_epochs": 10,
  "total_flos": 0.0,
  "trial_name": null,
  "trial_params": null
}
```
case 3:

```json
{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 1.0,
  "global_step": 8856,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 1.0,
      "learning_rate": 9.000338753387534e-05,
      "loss": 2.4008,
      "step": 8856
    }
  ],
  "max_steps": 88560,
  "num_train_epochs": 10,
  "total_flos": 0.0,
  "trial_name": null,
  "trial_params": null
}
```
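For comparison, I pulled the last log entry from each run's trainer_state.json with a small script (my own helper; "run1" etc. are placeholder directory names for the three runs):

```python
# Compare the final logged loss/learning_rate across runs by reading
# each checkpoint's trainer_state.json.
import json
from pathlib import Path

def last_log(state_path: Path) -> dict:
    """Return the final entry of log_history from a trainer_state.json."""
    state = json.loads(state_path.read_text())
    return state["log_history"][-1]

for run in ("run1", "run2", "run3"):  # placeholder run directories
    entry = last_log(Path(run) / "checkpoint-8856" / "trainer_state.json")
    print(run, entry["loss"], entry["learning_rate"])
```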