RitaRamo / smallcap

SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation

How to get hidden_state value? #13

Open taewhankim opened 12 months ago

taewhankim commented 12 months ago

Thanks for the great paper!

I am curious how to get the hidden_state value.

Also, why is its size more than double what I expected? I thought it would be (b, 50, 768), not (b, 140, 768).

Could you explain why that is?

Thanks!!!

gpt2.py, line 87:

hidden_states: Optional[Tuple[torch.FloatTensor]]

https://github.com/RitaRamo/smallcap/blob/513f4f795950328129014eb37f011d686ab6ed24/src/gpt2.py#L87C13-L87C13

YovaKem commented 11 months ago

The size of the hidden_states matrix is batch size × sequence length × hidden size, so 140 is the sequence length. I'm not sure what you mean by how to get this value: do you want to have it as an output of the model, or are you asking how the value is computed?
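If you want it as an output, here is a minimal sketch with a plain Hugging Face GPT-2 (not SmallCap's modified gpt2.py; the example sentence is just illustrative):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("a dog runs on the beach", return_tensors="pt")
with torch.no_grad():
    # output_hidden_states=True makes the model return all layer activations
    outputs = model(**inputs, output_hidden_states=True)

# outputs.hidden_states is a tuple of (num_layers + 1) tensors,
# each of shape (batch_size, sequence_length, hidden_size)
for h in outputs.hidden_states:
    print(h.shape)
```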

taewhankim commented 9 months ago

Thanks for the reply! I asked the question because I didn't know the GPT-2 structure, but I have now solved it. My bad, sorry!

But I have another question. When running the vanilla code, could you tell me why the loss value changes on every training run, even though the seed is fixed?

As far as I know, Hugging Face's Trainer has a fixed seed of 42 (I even fixed the seeds separately in code). But in this project, the loss value changes every time I run it, so the metric results change every time. Could you tell me why?
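The separate seeding I mean looks roughly like this (a minimal sketch, not the repo's own script; `set_seed` is Hugging Face's helper, and the cuDNN/determinism flags are standard PyTorch settings, yet some GPU ops can still be non-deterministic):

```python
import os
import random

import numpy as np
import torch
from transformers import set_seed

os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # required for deterministic cuBLAS on CUDA

seed = 42
set_seed(seed)                  # seeds random, numpy and torch in one call
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

torch.backends.cudnn.deterministic = True  # pick deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # disable non-deterministic autotuning
torch.use_deterministic_algorithms(True)   # raise an error on non-deterministic ops
```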

Same code, different loss & lr results. As training progresses, the difference in loss & lr values increases.

checkpoint-8856/trainer_state.json, case 1:

{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 1.0,
  "global_step": 8856,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 1.0,
      "learning_rate": 9.00022583559169e-05,
      "loss": 2.397,
      "step": 8856
    }
  ],
  "max_steps": 88560,
  "num_train_epochs": 10,
  "total_flos": 0.0,
  "trial_name": null,
  "trial_params": null

case 2:

{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 1.0,
  "global_step": 8856,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 1.0,
      "learning_rate": 9.000338753387534e-05,
      "loss": 2.4005,
      "step": 8856
    }
  ],
  "max_steps": 88560,
  "num_train_epochs": 10,
  "total_flos": 0.0,
  "trial_name": null,
  "trial_params": null

case 3:

{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 1.0,
  "global_step": 8856,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 1.0,
      "learning_rate": 9.000338753387534e-05,
      "loss": 2.4008,
      "step": 8856
    }
  ],
  "max_steps": 88560,
  "num_train_epochs": 10,
  "total_flos": 0.0,
  "trial_name": null,
  "trial_params": null
}