GLUE evaluation numbers are very poor if the sequence length is increased to 512 with float32 (Issue #28)

Hi,

I am trying to do some benchmarking as part of my experiments. I want to train a BERT model with a 512 sequence length and dtype float32. I have pretrained the model with the above configuration and run the GLUE_sane evaluation, but the numbers are very poor. May I know what went wrong?
Hm, I have never trained with that combination, so it's hard to say what is going on. One sanity check question: during downstream finetuning, is the model also set to float32?
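For reference, a minimal way to check this in plain PyTorch (a sketch; the Linear layer here is a stand-in for the actual finetuning model):

```python
import torch
from torch import nn

# Stand-in for the downstream model; in practice, inspect the finetuning
# model after the pretrained checkpoint has been restored.
model = nn.Linear(768, 2)

# Collect every parameter dtype; a pure fp32 run should yield {torch.float32}.
param_dtypes = {p.dtype for p in model.parameters()}
print(param_dtypes)
assert param_dtypes == {torch.float32}, f"unexpected dtypes: {param_dtypes}"
```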
Yes, the model is also set to float32 during finetuning.
Hm, just a note: there is also a separate max_seq_length setting in eval that is set to only 128 by default. Setting this to a lower number shouldn't make things much worse, though.
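To illustrate what that cap does (a hedged sketch using the Hugging Face tokenizer; the model name and text are placeholders, and the exact key in cramming's eval configs may differ):

```python
from transformers import AutoTokenizer

# A model pretrained at length 512 only ever sees 128 tokens at eval time
# if the eval max_seq_length stays at its default.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder tokenizer

text = "a GLUE example sentence " * 200  # deliberately much longer than 512 tokens
enc_128 = tok(text, truncation=True, max_length=128)
enc_512 = tok(text, truncation=True, max_length=512)
print(len(enc_128["input_ids"]), len(enc_512["input_ids"]))  # 128 vs. 512
```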
I have pretrained and finetuned with different settings and observed steps vs. loss for each. In the case of 512 sequence length with fp32, I can see spikes in the loss. I think this is due to the learning rate, so I have reduced the lr and am retraining; let me see how the results turn out.
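As a side note, the two standard remedies for such spikes are a lower peak learning rate and gradient clipping. A minimal PyTorch sketch (all values are illustrative, not the repo's defaults):

```python
import torch
from torch import nn

model = nn.Linear(768, 2)  # stand-in for the BERT model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)  # lowered peak lr

# One illustrative training step with gradient clipping to tame loss spikes.
x, y = torch.randn(32, 768), torch.randint(0, 2, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()
```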
Closing this for now, cannot reproduce. Let me know if you find the source for this potential discrepancy.