Open ksrinivs64 opened 1 year ago
Code here: https://github.com/HazyResearch/safari
We don’t have a config for fine tuning, but will look to add it soon!
On Wed, Mar 8, 2023 at 2:10 PM, ksrinivs64 wrote:
Hi, thanks for a very nice piece of work. Do you also plan to release pretraining and fine-tuning code for this model? Also, is there a way to apply the model to long sequences, such as those from the LRA benchmark? Thanks!
Thanks!
Hello, I am also interested in the fine-tuning code. If you do not have the time to cobble it together, could you provide some hints on how a fine-tuning harness could be constructed? I really want to push this to its limits.
Also: thanks for all the hard work. You're a rock star.
Thank you for the kind words!
If you do not have the time to cobble it together, could you provide some hints on how a fine-tuning harness could be constructed?
In safari, we'll need to put in some hooks for loading a pre-trained model: https://github.com/HazyResearch/safari/blob/main/train.py#L170. The H3 model definition in that repo has slightly different parameter names than the one in this repo, so we may need to have some custom code to rename the model parameters upon loading the state dict. Then it should work.
We'll try to get this implemented soon!
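In the meantime, the renaming step described above can be sketched as a small shim applied to the checkpoint's state dict before loading. This is a minimal sketch, assuming the two repos differ only in parameter-name prefixes; the specific old → new pairs below are hypothetical placeholders, and the real correspondences have to be read off the two model definitions.

```python
# Hypothetical prefix substitutions (old H3 name -> safari name).
# These are placeholders, not the actual names used in either repo.
RENAME_RULES = [
    ("backbone.", "model."),
    ("embeddings.word_embeddings.", "embed."),
]

def remap_state_dict(old_state_dict):
    """Return a new state dict with parameter names rewritten
    according to RENAME_RULES (at most one rule per parameter)."""
    new_state_dict = {}
    for name, tensor in old_state_dict.items():
        for old_prefix, new_prefix in RENAME_RULES:
            if name.startswith(old_prefix):
                name = new_prefix + name[len(old_prefix):]
                break
        new_state_dict[name] = tensor
    return new_state_dict

# With PyTorch, the loading hook in train.py would then look roughly like:
#   checkpoint = torch.load(path, map_location="cpu")
#   model.load_state_dict(remap_state_dict(checkpoint["state_dict"]))
```

Keeping the remapping as a pure dict-to-dict function makes it easy to unit-test against a handful of known parameter names before wiring it into the training script.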
Just came here to say I'm interested in the fine-tuning code as well! Great work! I was listening to the Deep Papers podcast and found out about you guys!
I've been looking for a solution to the long-context-window problem for a little while now, so I'm excited to start training it on custom data!