HazyResearch / H3

Language Modeling with the H3 State Space Model
Apache License 2.0

Release of pretraining and fine tuning code #20

Open ksrinivs64 opened 1 year ago

ksrinivs64 commented 1 year ago

Hi, thanks for a very nice piece of work. Do you also plan to release pretraining and fine-tuning code for this model? Also, is there a way to apply the model to long sequences, such as those from the LRA benchmark? Thanks!

DanFu09 commented 1 year ago

Code here: https://github.com/HazyResearch/safari

We don’t have a config for fine-tuning, but will look to add it soon!

ksrinivs64 commented 1 year ago

thanks!

KurtFeynmanGodel commented 1 year ago

Hello, I am also interested in the fine-tuning code. If you do not have the time to cobble it together, can you provide some hints on how a fine-tuning harness could be constructed? I really want to push this to its limits.

Also, a few points:

  1. Great work. I honestly believe this is the path to true coherent multi-modality.
  2. I tried out the models and I am impressed with the context length.
  3. I did notice a higher propensity to hallucinate gibberish at extreme lengths, but I assume that is due to model size.

Thanks for all the hard work. You're a rock star.

DanFu09 commented 1 year ago

Thank you for the kind words!

If you do not have the time to cobble it together can you provide some hints on how a fine-tuning harness can be constructed?

In safari, we'll need to put in some hooks for loading a pre-trained model: https://github.com/HazyResearch/safari/blob/main/train.py#L170. The H3 model definition in that repo has slightly different parameter names than the one in this repo, so we may need to have some custom code to rename the model parameters upon loading the state dict. Then it should work.
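
As a minimal sketch of what that key-remapping step could look like (the checkpoint path, the RENAME_MAP prefixes, and the `model` object below are illustrative placeholders, not the actual safari API):

```python
import torch

def remap_h3_state_dict(state_dict, rename_map):
    """Rename checkpoint keys so they match the target model's parameter names."""
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in rename_map.items():
            if new_key.startswith(old_prefix):
                new_key = new_prefix + new_key[len(old_prefix):]
                break
        remapped[new_key] = value
    return remapped

# Illustrative prefix mapping only; the real renames depend on how the two
# H3 definitions name their submodules.
RENAME_MAP = {"backbone.": "model.backbone."}

# Placeholder checkpoint path; handle both a raw state dict and one nested
# under a "state_dict" key, depending on how the checkpoint was saved.
checkpoint = torch.load("h3_checkpoint.pt", map_location="cpu")
state_dict = checkpoint.get("state_dict", checkpoint)
state_dict = remap_h3_state_dict(state_dict, RENAME_MAP)

# `model` stands in for the safari H3 LM built from its config (see train.py).
# strict=False reports any keys that still don't line up instead of erroring.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing:", missing, "unexpected:", unexpected)
```

With strict=False the mismatched keys are returned rather than raised, which makes it easier to iterate on the rename map until everything lines up.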

We'll try to get this implemented soon!

jordancole21 commented 1 year ago

Just came here to say I'm interested in the fine-tuning code as well! Great work! I found out about you guys from the Deep Papers podcast!

I've been looking for a solution to long context windows for a little while now, so I'm excited to start training it on custom data!