Closed · smdrnks closed this 1 month ago
Hi!
I just downloaded your model weights and I have a few questions. In the paper you train four different models.
My first question: since only the ProtMamba Foundation Long weights are released, could you consider releasing the weights for the remaining three models as well? At least the FIM-finetuned model would be great, so I can reproduce the best ProteinGym results from Figure 8.
Next, I want to compare training curves. Does Figure 4 show the training of Foundation or of Foundation Long?
Thanks again for your help!
Hi @smdrnks, I confirm that Fig. 4 shows the training of Foundation Long. The training curve of Foundation is the same (since it is the same model trained for fewer steps), but it stops at $6\times 10^{10}$ steps (i.e. when we increase the context size from $2^{15}$ to $2^{16}$ tokens). Regarding the weights of the FIM-finetuned Foundation Long model: we can release them, please send us an email and we'll share them with you. There is not much difference with respect to Foundation Long, since it is a "very light" fine-tuning (a few thousand steps with FIM-size = 1), but it will allow you to replicate the results for sure.
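For readers unfamiliar with the FIM objective mentioned above, here is a minimal sketch of a fill-in-the-middle transformation on a token sequence. It assumes FIM-size = 1 means a single masked span, and the special-token names (`<mask-1>`, `<eos>`) are hypothetical; see the ProtMamba paper for the exact tokens and span-sampling scheme.

```python
import random

def fim_transform(tokens, mask_token="<mask-1>", eos_token="<eos>", seed=None):
    """Rearrange a sequence for fill-in-the-middle training.

    With FIM-size = 1 (a single masked span, as assumed here), the
    sequence a b c d e becomes: prefix <mask-1> suffix <eos> <mask-1> middle,
    so an autoregressive model learns to generate the missing middle
    after seeing its surrounding context.
    """
    rng = random.Random(seed)
    # Sample one span [start, end) to mask out (span length >= 1).
    start = rng.randrange(0, len(tokens) - 1)
    end = rng.randrange(start + 1, len(tokens) + 1)
    prefix, middle, suffix = tokens[:start], tokens[start:end], tokens[end:]
    # Context with a placeholder where the span was, then the span itself.
    return prefix + [mask_token] + suffix + [eos_token, mask_token] + middle

# Example: mask one span of the toy sequence "M K V L I T"
print(fim_transform(list("MKVLIT"), seed=0))
```

Running this prints the prefix/suffix context followed by the masked span; in fine-tuning, such a transformation is applied to each training sequence before the usual next-token loss, which is why the FIM stage can be this light.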