Bitbol-Lab / ProtMamba-ssm

ProtMamba: a homology-aware but alignment-free protein state space model
https://www.biorxiv.org/content/10.1101/2024.05.24.595730v1
Apache License 2.0

Training Schedule - Learning Curves #12

Open lisa-schneckenreiter opened 2 weeks ago

lisa-schneckenreiter commented 2 weeks ago

Hi!

I was wondering which settings the learning curves in Supplementary B correspond to. In this notebook there is a hint (screenshot below) that suggests you trained only with context lengths of 2048, 16K, 32K and 131K for the indicated number of steps/tokens. Is this correct?

Thank you for your help!

[screenshot of the notebook cell converting steps to tokens]

damiano-sg commented 2 weeks ago

Oops, those lines of code shouldn't have been shared; they are just a rough way we used to convert from steps to tokens so we could compare some of the checkpoints more directly. Figure 4 of Supplementary B shows the full training of the Foundation Long model, up to the $2^{17}$ context.
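For anyone reading along, here is a minimal sketch of the kind of steps-to-tokens conversion being discussed. The context lengths are the ones hinted at in the notebook screenshot above; the steps-per-stage and batch size are placeholder values, not the actual training configuration.

```python
# Minimal sketch (not the authors' code): converting training steps to tokens
# for each context-length stage of a staged training schedule.
# Context lengths follow the hint in the notebook; steps_per_stage and
# batch_size are hypothetical placeholders.

context_lengths = [2**11, 2**14, 2**15, 2**17]        # 2048, 16K, 32K, 131K
steps_per_stage = [100_000, 50_000, 20_000, 10_000]   # hypothetical step counts
batch_size = 8                                        # sequences per step (assumed)

total_tokens = 0
for ctx_len, steps in zip(context_lengths, steps_per_stage):
    stage_tokens = steps * batch_size * ctx_len       # tokens seen in this stage
    total_tokens += stage_tokens
    print(f"context {ctx_len:>6}: {steps:>7} steps -> {stage_tokens:,} tokens "
          f"(cumulative {total_tokens:,})")
```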

lisa-schneckenreiter commented 2 weeks ago

Thank you for your fast response! Unfortunately, not knowing the exact number of tokens at which the context window was extended makes it difficult to compare models at different stages of training. It would be great if you could share that information. Thanks again!

damiano-sg commented 1 week ago

What do you mean? The only model that we share is "Foundation Long", which is the one trained up to a $2^{17}$-token context. We don't share earlier checkpoints trained with shorter contexts. If you tell me more precisely what kind of comparison you plan to do, I can help you better. You can send me an email at damiano.sgarbossa@epfl.ch so we can continue the conversation there.