AnswerDotAI / bert24


Model: doing an ALBERT24? #52

Closed sileod closed 1 month ago

sileod commented 1 month ago

Hi, I think that the ALBERT architecture is quite underrated. It is basically a BERT with depth-wise (cross-layer) weight sharing.
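
To make the idea concrete, here is a minimal PyTorch sketch of that kind of cross-layer sharing (purely illustrative, not code from this repo; the class name and hyperparameters are made up): a single transformer layer whose weights are reused at every depth.

```python
import torch.nn as nn

class SharedLayerEncoder(nn.Module):
    """Reuses one transformer layer at every depth (ALBERT-style cross-layer sharing)."""

    def __init__(self, d_model=768, n_heads=12, num_layers=12):
        super().__init__()
        # A single set of layer parameters, applied num_layers times.
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.num_layers = num_layers

    def forward(self, x):
        for _ in range(self.num_layers):
            x = self.shared_layer(x)  # identical weights at every depth
        return x
```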

By training a newer ALBERT, we could use LayerDrop (https://www.arxiv.org/abs/1909.11556) during training. This could produce a model with elastic capacity (you can choose your time budget at inference).
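
As a rough sketch of what LayerDrop could look like (again illustrative only; the class name and the `p_drop` / `keep_every` parameters are assumptions, not anything from this repo or the paper's code): layers are skipped at random during training, so whole layers can be pruned at inference to trade accuracy for speed.

```python
import torch
import torch.nn as nn

class LayerDropEncoder(nn.Module):
    """Layer stack where layers can be skipped, as in LayerDrop (Fan et al., 2019)."""

    def __init__(self, d_model=768, n_heads=12, num_layers=12, p_drop=0.2):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
            for _ in range(num_layers)
        ])
        self.p_drop = p_drop

    def forward(self, x, keep_every=1):
        for i, layer in enumerate(self.layers):
            if self.training and torch.rand(1).item() < self.p_drop:
                # Training: skip this layer with probability p_drop.
                continue
            if not self.training and i % keep_every != 0:
                # Inference: prune layers (e.g. keep every 2nd one) to fit a time budget.
                continue
            x = layer(x)
        return x
```

At inference you could then call the model in eval mode with, say, `keep_every=2` to run only half the layers.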

I think it would be a great contribution to newer text encoders.

bclavie commented 1 month ago

Hey, thanks for the suggestion. We're very mindful to avoid scope creep, as there are a million items on the wishlist, and sticking to the best performers that are also popular (e.g. BERT/RoBERTa/DeBERTaV3) is very important to ensure we actually get somewhere. LayerDrop and ALBERT both bring too little to the table at this stage compared to other things, but hopefully this turns into a long-term community effort where other people can try it!