HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU training
BSD 2-Clause "Simplified" License

Long-Range-Arena Evaluation #49

Open ClashLuke opened 2 years ago

ClashLuke commented 2 years ago

Currently, we only know that our model is better than the baseline because it reaches a lower loss in less training time. However, we could run benchmarks such as LRA (Long-Range Arena) to see how well our long-context model performs in real-world scenarios. While LRA doesn't exercise our capabilities ideally (unlike, for example, #5 and #9), it would still give us preliminary evaluation results on a well-known benchmark; a minimal evaluation sketch is given below.

This issue tracks the progress of integrating our model into LRA, even though the integration itself should happen in a separate codebase.
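
For reference, here is a minimal sketch of what an LRA-style evaluation loop could look like, assuming a classification head on top of the model. `apply_fn`, `params`, and the batch iterator are hypothetical placeholders, not part of this codebase; the actual data pipeline would come from the LRA integration.

```python
# Hypothetical LRA-style evaluation sketch. `apply_fn` and `params`
# stand in for the model's forward function and weights; the batch
# iterator stands in for an LRA task's data pipeline.
import jax.numpy as jnp

SEQ_LEN = 4096     # LRA tasks use long byte/token sequences (1k-16k)
NUM_CLASSES = 10   # e.g. ListOps is a 10-way classification task


def accuracy(apply_fn, params, batches):
    """Average classification accuracy over an iterable of batches."""
    correct, total = 0, 0
    for tokens, labels in batches:         # tokens: (batch, SEQ_LEN) int32
        logits = apply_fn(params, tokens)  # (batch, NUM_CLASSES)
        preds = jnp.argmax(logits, axis=-1)
        correct += int(jnp.sum(preds == labels))
        total += labels.shape[0]
    return correct / total
```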