HazyResearch / H3

Language Modeling with the H3 State Space Model
Apache License 2.0

2.7B Evaluations #1

Open sdtblck opened 1 year ago

sdtblck commented 1 year ago

Hi, great work!

Very excited to try out the models.

I'm curious whether you have more detailed evaluations for the 2.7B model, as I can't find them in the H3 paper.

DanFu09 commented 1 year ago

Thanks for your interest! We plan to update the arXiv paper with the full evaluations soon.

For now, we have the perplexity (PPL) of the 2.7B model compared with GPT-Neo-2.7B on the Pile:

| Model | Pile PPL |
| --- | --- |
| GPT-Neo-2.7B | 5.7 |
| H3 + 3 attn (2.7B) | 5.4 |

We'll add the remaining evaluations soon (after this week's ICML deadline).

DanFu09 commented 1 year ago

This is now updated on arXiv: https://arxiv.org/abs/2212.14052