Open mmmaia opened 1 year ago
Good idea! Do you want to add it to https://github.com/erikbern/ann-benchmarks/blob/main/ann_benchmarks/datasets.py (for English)? I'm about to run a new round of benchmarks so we could include that as one dataset.
I'm pretty new to this, so it would probably take me some time to get it working 😬
I may give it a try next week if nobody else does it first.
Ok no rush, I can also take a look at it. But you're very welcome to look at it too, if I don't have time to!
I believe Cohere's recently released Wikipedia Embedding Archives could be a good addition to the benchmark datasets.
The multilingual nature of the dataset is worth noting.