elastic / rally-tracks

Track specifications for the Elasticsearch benchmarking tool Rally

Normalize msmarco-v2-vector queries and switch to dot-product #578

Closed 1stvamp closed 7 months ago

1stvamp commented 7 months ago

This:

After investigating, I found that the vectors in the msmarco-v2 dataset published by Cohere are normalised, but the query embeddings returned by their API are not, despite their API docs saying otherwise. A separate ingestion issue made it look in CI as though the dataset, rather than the queries, was the problem.
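
To illustrate why normalising the queries matters for the switch to dot-product, here is a minimal sketch in plain Python. It is not the code from this PR: the helper names (`l2_normalise`, `dot`, `cosine`) and the toy two-dimensional vectors are assumptions for illustration only. It shows that dot-product matches cosine similarity only when both the document and the query vector have unit length.

```python
import math


def l2_normalise(vector):
    """Scale a vector to unit (L2) length. Hypothetical helper, not from the track."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vector]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity, which is independent of vector length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy stand-ins: a document vector that is already unit length (like the
# published dataset) and a raw query vector that is not (like the API output).
doc = l2_normalise([1.0, 2.0])
query = [3.0, 4.0]
unit_query = l2_normalise(query)

# With the raw query, dot-product is scaled by the query's length and
# differs from cosine similarity.
raw_score = dot(query, doc)

# After normalising the query, dot-product and cosine similarity agree.
normalised_score = dot(unit_query, doc)
```

Elasticsearch's `dot_product` similarity for float `dense_vector` fields expects unit-length vectors on both the document and query side, which is why unnormalised queries are a problem once the track stops using cosine.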

The dataset file sizes changed in GCP, and therefore in track.json, because I experimented with normalisation, but the data itself is the same.