embeddings-benchmark / leaderboard

Code for the MTEB leaderboard
https://hf.co/spaces/mteb/leaderboard

Remove Git LFS #1

Closed orionw closed 2 months ago

orionw commented 2 months ago

To stay within GitHub's Git LFS bandwidth limits, this PR removes all pickle files and replaces them with regular data files. The results files are a bit complex, but they are now also saved in readable formats.

@KennethEnevoldsen @Muennighoff thoughts?

orionw commented 2 months ago

@KennethEnevoldsen These are good questions! They both have the same answer: the previous app.py file refreshed the leaderboard and created two intermediate objects (all_data_tasks and boards_data). I split that up so we can cache those intermediate objects, and I created functions to save them to file and load them back without using pickle. That needed custom code because they are deeply nested dictionaries of dataframes.

I do think we should simplify it at some point and only save out the files that are actually needed... and in a more understandable format. It would probably take some unravelling of the app.py script to change it to use a more standard object format.
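
As a rough illustration of the approach described above (not the actual code in this PR), a nested dictionary of tables can be cached without pickle by writing one readable file per leaf and letting the directory tree mirror the dictionary keys. The names `save_nested`, `load_nested`, and the sample `boards` data are hypothetical. To keep the sketch standard-library only, each leaf is a list of row dicts (the shape `DataFrame.to_dict("records")` produces) rather than a real dataframe, and it assumes every key is a string that is safe to use as a file name.

```python
import json
import tempfile
from pathlib import Path


def save_nested(data, root):
    """Write each leaf table under root, using dict keys as directory/file names."""
    root = Path(root)
    for key, value in data.items():
        if isinstance(value, dict):
            # Inner dict: recurse into a subdirectory named after the key.
            save_nested(value, root / key)
        else:
            # Leaf: a list of row dicts, written as JSON Lines (one row per line).
            root.mkdir(parents=True, exist_ok=True)
            with open(root / f"{key}.jsonl", "w", encoding="utf-8") as f:
                for row in value:
                    f.write(json.dumps(row) + "\n")


def load_nested(root):
    """Rebuild the nested dict from a directory tree written by save_nested."""
    out = {}
    for path in sorted(Path(root).iterdir()):
        if path.is_dir():
            out[path.name] = load_nested(path)
        elif path.suffix == ".jsonl":
            with open(path, encoding="utf-8") as f:
                out[path.stem] = [json.loads(line) for line in f if line.strip()]
    return out


# Hypothetical sample data standing in for an object like boards_data.
boards = {"en": {"Classification": [{"model": "model-a", "score": 71.2}]}}

with tempfile.TemporaryDirectory() as tmp:
    save_nested(boards, tmp)
    restored = load_nested(tmp)
```

The resulting files are plain text, so they diff cleanly in Git and do not need LFS. The trade-off is that dataframe-specific details such as dtypes and indexes are not preserved by this sketch and would need explicit handling.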

KennethEnevoldsen commented 2 months ago

Thanks @orionw, exactly what I wanted to know!