tollefj closed this 7 months ago
@tollefj thanks for raising this concern and for the compliment! I will look into fixing this issue on Monday.
I'd be glad to help out if we can agree on how to handle the data :) I envision something along the lines of what ScandEval did. I just think it should be clearer how to go from the source data to the subsets (if subsets are even desired).
I find ScandEval's implementation a bit too abstract, though: tracking down the evaluations means digging through what feels like hundreds of files.
That would be great @tollefj. I have already re-uploaded the datasets for MTEB, so it should just be a matter of replacing the links; you are more than welcome to open a PR for it.
Re. the complexity of ScandEval: ScandEval makes a different trade-off than SEB (focusing especially on robustness), and it pays for that in complexity and in how long the benchmark takes to run.
@tollefj added the fixes in #174; assuming everything passes, they will be merged automatically.
It's been a while since I last ran experiments with SEB (I much prefer its interface to ScandEval's, as it gives more control over model configurations). Now, however, some datasets seem to be missing from ScandEval, like ScandEval/norquad-mini and scala-da. Perhaps they were removed for licensing reasons; I don't know. To avoid future problems with datasets, perhaps it would be an idea to create them from the originals instead of hosting subsets?
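A minimal sketch of the "create them from the originals" idea, assuming each example in the original dataset carries a stable, unique id: instead of hosting a fixed subset, the subset is derived deterministically from the source data. The names `stable_key` and `make_subset` and the default seed string are hypothetical and not part of SEB or ScandEval.

```python
import hashlib
from typing import Iterable


def stable_key(example_id: str, seed: str = "seb") -> int:
    """Map an example id to a stable integer via SHA-256 (hypothetical helper)."""
    digest = hashlib.sha256(f"{seed}:{example_id}".encode("utf-8")).hexdigest()
    return int(digest, 16)


def make_subset(example_ids: Iterable[str], size: int, seed: str = "seb") -> list[str]:
    """Deterministically pick `size` ids from the original data (hypothetical helper).

    Ids are ordered by their hashed key and the first `size` are kept, so the
    result depends only on the ids and the seed, not on the row order.
    """
    ids = list(example_ids)
    if size > len(ids):
        raise ValueError(f"requested {size} examples but only {len(ids)} available")
    return sorted(ids, key=lambda i: stable_key(i, seed))[:size]


# Usage: anyone with the original data can rebuild the same subset.
original = [f"q{i}" for i in range(1000)]
subset = make_subset(original, size=100)
```

Because selection depends only on each id and the seed, shuffling the source rows yields the same subset, and a smaller subset is a prefix of a larger one with the same seed. A seeded shuffle would also work, but it depends on the original row order never changing.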