orionw opened this issue 1 week ago
Thanks for this, Orion - I think most of these are quite small (SNL, Twitterhjerne, TV2Nord, SweFaq, DanFEVER, NorQuad and Swedn), but the others might be large enough. I will set this to `help wanted` so that someone can grab it.
Clues for whoever takes this: the failure is reproducible in Python with

```python
from mteb import get_tasks

task = get_tasks(tasks=["NorQuadRetrieval"])[0]
task.calculate_metadata_metrics()
```
which then raises an error from `HFDataLoader._load_corpus(self)`:

```python
    122 def _load_corpus(self):
    123     if self.hf_repo:
--> 124         corpus_ds = load_dataset(
    125             self.hf_repo,
    126             "corpus",
    127             keep_in_memory=self.keep_in_memory,
    128             streaming=self.streaming,
    129         )
```
It seems the method looks for a `corpus` subset that is not present. Compared with a dataset like NFCorpus, `mteb/norquad_retrieval` does not have that subset.
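A defensive check like the following could turn the opaque `load_dataset` failure into a clear message about the missing subset. This is a hypothetical sketch, not the actual `HFDataLoader` code; the function name and interface are my own, and the config list is passed in (it would normally come from `datasets.get_dataset_config_names`) so the example runs offline:

```python
def check_required_configs(available_configs, required=("corpus", "queries")):
    """Raise a clear error if a retrieval repo lacks the expected subsets.

    `available_configs` would normally come from
    datasets.get_dataset_config_names(repo); it is passed in here so the
    check stays testable without network access.
    """
    missing = [name for name in required if name not in available_configs]
    if missing:
        raise ValueError(
            f"Dataset repo is missing required subset(s): {missing}. "
            "Retrieval tasks expect 'corpus' and 'queries' configs, "
            "as in mteb/nfcorpus."
        )


# An mteb/nfcorpus-style repo passes silently.
check_required_configs(["corpus", "queries", "qrels"])

# An mteb/norquad_retrieval-style repo fails with an explicit message.
try:
    check_required_configs(["default"])
except ValueError as err:
    print(err)
```

Surfacing the missing-subset case early like this would make failures such as the one above much easier to diagnose.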
In fact, the following command is also broken, with the same error as above:

```
mteb run -t NorQuadRetrieval -m intfloat/multilingual-e5-base --model_revision d13f1b27baf31030b7fd040960d60d909913633f
```
Seems like some of these are due to missing `load_data()` functions. I have added these in #953 so that as much as possible runs for @Muennighoff. This resolved the loading failures for several of the datasets (SNL, Twitterhjerne, TV2Nord, SweFaq, DanFEVER, NorQuad and Swedn) and also fixed the statistics calculation issue (though not for NorQuad). I have to run to another thing so I can't look into it further, but this at least solves about half of them.
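For reference, a minimal sketch of what such a `load_data()` override might look like. This is illustrative only, not the actual code from #953: the class is a stand-in for the real mteb task class, and the inline rows stand in for the downloaded Hugging Face split so the sketch runs offline:

```python
class NorQuadRetrieval:  # illustrative stand-in for the mteb task class
    def __init__(self):
        self.corpus, self.queries, self.relevant_docs = {}, {}, {}
        self.data_loaded = False

    def load_data(self, **kwargs):
        """Populate corpus/queries/qrels in the layout retrieval tasks expect.

        A real implementation would call datasets.load_dataset on the
        task's HF repo and build these dicts per split; here we use
        inline rows so the sketch is self-contained.
        """
        if self.data_loaded:
            return
        rows = [  # stand-in for the downloaded QA split
            {"id": "q1", "question": "Hva er MTEB?",
             "context": "MTEB er en benchmark for tekst-embeddinger."},
        ]
        split = "test"
        self.corpus[split], self.queries[split] = {}, {}
        self.relevant_docs[split] = {}
        for row in rows:
            doc_id, query_id = f"d-{row['id']}", f"q-{row['id']}"
            self.corpus[split][doc_id] = {"title": "", "text": row["context"]}
            self.queries[split][query_id] = row["question"]
            self.relevant_docs[split][query_id] = {doc_id: 1}
        self.data_loaded = True


task = NorQuadRetrieval()
task.load_data()
```

The key point is that each task defines its own mapping from raw dataset rows into the `corpus`/`queries`/`relevant_docs` dicts, which is what the statistics calculation consumes.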
I'll take a look at this.
When running

```python
task.calculate_metadata_metrics()
```

for retrieval tasks, there are a handful that fail to run (most of the ~130 work though, which is great!). They are:
Almost all the errors are due to things like:
At some point we should resolve this, either by changing the `calculate_metadata_metrics` function to use a parameter that is passed in, or by changing the tasks' `__init__` function to define the needed parts. cc'ing @KennethEnevoldsen as an FYI. The RetrievalStats tab of this Google Sheet has the stats for the current tasks that did succeed. Around 30 will need to be reduced in size, plus potentially some of the above that I do not have stats for.
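To make the first option concrete, here is a sketch of a parameterised statistics function that only iterates over splits the task actually defines, rather than assuming a fixed layout. The function body and the task interface (`.queries`/`.corpus` dicts keyed by split) are assumptions for illustration, not the real mteb signature:

```python
from types import SimpleNamespace


def calculate_metadata_metrics(task, splits=None):
    """Compute simple length statistics per split.

    `splits` defaults to whatever splits the task defines, so tasks with
    unusual layouts no longer crash on a hard-coded split name.
    """
    splits = splits or list(task.queries)
    stats = {}
    for split in splits:
        q_lens = [len(q) for q in task.queries[split].values()]
        d_lens = [len(d["text"]) for d in task.corpus[split].values()]
        stats[split] = {
            "num_queries": len(q_lens),
            "avg_query_len": sum(q_lens) / len(q_lens) if q_lens else 0.0,
            "num_docs": len(d_lens),
            "avg_doc_len": sum(d_lens) / len(d_lens) if d_lens else 0.0,
        }
    return stats


# Tiny dummy task to exercise the sketch.
dummy = SimpleNamespace(
    queries={"test": {"q1": "what is mteb?"}},
    corpus={"test": {"d1": {"text": "MTEB is a benchmark."}}},
)
stats = calculate_metadata_metrics(dummy)
```

The alternative, defining the needed parts in each task's `__init__`, moves the same information into the task definitions instead of the statistics code; either way the function stops assuming structure that some tasks don't have.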