Open · gnatesan opened 22 hours ago
You might want to take a look at saving retrieval task predictions
@orionw might have additional pointers
+1, that flag helps. I tend to use this a lot as well @gnatesan, so let me know if there are additional things that would be helpful. For example, I don't think we save out the qrels or query-specific scores, although we could add flags for those also.
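Roughly, usage looks something like this. I'm writing it from memory, so treat the kwarg name `save_predictions`, the output location, and the model/task names as placeholders to check against your mteb version:

```python
# Rough sketch -- `save_predictions` is the flag being discussed; it should write
# the per-query retrieval scores as a JSON run file into `output_folder`.
# The model and task names below are just examples.
import mteb

tasks = mteb.get_tasks(tasks=["NFCorpus"])
model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")

evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(
    model,
    verbosity=2,
    eval_splits=["test"],
    output_folder="results/",
    save_predictions=True,  # assumed kwarg: saves query_id -> doc_id -> score per task/split
)
```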
I want to be able to perform post-evaluation query filtering after evaluating a model on a retrieval benchmark. In other words, after evaluation has run I want to select a subset of the test queries based on query length and look at the performance metrics on just that subset (e.g., queries of length 15-20). However, I want to do this after evaluation is completed so that I do not need to re-run evaluation every time I change the query length range for the subset. How would I save the results of `results = evaluation.run(model, verbosity=2, eval_splits=["test"])` such that I can do this? And is this even possible?
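For concreteness, something along these lines is what I am hoping is possible once the per-query scores are saved. The file name, the `queries`/`relevant_docs` attributes, and the saved format (query_id -> doc_id -> score) are just my guesses about mteb's internals; I'm using pytrec_eval to recompute the metric on the filtered subset:

```python
# Rough sketch of post-hoc query filtering -- the predictions file name, task
# attribute names, and the saved run format (query_id -> doc_id -> score) are
# assumptions, not confirmed mteb behavior.
import json

import mteb
import pytrec_eval

task = mteb.get_tasks(tasks=["NFCorpus"])[0]
task.load_data()
queries = task.queries["test"]        # assumed: query_id -> query text
qrels = task.relevant_docs["test"]    # assumed: query_id -> {doc_id: relevance}

# Load the saved retrieval run (per-query document scores) written during evaluation.
with open("results/NFCorpus_default_predictions.json") as f:  # assumed file name
    run = json.load(f)

# Keep only queries whose length in words falls in the desired range.
def keep(query_text, lo=15, hi=20):
    return lo <= len(query_text.split()) <= hi

subset_ids = {qid for qid, text in queries.items() if keep(text)}
sub_qrels = {qid: rels for qid, rels in qrels.items() if qid in subset_ids}
sub_run = {qid: docs for qid, docs in run.items() if qid in subset_ids}

# Recompute nDCG@10 on the filtered subset without re-running retrieval.
evaluator = pytrec_eval.RelevanceEvaluator(sub_qrels, {"ndcg_cut.10"})
per_query = evaluator.evaluate(sub_run)
ndcg_10 = sum(q["ndcg_cut_10"] for q in per_query.values()) / len(per_query)
print(f"nDCG@10 on {len(per_query)} filtered queries: {ndcg_10:.4f}")
```

If the saved predictions already look like a qid -> doc_id -> score run file, this would let me change the length range freely without touching the model again.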