Hi, thank you for your comment. Yes, we consistently use 200 evaluation samples for all datasets. Using more samples should reduce the variance of the evaluation results. Best, Alexandre
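To illustrate the variance point, here is a minimal sketch. It is not code from the repository: it assumes the evaluation metric is a plain average of independent per-sample scores, and the score distribution and the name `metric_spread` are hypothetical. Under that assumption the spread of the metric shrinks roughly as 1/sqrt(n), so going from 200 to 800 samples should roughly halve it.

```python
import random
import statistics


def metric_spread(num_samples, num_trials=500, seed=0):
    """Standard deviation of a mean-based metric over repeated evaluations."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(num_trials):
        # Hypothetical per-sample scores: i.i.d. draws from N(0.7, 0.2).
        scores = [rng.gauss(0.7, 0.2) for _ in range(num_samples)]
        estimates.append(statistics.fmean(scores))
    return statistics.stdev(estimates)


spread_200 = metric_spread(200)  # the default sample count discussed above
spread_800 = metric_spread(800)  # four times as many samples
print(f"spread with 200 samples: {spread_200:.4f}")
print(f"spread with 800 samples: {spread_800:.4f}")
```

If the real per-sample scores are correlated or the metric is not a simple mean, the scaling may differ, so this is only a rough guide.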
Interesting paper!
I would like to ask how many samples you use for inference across the different datasets. The default number seems to be 200 in args_utis.py; is that used for all of the datasets? I ask because it seems to have a big impact on evaluation performance.
Many thanks!