In your paper, for classification tasks, you mention using the maximum classification predictive label probability as a measure of uncertainty. However, this is not reflected in your GitHub project, where I observed that the uncertainty estimate is derived entirely from the loss computed by the run_model method of the MetaICL object. The loss calculation there appears to be the one used for generative tasks, which does not align with the description in your paper.
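For reference, here is a minimal sketch of what I expected the classification path to look like: per-candidate-label losses converted into a maximum predicted label probability. This is only my own guess, not your implementation; the function name, the `per_label_losses` tensor, and its shape are assumptions on my part.

```python
import torch

def max_label_probability(per_label_losses: torch.Tensor) -> torch.Tensor:
    """Turn per-candidate-label losses into a confidence score.

    per_label_losses: hypothetical tensor of shape (num_examples, num_labels),
        where entry [i, j] is the (length-normalized) negative log-likelihood
        of candidate label j for example i under the in-context model.
    Returns: tensor of shape (num_examples,) with the maximum predicted
        label probability per example (higher = more confident).
    """
    # loss = -log p(label | prompt), so -loss is an unnormalized log-probability.
    log_probs = -per_label_losses
    # Normalize over the candidate labels only; no gold annotation is required.
    probs = torch.softmax(log_probs, dim=-1)
    return probs.max(dim=-1).values

# Tiny usage example with made-up losses for 2 examples and 3 candidate labels.
losses = torch.tensor([[0.2, 1.5, 2.0],
                       [1.0, 1.1, 0.9]])
print(max_label_probability(losses))  # roughly [0.70, 0.37]
```

Is something along these lines what the paper intends, or is the generative-style loss itself used directly as the uncertainty score?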
Could you please elucidate how you implement uncertainty estimation for classification tasks? It would be greatly appreciated if you could clarify the discrepancy between the theoretical approach described in your publication and the practical implementation in your code.
This is ridiculous. Did you reproduce the results? If you compute losses, how can the method be called annotation-budget efficient? You can't compute losses without annotating all the data, right?