aangelopoulos / conformal-prediction

Lightweight, useful implementation of conformal prediction on real data.
http://people.eecs.berkeley.edu/~angelopoulos/blog/posts/gentle-intro/
MIT License

Size of Prediction Sets using APS Different Than Reported in RAPS Paper #8

Closed kevinkasa closed 1 year ago

kevinkasa commented 1 year ago

Hello,

Thank you so much for providing the conformal prediction tutorial & corresponding notebooks, they are super helpful!

I had a question regarding the size of the prediction sets returned by the APS method. In the implementation provided in the notebooks, the prediction sets are far larger than reported in your paper that introduced RAPS. The notebook implementation returns sets that contain >200 labels on average, whereas the paper reports an average set size of 10.4 on ResNet-152.

I have not done extensive evaluation of RAPS, but it seems the notebook implementation also returns slightly larger sets there (set size of ~3).

I was wondering if you have any ideas as to what might be causing this discrepancy, and what the best way to replicate the results in the paper might be.

Also, I wasn't sure which repo this issue should be opened in, so apologies if it doesn't fit here. Thanks in advance!

aangelopoulos commented 1 year ago

It's probably due to the lack of randomization! None of the methods herein are the randomized versions of their respective algorithms... and APS is extremely bad without randomization. If you randomize, you should recover roughly the results in the paper. Of course, that paper also has its own repo, but it's less friendly than this one.
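To make the randomization concrete, here is a minimal NumPy sketch of randomized APS in the style of the repo's notebooks. This is not the repository's code; the function names and the synthetic-data setup are my own, and it follows the generalized inverse-quantile conformity score (mass of the classes ranked above the true label, plus a uniform fraction of the true label's own mass) used at both calibration and test time.

```python
import numpy as np

def aps_scores(probs, labels, u):
    """Randomized APS conformity scores.

    probs:  (n, K) softmax outputs
    labels: (n,)   candidate/true class indices
    u:      (n,)   Uniform[0, 1] draws, one per example
    """
    n = probs.shape[0]
    order = np.argsort(-probs, axis=1)                      # classes by descending prob
    sorted_probs = np.take_along_axis(probs, order, axis=1)
    cumsum = np.cumsum(sorted_probs, axis=1)
    ranks = np.where(order == labels[:, None])[1]           # position of `labels` in the sort
    idx = np.arange(n)
    # mass strictly above the label, plus a random fraction of the label's own mass
    return cumsum[idx, ranks] - (1.0 - u) * sorted_probs[idx, ranks]

def calibrate(probs, labels, alpha, rng):
    """Conformal quantile of the randomized scores on a held-out calibration set."""
    n = len(labels)
    scores = aps_scores(probs, labels, rng.uniform(size=n))
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, q, method="higher")

def predict_sets(probs, qhat, rng):
    """Boolean (n, K) membership matrix: label k is in the set iff its score <= qhat."""
    n, K = probs.shape
    u = rng.uniform(size=n)                                 # one uniform per test point
    cols = [aps_scores(probs, np.full(n, k), u) for k in range(K)]
    return np.stack(cols, axis=1) <= qhat
```

On synthetic data whose labels are actually drawn from the softmax distribution, the sets from this randomized version cover the true label at roughly the nominal 1 − α rate; dropping `u` (i.e., setting it to 1) reproduces the inflated, conservative sets seen in the notebook.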

aangelopoulos commented 1 year ago

Hey @kevinkasa, have you had a chance to follow up here? Just wondering if this answers your question.

kevinkasa commented 1 year ago

Hey @aangelopoulos thanks for the quick response! I was just slightly confused since both your paper and the APS paper seemed to suggest that randomization should change the sets by at most one element, so it was surprising that APS led to considerably larger sets without it. I suppose the algorithm is just very sensitive without it then?

Was planning on trying to add randomization to the notebook implementations but haven't had a chance yet. I am trying out the other RAPS repository in the meantime as well. Thanks!

aangelopoulos commented 1 year ago

Good question.

Randomization at test time only changes the set by at most one element. Randomization during calibration has a much larger effect, because it shifts the conformal quantile itself, and every test set is thresholded against that quantile.

kevinkasa commented 1 year ago

I see, thank you for the clarification!