aangelopoulos / ppi_py

A package for statistically rigorous scientific discovery using machine learning. Implements prediction-powered inference.
MIT License

Remove sorted_highlow from form_discrete_distribution call in ppi_distribution_label_shift_ci. #15

Closed: justinkay closed this 1 week ago

justinkay commented 3 months ago

Addresses #14.

aangelopoulos commented 3 months ago

Just checking - have you tested this for the plankton example, and also on an example where the classes are not pre-sorted?

justinkay commented 3 months ago

I've checked this on the plankton example as well as a 'reversed' version of the plankton example where I swap the frequencies of the 0 and 1 classes:

```python
# Flip every label (assuming boolean arrays), which swaps the
# frequencies of the 0 and 1 classes:
Y = ~Y
Yhat = ~Yhat
Y_unlabeled = ~Y_unlabeled
Yhat_unlabeled = ~Yhat_unlabeled
```
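For boolean labels, bitwise-not flips every entry, so the two classes' frequencies are exactly swapped; that is what makes this a useful label-shift stress test. A minimal sketch with synthetic stand-in data (not the actual plankton labels):

```python
import numpy as np

# Synthetic stand-in for the plankton labels: class 1 (True) dominates.
Y = np.array([True, True, True, True, False])

# Bitwise-not on a boolean array flips every label,
# so the class frequencies are exactly swapped.
Y_reversed = ~Y

freq = np.bincount(Y.astype(int)) / len(Y)                    # [0.2, 0.8]
freq_reversed = np.bincount(Y_reversed.astype(int)) / len(Y)  # [0.8, 0.2]

print(freq, freq_reversed)
```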

The resulting plots are:

Original plankton problem, no code change (`sorted_highlow=True`): [plot: `ppi_plankton_sorted=True`]

Original plankton problem, with proposed code change (`sorted_highlow=False`): [plot: `ppi_plankton_sorted=False`]

So, the original results are unaffected by the change in this PR.

Reversed distribution, no code change (`sorted_highlow=True`): [plot: `ppi_plankton-reverse_sorted=True`]

With the unchanged code, this estimate fails.

Reversed distribution, with proposed code change (`sorted_highlow=False`): [plot: `ppi_plankton-reverse_sorted=False`]

The change in this PR results in a reasonable estimate.

However, I have not tested this for other cases, for example where the number of classes is > 2. I want to make sure I understand the original intention of sorting the discrete distribution by class frequency, so that this change doesn't break something else in the future. Do you remember why this was originally done? Are there other tests you can think of that I should try? Happy to add to the pytests as appropriate.

aangelopoulos commented 3 months ago

I think it's probably a relic of the way I used to run this experiment. It was in a notebook before it was in ppi_py, and that notebook had some code for truncating the matrix to the first `[:K, :K]` submatrix or something (these are the K most common classes).

I don't see any way this could go wrong. If you could put a little synthetic data example as a pytest with more than 2 classes, go for it. Then once we confirm that it works, I'll test it too, and we'll merge.

You're a hero @justinkay thank you <3

aangelopoulos commented 1 month ago

Hey @justinkay --- sorry I lost track of this --- do you want to write these tests, or should I write them and merge?

justinkay commented 1 month ago

Hey @aangelopoulos, sorry for the delay -- this keeps slipping down my to-do list. Happy to write a couple of tests for good measure, but I probably won't get to it until after ECCV.

aangelopoulos commented 1 week ago

Merged! :)

aangelopoulos commented 1 week ago

Thanks for all your help @justinkay !