Closed justinkay closed 1 week ago
Just checking - have you tested this for the plankton example, and also on an example where the classes are not pre-sorted?
I've checked this on the plankton example as well as a 'reversed' version of the planton example where I swap the frequencies of the 0 and 1 classes:
Y = ~Y
Yhat = ~Yhat
Y_unlabeled = ~Y_unlabeled
Yhat_unlabeled = ~Yhat_unlabeled
The resulting plots are:
Original plankton problem, no code change (sorted_highlow=True
)
Original plankton problem, with proposed code change (sorted_highlow=False
)
So, the original results are unaffected by the change in this PR.
Reversed distribution, no code change (sorted_highlow=True
)
This estimate fails given the unchanged code.
Reversed distribution, with proposed code change (sorted_highlow=False
)
The change in this PR results in a reasonable estimate.
However I have not tested this for other cases, for example where the number of classes is > 2. I want to make sure I understand the original intention of sorting the discrete distribution by class frequency to make sure this doesn't break something else in the future. Do you remember why this was originally done? Are there some other tests you can think of that I should try? Happy to add to the pytest
s as appropriate.
I think it's probably a relic from the way I used to run this experiment. It was in a notebook before it was in ppi_py
, and in that notebook, there was some code for truncating the matrix to the first [K:,K:]
submatrix or something (these are the K
most common classes).
I don't see any way this could go wrong. If you could put a little synthetic data example as a pytest
with more than 2 classes, go for it. Then once we confirm that it works, I'll test it too, and we'll merge.
You're a hero @justinkay thank you <3
Hey @justinkay --- sorry I lost track of this --- do you want to write these tests, or should I write them and merge?
Hey @aangelopoulos sorry for the delay -- this keeps slipping down my to-do list. Happy to write a couple tests for good measure but probably won't get to it til after ECCV.
Merged! :)
Thanks for all your help @justinkay !
Addresses #14.