bazingagin / npc_gzip

Code for Paper: “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors
MIT License
1.77k stars, 156 forks

Fix test bug where size-1 sample may be used (with top_k=2) #46

Closed EliahKagan closed 1 year ago

EliahKagan commented 1 year ago

Closes #45

This changes the range of possible sample sizes in TestKnnClassifier.test_predict to have a lower bound of 2 instead of 1, so that the test, which always uses a top_k of 2, no longer occasionally fails.

The change itself is very simple, but I split it across a few commits to verify and show that the lower bound of 1 was the problem. Because the sample size is selected randomly, it would otherwise not be immediately clear that this change really fixes the bug.
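To illustrate the idea behind the fix, here is a minimal standalone sketch. It is not the repository's test code: `pick_sample_size`, `seeds_yielding_too_small_sample`, and the upper bound of 10 are hypothetical names and values chosen for illustration. It only shows that drawing the sample size from a range starting at 1 can produce a sample smaller than `top_k`, while a range starting at 2 cannot.

```python
import random

TOP_K = 2  # the test described above always uses a top_k of 2


def pick_sample_size(rng: random.Random, upper: int, lower: int = TOP_K) -> int:
    """Pick a random sample size in [lower, upper], inclusive (hypothetical helper)."""
    if lower > upper:
        raise ValueError(f"lower ({lower}) must not exceed upper ({upper})")
    return rng.randint(lower, upper)


def seeds_yielding_too_small_sample(lower: int, upper: int = 10, n_seeds: int = 200) -> list:
    """Return the seeds for which the drawn sample size is smaller than TOP_K."""
    return [
        seed
        for seed in range(n_seeds)
        if pick_sample_size(random.Random(seed), upper, lower) < TOP_K
    ]


if __name__ == "__main__":
    # With a lower bound of 1, some seeds may draw a size-1 sample.
    print("lower=1:", len(seeds_yielding_too_small_sample(lower=1)), "problem seeds")
    # With a lower bound of 2, no seed can draw a sample smaller than TOP_K.
    print("lower=2:", len(seeds_yielding_too_small_sample(lower=2)), "problem seeds")
```

Sweeping over fixed seeds, as in this sketch, is one way to make an intermittent failure reproducible instead of relying on repeated random runs.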

bazingagin commented 1 year ago

I think this situation will happen whenever the number of samples is smaller than the value of k. We can add a constraint for that in the future, but for now it's all good. Thanks!
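One possible shape for such a constraint is sketched below. This is an assumption about what the check could look like, not the library's actual API: `validate_top_k` and `top_k_nearest` are hypothetical names, and the neighbour selection is a plain sort over a list of distances rather than the project's own implementation.

```python
from typing import List, Sequence


def validate_top_k(top_k: int, n_samples: int) -> None:
    """Raise ValueError if top_k cannot be satisfied by n_samples (hypothetical check)."""
    if top_k < 1:
        raise ValueError(f"top_k must be at least 1, got {top_k}")
    if top_k > n_samples:
        raise ValueError(
            f"top_k ({top_k}) must not exceed the number of samples ({n_samples})"
        )


def top_k_nearest(distances: Sequence[float], top_k: int) -> List[int]:
    """Return the indices of the top_k smallest distances, nearest first."""
    validate_top_k(top_k, len(distances))
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    return order[:top_k]


if __name__ == "__main__":
    print(top_k_nearest([0.9, 0.1, 0.5], top_k=2))
    try:
        top_k_nearest([0.3], top_k=2)  # one sample, but two neighbours requested
    except ValueError as exc:
        print("rejected:", exc)
```

Failing early with an explicit message makes the cause obvious to the caller, instead of surfacing as an unrelated error further down in the prediction code.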