MastodonC / kixi.stats

A library of statistical distribution sampling and transducing functions
https://cljdoc.xyz/d/kixi/stats
360 stars 18 forks source link

Added tests to check that cdf and quantile are inverses. #33

Closed bombaywalla closed 4 years ago

bombaywalla commented 4 years ago

The Student T distribution seems to fail with a divide by zero in quantile, so commented out that test for now. Not sure where the problem is.

Some distributions such as uniform and exponential do not seem to implement p/PQuantile. Would you like me to add the implementations?

henrygarner commented 4 years ago

For sure, protocol implementations for other distributions would be very welcome!

bombaywalla commented 4 years ago

For sure, protocol implementations for other distributions would be very welcome!

Okay. Will do. Do you want the protocol implementations in this PR? A separate PR? Or doesn't matter?

henrygarner commented 4 years ago

Or doesn't matter?

It doesn't matter, since the features are logically related, but smaller PRs are easier for me to review :)

bombaywalla commented 4 years ago

Or doesn't matter?

It doesn't matter, since the features are logically related, but smaller PRs are easier for me to review :)

Okay. I will add the protocol implementations in another PR.

bombaywalla commented 4 years ago

The test for the chi-squared distribution failed. On investigation, I see that

user> (kd/cdf (kd/chi-squared {:k 8}) (kd/quantile (kd/chi-squared {:k 8}) 0.018)) 0.016939959182315483

Would you take a look?

bombaywalla commented 4 years ago

I modified the cdf for Student T to match the quantile at +/- infinity. Now the test passes.

I temporarily relaxed the equality bounds for the chi-squared test so that it would pass with the current implementations of cdf/quantile. Not sure that was the right thing to do. Let me know.

bombaywalla commented 4 years ago

If I understand things correctly, the current cdf/quantile test spec just uses one random probability value to test that the cdf and quantile are inverses. I was thinking that it might be better to use a few (10? 20?) different probability values, and make sure that 0.0 and 1.0 are included. What do you think?

I can generate such a vector of test probabilities. What would be a good way of adding that to the spec so that, if there is a failure, the test probability value that caused the failure would be displayed?

Thanks

henrygarner commented 4 years ago

I've pushed a gamma-pinv bugfix to master addressing the the root cause of the intermittent chi-squared test failure. Rebasing onto master should result in successful tests for you too. Thanks for picking this up!

The generative tests are set to run 100 assertions for each test, so lots of different values will be tried. I've added a commit to this branch which makes the probability generator more efficient, but it still doesn't reliably output 0 or 1: the only way I know of to do this is to write a static test for those cases (though there may be some generator magic I'm not aware of).

bombaywalla commented 4 years ago

Thanks for the gamma-pinv fix. I restored the full accuracy test for chi-squared and it passes. I rebased on master and force-pushed this branch. For now, I'l punt on making sure p=0.0 and 1.0 are used in the tests.

If you're okay with it, this PR is ready to merge into master.