elki-project / elki

ELKI Data Mining Toolkit
https://elki-project.github.io/
GNU Affero General Public License v3.0
785 stars 323 forks source link

Avoid NPE when calling nextRandom on an estimated distribution #63

Closed jcaesar closed 5 years ago

codecov-io commented 5 years ago

Codecov Report

Merging #63 into future will increase coverage by <.01%. The diff coverage is 0%.

Impacted file tree graph

@@             Coverage Diff              @@
##             future      #63      +/-   ##
============================================
+ Coverage     49.33%   49.34%   +<.01%     
  Complexity    11419    11419              
============================================
  Files          1653     1653              
  Lines         83221    83224       +3     
  Branches      15544    15544              
============================================
+ Hits          41057    41064       +7     
+ Misses        38995    38993       -2     
+ Partials       3169     3167       -2
Impacted Files Coverage Δ Complexity Δ
.../statistics/distribution/AbstractDistribution.java 71.42% <0%> (-11.91%) 4 <1> (ø)
...a/elki/index/idistance/InMemoryIDistanceIndex.java 82.2% <0%> (+0.61%) 19% <0%> (ø) :arrow_down:
...c/main/java/elki/clustering/kmedoids/FastPAM1.java 96.09% <0%> (+2.34%) 4% <0%> (ø) :arrow_down:
...rc/main/java/elki/clustering/kmedoids/FastPAM.java 92.07% <0%> (+2.97%) 4% <0%> (ø) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 314f2e0...5753b25. Read the comment docs.

kno10 commented 5 years ago

new Random() hidden in a class is almost always a bad idea in the context of data analysis, as you may want to have full control over the random numbers. And since the setter introduced only checks for null if -ea assertions are used, it does not achieve the desired goal of preventing this kind of NPE.

The NPE is fine, in my opinion.

It arises from programming errors, namely uninitialized use of random generation. Since the rng is null, the NPE is the appropriate kind of exception to throw. These errors should cause an exception, and should be fixed where they arise, not worked around by falling back to creating a new random generator hidden in an abstract parent class. This is more likely to cause unexpected results when people try to seed random generators and miss this hidden one.

It would be okay to add appropriate asserts with explanatory messages that help fixing this problem when it arises. So nextRandom() should get an assert random != null : "If you want to use nextRandom(), pass a random generator to the constructor." or similar.

jcaesar commented 5 years ago

pass a random generator to the constructor.

That isn't quite possible if the Distribution is constructed by an estimator, so I'd prefer to keep the setter. Other than that, I'll modify that PR as you suggested.

[Edit:] I thought I'd written an description for this PR explaining that, but it's not there. Sorry for that.

kno10 commented 5 years ago

Oh, I see. I'm usually not a big fan of setters but would prefer variables to be final, but here it may be okay.

kno10 commented 5 years ago

I am wondering whether it wouldn't be better to modify the API to not have a Random inside the distribution at all, but rather pass it as parameter via nextRandom(Random rng). This would allow dropping the abstract class completely, and only have the interface, hence simplify the type hierarchy.

jcfj commented 5 years ago

This does solve my problem, thank you. (And it's not exactly an uncommon pattern, c.f. e.g. the [C++ STL](http://www.cplusplus.com/reference/random/geometric_distribution/operator()/).)