Include random seed in all statistical functions

bis-med-it / gingado

A machine learning library for economics and finance

https://bis-med-it.github.io/gingado/

Apache License 2.0


Include random seed in all statistical functions #7

Closed dkgaraujo closed 1 week ago

dkgaraujo commented 4 months ago

As raised by @stephprobst, all stochastic output should ideally be made deterministic so that re-running it does not clutter git diffs. This can be achieved by explicitly passing a random seed to every function that has a stochastic outcome.
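A minimal sketch of what this could look like, using only the Python standard library. The function `simulate_returns` and its `random_state` parameter are hypothetical names chosen for illustration; they are not part of gingado's API.

```python
import random
from typing import List, Optional


def simulate_returns(n: int, random_state: Optional[int] = None) -> List[float]:
    """Hypothetical stochastic helper: draw n standard-normal 'returns'."""
    # A local generator keeps the seed scoped to this call and leaves the
    # global `random` state untouched. With random_state=None, the generator
    # is seeded from the operating system and the output varies between runs.
    rng = random.Random(random_state)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


first = simulate_returns(3, random_state=42)
second = simulate_returns(3, random_state=42)
print(first == second)  # True: the same seed reproduces the same draws
```

With a fixed seed, re-running the documentation would yield identical numbers, so the rendered output would not change from one commit to the next.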

dkgaraujo commented 1 week ago

I looked into this topic further. The most insightful source was scikit-learn's documentation on controlling randomness. In essence, setting a random seed has non-trivial implications when cross-validation is used, or in estimators such as random forests that call a random number generator for every "sub-estimator".
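The subtlety can be illustrated with a toy model, sketched here with the standard library only. `ToyForest` and `cross_validate` are hypothetical stand-ins, not scikit-learn code; they only mimic the convention scikit-learn documents, where an integer seed gives the same result on every call while a generator instance is consumed and keeps advancing.

```python
import random
from typing import List, Union


class ToyForest:
    """Hypothetical stand-in for an ensemble estimator (not scikit-learn)."""

    def __init__(self, n_sub: int, random_state: Union[int, random.Random]):
        self.n_sub = n_sub
        self.random_state = random_state

    def fit(self) -> List[int]:
        # An int seed builds a fresh generator on every fit, so each fit
        # repeats the same draws; a generator instance is shared and advances.
        if isinstance(self.random_state, random.Random):
            rng = self.random_state
        else:
            rng = random.Random(self.random_state)
        # One draw per "sub-estimator".
        return [rng.randrange(10**6) for _ in range(self.n_sub)]


def cross_validate(model: ToyForest, n_folds: int) -> List[List[int]]:
    """Hypothetical CV loop: refit the same model once per fold."""
    return [model.fit() for _ in range(n_folds)]


int_seeded = cross_validate(ToyForest(3, random_state=0), n_folds=2)
shared_rng = cross_validate(ToyForest(3, random_state=random.Random(0)), n_folds=2)

print(int_seeded[0] == int_seeded[1])  # True: every fold sees identical randomness
print(shared_rng[0] == shared_rng[1])  # the second fold continues the random stream
```

Both variants are reproducible from run to run, but they answer different questions: the integer seed evaluates one fixed randomization across all folds, whereas the shared generator varies the randomization per fold. Choosing between them is a modelling decision, which is why seeding everywhere is less trivial than it first appears.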

So, to keep things simple, and given that exact reproducibility in the documentation is not a dealbreaker, I am closing this issue.