cmu-phil / tetrad

Repository for the Tetrad Project, www.phil.cmu.edu/tetrad.
GNU General Public License v2.0

Specifying 'random seed for single search' when there is no bootstrapping. #1717

Closed: yasu-sh closed this issue 5 months ago

yasu-sh commented 5 months ago

@jdramsey I appreciate that Kevin added the feature to set a seed for bootstrapping last year. It works fine. https://github.com/bd2kccd/causal-cmd/issues/80 (solved)

I am wondering whether a 'random seed for single search' feature could be added. Reproducible results would make analyses more reliable and easier to understand. I guess simple algorithms can manage this, but Tetrad may be complicated enough that getting the same result from the same dataset is hard. (Info: mathematical optimization solvers such as Gurobi show performance variability for hardware-related reasons. https://support.gurobi.com/hc/en-us/articles/360045849232)

I would appreciate hearing whether @jdramsey thinks this is feasible or difficult.

Explanation image: the seed setting could be added around the position marked in red. (image)

jdramsey commented 5 months ago

Interesting--we've tried to do this for other applications, though for BOSS, allowing complete randomness generally leads to better results, even though it is less reproducible. Let me think about it, though. You know, Thomas Richardson made the same comment to me some years ago about another algorithm, and ever since then, I've been calling it the "Richardson Principle," which he says he approves of. :-) I'll think about it. The code is Bryan's; maybe I'll ask him what he thinks as well.

jdramsey commented 5 months ago

You know, you're right, though--even if the algorithm is random, we should be able to set a seed.... let me think...
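A minimal sketch of the idea discussed here, in Python rather than Tetrad's own code: a randomized search that stays fully random by default but becomes repeatable when a seed is supplied. The names `random_restart_search` and `toy_score` are hypothetical and are not part of Tetrad or BOSS; the "search" is just a toy that shuffles variable orders and keeps the best-scoring one.

```python
import random


def toy_score(order):
    # Toy stand-in for a model score (an assumption for illustration):
    # higher when the order is closer to alphabetical.
    target = sorted(order)
    return -sum(abs(i - target.index(v)) for i, v in enumerate(order))


def random_restart_search(variables, score, restarts=10, seed=None):
    """Try `restarts` random orderings and return the best-scoring one.

    seed=None -> the generator is seeded from OS entropy (non-reproducible).
    seed=int  -> the same sequence of shuffles on every run (reproducible).
    """
    rng = random.Random(seed)  # private generator; global state is untouched
    best_order, best_score = None, float("-inf")
    for _ in range(restarts):
        order = list(variables)
        rng.shuffle(order)
        s = score(order)
        if s > best_score:
            best_order, best_score = order, s
    return best_order
```

Using a private `random.Random` instance instead of seeding the global generator keeps the seed local to one search, so an optional seed does not change the default behavior for anyone who leaves it unset.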

yasu-sh commented 5 months ago

@jdramsey Thank you for your thoughts on this enhancement and for considering it.

As you say, a non-deterministic process can give better results than a deterministic one, as in deep learning (PyTorch). https://pytorch.org/docs/stable/notes/randomness.html My point is that a deterministic mode is also useful for users, and it may help during development as well.

Regarding the algorithms, we would appreciate it if you could check not only BOSS but also FGES/GFCI.

yasu-sh commented 5 months ago

@jdramsey I noticed I might have misunderstood. I have checked the Tetrad code that uses the RandomUtils module. Random seeds are currently set in only 6 search modules. What I mean by reproducible is 'the same dataset and settings always produce the same output'. (image)
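The criterion "same dataset and settings, same output" can be checked mechanically. The sketch below is a generic Python harness, not a Tetrad utility; `is_reproducible` and `seeded_toy_search` are hypothetical names, and the toy search merely returns a random ordering of its input.

```python
import random


def is_reproducible(search, data, runs=3, **settings):
    """Run `search` several times with identical inputs.

    Returns True only if every run returns a result equal to the first.
    """
    first = search(data, **settings)
    return all(search(data, **settings) == first for _ in range(runs - 1))


def seeded_toy_search(data, seed=None):
    # Hypothetical stand-in for a randomized search: a random ordering of data.
    rng = random.Random(seed)
    return rng.sample(data, k=len(data))
```

With a fixed seed, `is_reproducible(seeded_toy_search, [1, 2, 3, 4, 5], seed=7)` holds; without a seed the check will usually fail, which is the behavior a seed option is meant to remove.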

So BOSS requires the random module. On the other hand, I have re-read your paper: https://link.springer.com/article/10.1007/s41060-016-0032-z In the FGES/GFCI algorithms, multiple processes temporarily share the same score cache. So FGES/GFCI do not use or need the random module. Is that correct?

yasu-sh commented 5 months ago

https://github.com/cmu-phil/tetrad/pull/1719 @jdramsey Thanks a lot. I've checked the setting added in the tetrad-7.6.2 release binary. It's time to close this issue. Confirmed: FGES/GFCI do not use the random module.

My understanding: disabling parallelization is the typical way to get reproducible results with FGES/GFCI. (But that makes the search slow, much like causal-learn's GES algorithm.)
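One generic way parallel execution can change results without any random number generator is tie-breaking by arrival order: if two candidates have equal scores, whichever worker finishes first wins. The Python sketch below illustrates that general effect only; it is an assumption for illustration and not a description of how Tetrad's FGES or GFCI are implemented. The function names and the edge labels are hypothetical.

```python
def pick_best_first_seen(scored_edges):
    """Return the first edge with the highest score, in arrival order."""
    best = None
    for edge, score in scored_edges:
        if best is None or score > best[1]:
            best = (edge, score)
    return best[0]


def pick_best_stable(scored_edges):
    """Break score ties by edge name, so arrival order does not matter."""
    return max(scored_edges, key=lambda es: (es[1], es[0]))[0]


# The same scored candidates, arriving in two different orders,
# as might happen when parallel workers finish at different times.
arrival_a = [("X->Y", 1.5), ("Y->Z", 1.5), ("X->Z", 0.2)]
arrival_b = [("Y->Z", 1.5), ("X->Y", 1.5), ("X->Z", 0.2)]

print(pick_best_first_seen(arrival_a), pick_best_first_seen(arrival_b))
print(pick_best_stable(arrival_a), pick_best_stable(arrival_b))
```

Running sequentially fixes the arrival order, which is one reason single-threaded runs tend to be repeatable; a deterministic tie-break is an alternative that keeps parallelism but costs a little extra comparison work.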