aertslab / arboreto

A scalable python-based framework for gene regulatory network inference using tree-based ensemble regressors.
BSD 3-Clause "New" or "Revised" License

Different results with same random seed #35

Open divyanshusrivastava opened 1 year ago

divyanshusrivastava commented 1 year ago

I am using arboreto.core to run grnboost2. Across different runs with the same seed, I get different results. What could be the problem?

divyanshusrivastava commented 1 year ago

I think I found the root cause: the order in which the genes (columns) are passed changes the results, even with the same seed.
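One possible mechanism, offered as an assumption and not confirmed anywhere in this thread: if the regressor uses its seeded random state to subsample features by column position, the same seed selects the same positions, but those positions map to different genes once the columns are reordered. The sketch below uses only Python's standard `random` module, not arboreto, and the helper `sample_columns` and the choice of `k=3` are hypothetical, for illustration only.

```python
import random

# Gene orders copied from the two expression matrices in this thread.
ORDER_CASE_1 = [
    "0610007L01Rik", "0610010K14Rik", "0910001L09Rik", "1100001G20Rik",
    "1110004E09Rik", "1110007A13Rik", "1110013L07Rik", "1190002H23Rik",
    "1190007F08Rik", "1200002N14Rik",
]
ORDER_CASE_2 = [
    "0910001L09Rik", "0610010K14Rik", "1190007F08Rik", "1110004E09Rik",
    "1100001G20Rik", "0610007L01Rik", "1110013L07Rik", "1110007A13Rik",
    "1200002N14Rik", "1190002H23Rik",
]


def sample_columns(gene_order, seed, k=3):
    """Hypothetical helper: pick k column positions with a seeded RNG.

    Returns the sampled positions and the gene names found at them.
    """
    rng = random.Random(seed)
    positions = rng.sample(range(len(gene_order)), k)
    return positions, [gene_order[i] for i in positions]


pos_1, genes_1 = sample_columns(ORDER_CASE_1, seed=42)
pos_2, genes_2 = sample_columns(ORDER_CASE_2, seed=42)

print("same positions:", pos_1 == pos_2)   # the seed fixes the positions
print("same genes:    ", genes_1 == genes_2)  # the genes at them can differ
print(genes_1)
print(genes_2)
```

If this is what happens, sorting the gene names into a fixed order before building the matrix should make runs with the same seed comparable, since the seed then always refers to the same column layout.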

Case 1: expression matrix

       0610007L01Rik  0610010K14Rik  0910001L09Rik  1100001G20Rik  1110004E09Rik  1110007A13Rik  1110013L07Rik  1190002H23Rik  1190007F08Rik  1200002N14Rik
index                                                                                                                                                      
0          -0.636101      -1.184288      -0.696000      -0.190556      -0.586632      -0.775190      -0.269762      -0.617464      -0.424003      -0.155427
1          -0.636101      -0.268231       0.412374      -0.190556       0.887191      -0.775190      -0.269762       0.447862      -0.424003       4.575706
2           0.269204       0.249596       0.054724      -0.190556       1.142911       1.001526      -0.269762      -0.617464      -0.424003      -0.155427
3           0.238592       0.210235       0.029340      -0.190556       0.377865       0.516827       1.004492       1.343563      -0.424003      -0.155427
4          -0.636101       1.200998      -0.031379      -0.190556       0.965279      -0.094555      -0.269762       0.504306       0.701254      -0.155427

GRNBoost2 output

              TF         target  importance
3  1100001G20Rik  1190002H23Rik   54.013678
6  1190002H23Rik  1100001G20Rik   43.372689
4  1110004E09Rik  1190007F08Rik   40.087843
1  0610010K14Rik  1190007F08Rik   34.482150
0  0610007L01Rik  1190007F08Rik   28.552018

Case 2: expression matrix (gene order shuffled)

       0910001L09Rik  0610010K14Rik  1190007F08Rik  1110004E09Rik  1100001G20Rik  0610007L01Rik  1110013L07Rik  1110007A13Rik  1200002N14Rik  1190002H23Rik
index                                                                                                                                                      
0          -0.696000      -1.184288      -0.424003      -0.586632      -0.190556      -0.636101      -0.269762      -0.775190      -0.155427      -0.617464
1           0.412374      -0.268231      -0.424003       0.887191      -0.190556      -0.636101      -0.269762      -0.775190       4.575706       0.447862
2           0.054724       0.249596      -0.424003       1.142911      -0.190556       0.269204      -0.269762       1.001526      -0.155427      -0.617464
3           0.029340       0.210235      -0.424003       0.377865      -0.190556       0.238592       1.004492       0.516827      -0.155427       1.343563
4          -0.031379       1.200998       0.701254       0.965279      -0.190556      -0.636101      -0.269762      -0.094555      -0.155427       0.504306

GRNBoost2 output

              TF         target  importance
4  1100001G20Rik  1190002H23Rik   52.907365
2  1110004E09Rik  1190007F08Rik   23.055227
0  0910001L09Rik  1190002H23Rik   22.537507
5  0610007L01Rik  1190002H23Rik   18.553030
1  0610010K14Rik  1190007F08Rik   18.310874

Any comments on why this is happening?