FenTechSolutions / CausalDiscoveryToolbox

Package for causal inference in graphs and in the pairwise settings. Tools for graph structure recovery and dependencies are included.
https://fentechsolutions.github.io/CausalDiscoveryToolbox/html/index.html
MIT License
1.08k stars, 198 forks

NCC gives wrong prediction on TCEP? #42

Closed: wpzdm closed this issue 4 years ago

wpzdm commented 4 years ago

I tested NCC using half of the TCEP pairs for training and the other half for testing. For testing, I flipped every pair (so X2 is the cause). However, NCC outputs a positive value for every pair! Code:

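def test_NCC():
    from sklearn.model_selection import train_test_split
    # Import paths below are assumed from the cdt package layout.
    from cdt.data import load_dataset
    from cdt.causality.pairwise import NCC
    # Load the Tuebingen cause-effect pairs (TCEP) and their labels.
    tueb, labels = load_dataset('tuebingen')
    method = NCC
    print(method)
    m = method()
    # Train on half of the pairs and keep the other half for testing.
    X_tr, X_te, y_tr, y_te = train_test_split(tueb, labels, train_size=.5)
    m.fit(X_tr, y_tr, epochs=10000)
    # Reorder the columns to B, A, which is intended to reverse every test pair.
    r = m.predict_dataset(X_te.reindex(columns=['B', 'A']))
    print(r)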

Outputs:

0       89.886803
1    42859.230469
2     2996.945312
3   351716.406250
4   218484.812500
5     1456.278320
6      354.131256
7      453.962494
8    29202.076172
9    47342.875000
10    2115.986084
11     175.141602
12    2060.776123
13   10275.829102
14    2584.913574
15    8027.451660
16     637.758789
17   49512.773438
18     794.610840
19     177.425110
20    5133.766602
21    2414.513916
22     205.962494
23     411.851135
24     186.423264
25     880.144958
26     173.254272
27      85.153000
28     758.132324
29     726.009766
30    2010.785767
31    1986.761475
32    1791.590332
33      32.296738
34    2300.482666
35   12707.833008
36   63790.007812
37    4901.006836
38     935.546875
39     232.197510
40    5229.793457
41    2120.424316
42     180.572327
43    2947.156738
44    2176.514160
45    2140.100098
46    6997.687988
47   28182.152344
48     881.467407
49    1656.368042

diviyank commented 4 years ago

This comes from 'load_dataset', which always loads the TCEP pairs in the same causal direction, so every label is identical:

>>> labels                                                                               
Out[5]: 
          Target
SampleID        
pair1        1.0
pair2        1.0
pair3        1.0
pair4        1.0
pair5        1.0
pair6        1.0
pair7        1.0
pair8        1.0
pair9        1.0
pair10       1.0
pair11       1.0
pair12       1.0
pair13       1.0
pair14       1.0
pair15       1.0
pair16       1.0
pair17       1.0
pair18       1.0
pair19       1.0
pair20       1.0
pair21       1.0
pair22       1.0
pair23       1.0

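You should shuffle the causal order, i.e. reverse a random subset of the training pairs and change their labels accordingly, so that the model sees both directions during training.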
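As a rough illustration of this advice, here is a minimal sketch that reverses a random subset of pairs and negates their labels. The helper name `shuffle_causal_direction` and its parameters are hypothetical and not part of the toolbox. The sketch assumes that a label of 1 means the first variable causes the second and that the reversed direction is encoded as -1. It works on plain Python lists rather than on the DataFrame returned by 'load_dataset'.

```python
import random


def shuffle_causal_direction(pairs, labels, flip_prob=0.5, seed=None):
    """Randomly reverse some (a, b) pairs and negate their labels.

    Hypothetical helper, not part of the toolbox. Assumes a label of 1
    means "a causes b" and that -1 encodes the reversed direction.
    """
    rng = random.Random(seed)
    new_pairs, new_labels = [], []
    for (a, b), y in zip(pairs, labels):
        if rng.random() < flip_prob:
            # Reverse the pair and flip the sign of its label.
            new_pairs.append((b, a))
            new_labels.append(-y)
        else:
            # Keep the pair and its label unchanged.
            new_pairs.append((a, b))
            new_labels.append(y)
    return new_pairs, new_labels


# Toy data standing in for cause-effect pairs, all labelled 1 as in the issue.
toy_pairs = [([1, 2, 3], [2, 4, 6]), ([0, 1], [5, 7]), ([9, 8], [1, 0]), ([3, 3], [4, 5])]
toy_labels = [1.0, 1.0, 1.0, 1.0]
mixed_pairs, mixed_labels = shuffle_causal_direction(toy_pairs, toy_labels, seed=0)
print(mixed_labels)
```

With the real dataset, the pairs could presumably be extracted with something like `list(zip(tueb['A'], tueb['B']))`, and the shuffled result would need to be put back into the layout that `fit` expects. Only the training split needs shuffling for the model to see both classes.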

wpzdm commented 4 years ago

Thank you! I will try and report new results!

Abel

wpzdm commented 4 years ago

Hi, I tried what you suggested and it worked! Thank you! I think this would be worth documenting ;)

Abel

diviyank commented 4 years ago

Noted, I'll add it to the documentation :)

diviyank commented 4 years ago

Should be done! I'll close this issue; don't hesitate to reopen it if necessary.