
FenTechSolutions / CausalDiscoveryToolbox

Package for causal inference in graphs and in the pairwise setting. Tools for graph structure recovery and dependencies are included.

https://fentechsolutions.github.io/CausalDiscoveryToolbox/html/index.html

MIT License


GS/MMPC algorithm removed all non-related variable #60

Closed: Cby19961020 closed this issue 4 years ago

Cby19961020 commented 4 years ago

Hi there! This is my very first post on GitHub, so pardon me if this is a simple fix to my problem.

I am currently exploring the different graph inference algorithms on my own dataset. One thing I noticed is that when I run algorithms like GS/MMPC, uncorrelated variables are not shown (they are automatically removed) in the output of nx.adjacency_matrix(output).todense().

For example, I fed 50 variables into the MMPC algorithm; only 35 of them are correlated and 15 are not. nx.adjacency_matrix(output).todense() only returns the matrix for the 35 variables, so I cannot tell which variables were uncorrelated and removed.
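If the learned graph is a networkx graph, the variables that went missing are simply the input columns that are not among its nodes. A minimal sketch of that comparison in plain Python, assuming the input column names are available (for a pandas DataFrame, `list(data.columns)`) and the output's node names are available (for a networkx graph, `list(output.nodes)`); the helper `find_dropped_variables` and the example names are hypothetical:

```python
def find_dropped_variables(input_columns, graph_nodes):
    """Return the input variables missing from the output graph,
    keeping their original column order."""
    present = set(graph_nodes)
    return [name for name in input_columns if name not in present]


# Hypothetical example: five input variables, but the learned graph
# only contains the three that have at least one edge.
columns = ["Task_0", "Task_1", "Task_2", "Task_3", "Task_4"]
nodes_in_output = ["Task_0", "Task_2", "Task_3"]
print(find_dropped_variables(columns, nodes_in_output))  # ['Task_1', 'Task_4']
```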

While CDT does provide plotting options, I prefer to use a package like Graphviz, so it would be helpful to obtain the matrix for all 50 input variables instead.
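For Graphviz, one way to keep the unconnected variables visible is to declare every input variable as a node before listing the edges. The sketch below writes DOT text by hand; `to_dot` is a hypothetical helper, and it assumes the learned edges are available as `(source, target)` pairs (for a networkx graph, `list(output.edges)`).

```python
def _quote(name):
    # DOT string IDs are double-quoted; escape any embedded quotes.
    return '"' + str(name).replace('"', '\\"') + '"'


def to_dot(variables, edges, name="G"):
    """Render a directed graph as Graphviz DOT text.

    Every variable is declared as a node first, so variables with no
    edges still appear as isolated nodes in the rendered picture.
    """
    lines = [f"digraph {name} {{"]
    for var in variables:
        lines.append(f"  {_quote(var)};")
    for src, dst in edges:
        lines.append(f"  {_quote(src)} -> {_quote(dst)};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example: Task_1 has no edges but is still emitted as a node.
columns = ["Task_0", "Task_1", "Task_2"]
learned_edges = [("Task_0", "Task_2")]
print(to_dot(columns, learned_edges))
```

The resulting text can be saved to a `.dot` file and rendered with the Graphviz command line, e.g. `dot -Tpng graph.dot -o graph.png`.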

Is there a way for me to obtain such matrix? Thank you in advance!

diviyank commented 4 years ago

Hello @Cby19961020, and thank you for noticing this. This behavior is not what we want: even if a variable is uncorrelated, we still want it to appear in the adjacency matrix! I will look into it this evening. Best regards, Diviyan

diviyank commented 4 years ago

After checking, it doesn't seem to be nx.adjacency_matrix(x).todense() that is causing the issue, but rather the algorithms themselves?

Cby19961020 commented 4 years ago

Hi Diviyan @Diviyan-Kalainathan,

Thank you very much for your prompt response! I think you are right; maybe it is the algorithms that I am using.

Essentially the data I am working with is very similar to the "sachs" data used in the tutorial. Here is a screenshot. I have 54 input variables.

When I apply algorithms like GS or MMPC to my dataset, I end up with only 32 variables. These variables all have some connection to other variables, as indicated in the graph. The ones that are completely independent are ignored.

The nx.adjacency_matrix(output).todense() call generates a matrix without labels, in this case a 32 by 32 matrix. From this alone I cannot figure out which row is variable Task_0, which one is Task_1, and so on.
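Without a `nodelist` argument, networkx orders the rows and columns of `nx.adjacency_matrix` by `output.nodes()`, so `list(output.nodes)` gives the labels of the 32 rows. To get a matrix over all input variables in their original order, one option in networkx is to call `output.add_nodes_from(columns)` and then `nx.to_pandas_adjacency(output, nodelist=columns)`. The plain-Python sketch below illustrates the same idea without networkx; `labeled_adjacency` and the example edges are hypothetical.

```python
def labeled_adjacency(variables, edges):
    """Build a full adjacency matrix whose row/column i is variables[i].

    matrix[i][j] == 1 means there is an edge variables[i] -> variables[j].
    Variables that take part in no edge get an all-zero row and column
    instead of disappearing. Edges naming an unknown variable raise KeyError.
    """
    index = {name: i for i, name in enumerate(variables)}
    n = len(variables)
    matrix = [[0] * n for _ in range(n)]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return matrix


# Hypothetical example: Task_1 is isolated but keeps its own row and column.
columns = ["Task_0", "Task_1", "Task_2"]
learned_edges = [("Task_0", "Task_2"), ("Task_2", "Task_0")]
m = labeled_adjacency(columns, learned_edges)
for name, row in zip(columns, m):
    print(name, row)
```

Because the row order is taken from the input column list rather than from the graph, each row label is known up front, which is exactly what the unlabeled 32 by 32 matrix lacks.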

Sorry for not stating the problem clearly and thank you for your hard work!

Best Regards, Bo

diviyank commented 4 years ago

Hi, don't be sorry! This bug was unexpected; it should be fixed in 0.5.17! Best, Diviyan
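To check whether an installed copy is at least the release mentioned in this comment, a small version comparison is enough. This sketch assumes the toolbox was installed from PyPI under the distribution name `cdt` and that 0.5.17 is the first release carrying the fix, as the comment says; `parse_version` and `includes_fix` are hypothetical helpers.

```python
from importlib.metadata import PackageNotFoundError, version

# Release in which the comment says the fix should land (assumption).
FIXED_IN = (0, 5, 17)


def parse_version(text):
    """Turn '0.5.17' into (0, 5, 17); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def includes_fix(version_text):
    # Tuples compare element by element, so (0, 6) >= (0, 5, 17) is True.
    return parse_version(version_text) >= FIXED_IN


if __name__ == "__main__":
    try:
        installed = version("cdt")  # assumes the PyPI distribution name "cdt"
    except PackageNotFoundError:
        print("cdt is not installed")
    else:
        print(installed, "includes the fix:", includes_fix(installed))
```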