FenTechSolutions / CausalDiscoveryToolbox

Package for causal inference in graphs and in the pairwise settings. Tools for graph structure recovery and dependencies are included.

https://fentechsolutions.github.io/CausalDiscoveryToolbox/html/index.html

MIT License

Fix examples/Discovery_LUCAS.ipynb #15

Closed jayavanth closed 5 years ago

jayavanth commented 5 years ago

The example notebook calls:

Cgnn.predict(data, graph=ugraph, nb_runs=16, train_epochs=1500, test_epochs=1000)

The CGNN predict function no longer accepts nb_runs, train_epochs and test_epochs. These arguments now go to the constructor, so it has to be called like this:

Cgnn = CGNN(nb_runs=16, train_epochs=1500, test_epochs=1000)
Cgnn.predict(data, graph=ugraph)
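
For older notebooks that still pass everything to predict, the change above amounts to moving the training settings into the constructor. Here is a minimal sketch of a helper that performs that split with the standard library's inspect module. Note the assumptions: split_kwargs is a hypothetical helper and not part of cdt, and FakeCGNN is a stand-in class (with made-up defaults) so that the snippet runs without cdt installed.

```python
import inspect


def split_kwargs(cls, kwargs):
    """Split kwargs into (constructor kwargs, remaining kwargs).

    A keyword goes to the constructor if cls.__init__ names it explicitly.
    """
    params = inspect.signature(cls.__init__).parameters
    init_kwargs = {k: v for k, v in kwargs.items() if k in params}
    call_kwargs = {k: v for k, v in kwargs.items() if k not in params}
    return init_kwargs, call_kwargs


class FakeCGNN:
    """Hypothetical stand-in that mimics the newer calling convention."""

    def __init__(self, nb_runs=16, train_epochs=1000, test_epochs=1000):
        self.nb_runs = nb_runs
        self.train_epochs = train_epochs
        self.test_epochs = test_epochs

    def predict(self, data, graph=None):
        # A real model would train here; the stand-in only echoes its settings.
        return {"nb_runs": self.nb_runs, "graph": graph, "n_rows": len(data)}


# Keyword arguments as they appeared in the old-style predict call.
old_style = dict(graph="ugraph", nb_runs=16, train_epochs=1500, test_epochs=1000)

init_kwargs, call_kwargs = split_kwargs(FakeCGNN, old_style)
model = FakeCGNN(**init_kwargs)
result = model.predict([[0.1, 0.2]], **call_kwargs)
```

With the real class, the same idea would be split_kwargs(CGNN, old_style), assuming the constructor lists those parameters by name.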

diviyank commented 5 years ago

Yes, I should fix the example! Thanks for the feedback!

gkericks commented 5 years ago

I'm not sure this is related, but I am looking for an explanation of how the NUM_LUCAS.csv file was generated and can't find it. Do you have that listed somewhere?

diviyank commented 5 years ago

Hi, actually, NUM_LUCAS.csv was generated using the cdt.generators.AcyclicGraphGenerator class, by feeding it a ground truth graph. But yes, it doesn't make much sense to call it LUCAS, since it doesn't have much to do with the true dataset except for the variable names and the graph structure; I should change that. I will add it in the next version. Best, Diviyan
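
To illustrate the general idea of sampling a dataset from a ground truth graph, here is a rough standard-library sketch. It does not reproduce what AcyclicGraphGenerator does internally: the three-node graph is hypothetical (not the LUCAS structure), and the mechanism (each node is the sum of its parents plus Gaussian noise, followed by standardization) is an assumption chosen for simplicity.

```python
import random
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical ground truth graph: node -> list of its parents.
parents = {"A": [], "B": ["A"], "C": ["A", "B"]}


def sample_from_dag(parents, n_rows, noise_std=0.4, seed=0):
    """Draw n_rows samples; each node is the sum of its parents plus noise."""
    rng = random.Random(seed)
    # static_order yields every node after all of its parents.
    order = list(TopologicalSorter(parents).static_order())
    rows = []
    for _ in range(n_rows):
        row = {}
        for node in order:
            signal = sum(row[p] for p in parents[node])
            # Root nodes are drawn from N(0, 1); others get smaller noise.
            scale = noise_std if parents[node] else 1.0
            row[node] = signal + rng.gauss(0.0, scale)
        rows.append(row)
    return rows


def standardize(rows):
    """Rescale every column to zero mean and unit variance."""
    out = [dict(r) for r in rows]
    for col in rows[0]:
        values = [r[col] for r in rows]
        mean = sum(values) / len(values)
        std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
        for r in out:
            r[col] = (r[col] - mean) / std
    return out


data = standardize(sample_from_dag(parents, n_rows=500))
```

A discovery algorithm would then be given only data and asked to recover the edges in parents.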

gkericks commented 5 years ago

@Diviyan-Kalainathan Thanks for the quick reply!

Okay, so from that I see now that the example is about recreating the answer graph just using examples sampled from it. The original LUCAS data is all binary and this new dataset assumes Gaussians at every node (the sampled data looks standardized). That being said, what constraints on the data input are there for effectively using your library?

I have a causal problem I am trying to solve and like most real-world data, the input is of mixed types. Some numerical, some categorical. Would you still recommend your library for exploring the dependencies or should I be looking for a different technique? I apologize in advance if that is already covered in your README and I just missed it.

diviyank commented 5 years ago

Hi, there are no constraints on the data input for the library itself. Instead, it depends on the algorithms from the package. For example, SAM and CGNN accept only numerical data, whereas PC can accept categorical data. For mixed types, I don't know of an algorithm or statistical test that is quite efficient; I think your best bet would be to discretize your data and use an algorithm/test for categorical data (PC/GES).

Best regards, Diviyan
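
As a rough illustration of the discretization step suggested above, here is a standard-library sketch that turns a numerical column into quantile bins so that every column ends up categorical. The records, the column names, and the choice of four bins are all hypothetical; the number of bins is a tuning decision, and the subsequent call into PC or GES is not shown.

```python
import bisect
import statistics


def quantile_bins(values, n_bins=4):
    """Map each numeric value to a bin index in 0..n_bins-1.

    Cut points are the quantiles of the data, so bins hold roughly
    equal numbers of values.
    """
    cuts = statistics.quantiles(values, n=n_bins)  # n_bins - 1 cut points
    return [bisect.bisect_right(cuts, v) for v in values]


# Hypothetical mixed-type records: one numerical, one categorical column.
records = [
    {"age": 23.0, "smoker": "no"},
    {"age": 35.0, "smoker": "yes"},
    {"age": 41.0, "smoker": "no"},
    {"age": 52.0, "smoker": "yes"},
    {"age": 58.0, "smoker": "yes"},
    {"age": 64.0, "smoker": "no"},
    {"age": 70.0, "smoker": "yes"},
    {"age": 29.0, "smoker": "no"},
]

bins = quantile_bins([r["age"] for r in records])

# Replace the numerical column with a categorical label per bin.
discretized = [
    {"age": f"q{b}", "smoker": r["smoker"]}
    for r, b in zip(records, bins)
]
```

Binning loses information, so it is worth checking how sensitive the recovered graph is to the number of bins.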

diviyank commented 5 years ago

It should be fixed now. Sorry for the delay, but we really wanted to fix all the issues on dataset management before fixing this one. Please keep me updated. Best, Diviyan

diviyank commented 5 years ago

I will be closing this issue, as it should be solved. Don't hesitate to reopen it if the bug still persists in the latest version. Best, Diviyan