bd2kccd / causal-cmd

Including discrete variables in search #60

Closed jaron-lee closed 3 years ago

jaron-lee commented 3 years ago

I want to run a search algorithm (pc-all, say) on some variables: some are continuous, some are binary, and some are discrete. These discrete variables take values in the set {"a", "b", "c", "d", "e"}. However, when I try to run the following code, I get errors.

causal-cmd --algorithm "pc-all" \
    --alpha 0.001 \
    --data-type "mixed" \
    --dataset $DATASET \
    --delimiter "comma" \
    --knowledge $KNOWLEDGE \
    --test "cg-lr-test" \
    --json-graph \
    --out $OUTPUT \
    --prefix $PREFIX --numCategories 8 \
    --skip-validation

If I leave validation on, I get

edu.pitt.dbmi.causal.cmd.ValidationException
    at edu.pitt.dbmi.causal.cmd.data.DataValidations.validateTabularData(DataValidations.java:121)
    at edu.pitt.dbmi.causal.cmd.data.DataValidations.validate(DataValidations.java:69)
    at edu.pitt.dbmi.causal.cmd.CausalCmdApplication.runTetrad(CausalCmdApplication.java:128)
    at edu.pitt.dbmi.causal.cmd.CausalCmdApplication.main(CausalCmdApplication.java:105)

and if it's off, the code fails because it's trying to parse my discrete variable as a continuous one:

Exception in thread "main" edu.pitt.dbmi.data.reader.DataReaderException: Non-continuous number """ebec""" on line 2 at column 1.
    at edu.pitt.dbmi.data.reader.tabular.TabularDataFileReader.readInContinuousData(TabularDataFileReader.java:838)
    at edu.pitt.dbmi.data.reader.tabular.TabularDataFileReader.read(TabularDataFileReader.java:319)
    at edu.pitt.dbmi.data.reader.tabular.TabularDataFileReader.read(TabularDataFileReader.java:329)
    at edu.pitt.dbmi.causal.cmd.data.DataFiles.readInTabularData(DataFiles.java:153)
    at edu.pitt.dbmi.causal.cmd.data.DataFiles.readInDatasets(DataFiles.java:100)
    at edu.pitt.dbmi.causal.cmd.tetrad.TetradRunner.runAlgorithm(TetradRunner.java:74)
    at edu.pitt.dbmi.causal.cmd.CausalCmdApplication.runTetrad(CausalCmdApplication.java:133)
    at edu.pitt.dbmi.causal.cmd.CausalCmdApplication.main(CausalCmdApplication.java:105)

Exception in thread "main" edu.pitt.dbmi.data.reader.DataReaderException: Non-continuous number a on line 2 at column 1.
    at edu.pitt.dbmi.data.reader.tabular.TabularDataFileReader.readInContinuousData(TabularDataFileReader.java:838)
    at edu.pitt.dbmi.data.reader.tabular.TabularDataFileReader.read(TabularDataFileReader.java:319)
    at edu.pitt.dbmi.data.reader.tabular.TabularDataFileReader.read(TabularDataFileReader.java:329)
    at edu.pitt.dbmi.causal.cmd.data.DataFiles.readInTabularData(DataFiles.java:153)
    at edu.pitt.dbmi.causal.cmd.data.DataFiles.readInDatasets(DataFiles.java:100)
    at edu.pitt.dbmi.causal.cmd.tetrad.TetradRunner.runAlgorithm(TetradRunner.java:74)
    at edu.pitt.dbmi.causal.cmd.CausalCmdApplication.runTetrad(CausalCmdApplication.java:133)
    at edu.pitt.dbmi.causal.cmd.CausalCmdApplication.main(CausalCmdApplication.java:105)

Is there some way I can indicate the column types to causal-cmd? I wasn't able to locate such an option having spent this afternoon searching. Many thanks in advance.

kvb2univpitt commented 3 years ago

@jaron-lee Yes, there's a way. You will need to create a metadata file (in JSON format) that overrides the default column types. Use the --metadata parameter to point to the metadata file.

Assume you have the following tab-delimited data:

Gender  Height  Grade
m   5.3 a
f   5.7 b
f   6   b
f   5.3 c
m   5.5 c
m   5.9 a

The metadata file (metadata.json) would look like this:

{
  "domains" : [ {
    "name" : "Gender",
    "discrete" : true
  } ,
  {
    "name" : "Height",
    "discrete" : false
  } ,
  {
    "name" : "Grade",
    "discrete" : true
  } ]
}
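For wider datasets, writing this file by hand gets tedious. Below is a minimal Python sketch, not part of causal-cmd, that derives the same "domains" structure from the data itself. The helper names (is_number, build_metadata) and the force_discrete parameter are hypothetical. It assumes a column should be treated as discrete whenever any of its values fails to parse as a number, which will not catch numerically coded categories such as 0/1, so those have to be listed explicitly.

```python
import csv
import io
import json


def is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def build_metadata(rows, force_discrete=()):
    """Build a metadata dict from rows of strings (header row first).

    A column is marked discrete if any value is non-numeric, or if its
    name appears in force_discrete (hypothetical override, e.g. for
    binary columns coded as 0/1).
    """
    header, *records = rows
    domains = []
    for index, name in enumerate(header):
        values = [record[index] for record in records]
        all_numeric = all(is_number(v) for v in values)
        discrete = name in force_discrete or not all_numeric
        domains.append({"name": name, "discrete": discrete})
    return {"domains": domains}


# The tab-delimited sample data from this answer.
SAMPLE = (
    "Gender\tHeight\tGrade\n"
    "m\t5.3\ta\n"
    "f\t5.7\tb\n"
    "f\t6\tb\n"
    "f\t5.3\tc\n"
    "m\t5.5\tc\n"
    "m\t5.9\ta\n"
)

rows = list(csv.reader(io.StringIO(SAMPLE), delimiter="\t"))
metadata = build_metadata(rows)
print(json.dumps(metadata, indent=2))
```

To use it on a real file, read the rows from disk instead of SAMPLE and write the result with json.dump to the path you pass to --metadata.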

The data type (--data-type) should be discrete. Below is an example command using the dataset and metadata above:

java -jar causal-cmd-1.2.1.jar --algorithm pc-all --dataset data/test_data.txt --delimiter tab --data-type discrete --test cg-lr-test --metadata data/metadata.json
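If you launch causal-cmd from a script, the same invocation can be assembled as an argument list. This is only a sketch: the function name causal_cmd_args and its defaults are hypothetical, and it uses no flags other than the ones that appear in the command above.

```python
import shlex


def causal_cmd_args(jar, dataset, metadata, algorithm="pc-all",
                    delimiter="tab", data_type="discrete",
                    test="cg-lr-test"):
    """Return the argument list for the example command shown above."""
    return [
        "java", "-jar", jar,
        "--algorithm", algorithm,
        "--dataset", dataset,
        "--delimiter", delimiter,
        "--data-type", data_type,
        "--test", test,
        "--metadata", metadata,
    ]


args = causal_cmd_args(
    "causal-cmd-1.2.1.jar",
    "data/test_data.txt",
    "data/metadata.json",
)
# Show the command as a single shell-quoted string (Python 3.8+).
command_line = shlex.join(args)
print(command_line)
```

The list form can be passed directly to subprocess.run(args, check=True), which avoids shell quoting problems when paths contain spaces.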

Hope that helps.

jaron-lee commented 3 years ago

This is exactly what I need! Thank you.