cmu-phil / tetrad

Repository for the Tetrad Project, www.phil.cmu.edu/tetrad.
GNU General Public License v2.0

Running causal-cmd with FGES and the BDeu score always finds the empty graph #1291

Closed felixleopoldo closed 2 years ago

felixleopoldo commented 4 years ago

I am running the FGES algorithm with the BDeu score on a discrete data set generated from a non-empty Bayesian network (see attached file). When I run the command below, the BIC is 0 and no edges are found, so I suspect there is a bug.

java -jar causal-cmd-1.1.2-jar-with-dependencies.jar --algorithm fges --data-type discrete --dataset data_n_100_p_50_avpar_2_1.txt --delimiter comma --score bdeu-score --quote-char \"

[fas, fask, fci, fges, fges-mb, fofc, ftfc, gfci, glasso, imgs_cont, imgs_disc, lingam, mbfs, mgm, multi-fask, pc-all, r-skew, r3, rfci, rfci-bsc, skew, ts-fci, ts-gfci, ts-imgs]
[fas, fask, fci, fges, fges-mb, fofc, ftfc, gfci, glasso, imgs_cont, imgs_disc, lingam, mbfs, mgm, multi-fask, pc-all, r-skew, r3, rfci, skew, ts-fci, ts-gfci, ts-imgs]
[rfci-bsc]
Running version 1.1.2 which is the latest version. To disable checking use the skip-latest option.
Jul 15, 2020 2:30:02 PM java.util.prefs.FileSystemPreferences$6 run
WARNING: Prefs file removed in background /home/felix/.java/.userPrefs/prefs.xml

The output is given below:

================================================================================
FGES (Wed, July 15, 2020 02:28:35 PM)

Runtime Parameters
number of threads: 7

Dataset
file: data_n_100_p_50_avpar_2_1.txt
header: yes
delimiter: comma
quote char: "
missing marker: none
comment marker: none

Algorithm Run
algorithm: FGES
score: BDeu Score

Algorithm Parameters
faithfulnessAssumed: no
maxDegree: 100
symmetricFirstStep: no
verbose: no

Wed, July 15, 2020 02:28:35 PM: Start data validation on file data_n_100_p_50_avpar_2_1.txt.
Wed, July 15, 2020 02:28:35 PM: End data validation on file data_n_100_p_50_avpar_2_1.txt.
There are 100 cases and 50 variables.

Wed, July 15, 2020 02:28:35 PM: Start reading in file data_n_100_p_50_avpar_2_1.txt.
Wed, July 15, 2020 02:28:35 PM: Finished reading in file data_n_100_p_50_avpar_2_1.txt.
Wed, July 15, 2020 02:28:35 PM: File data_n_100_p_50_avpar_2_1.txt contains 100 cases, 50 variables.

Start search: Wed, July 15, 2020 02:28:35 PM
End search: Wed, July 15, 2020 02:28:35 PM

================================================================================
Graph Nodes:
V1;V2;V3;V4;V5;V6;V7;V8;V9;V10;V11;V12;V13;V14;V15;V16;V17;V18;V19;V20;V21;V22;V23;V24;V25;V26;V27;V28;V29;V30;V31;V32;V33;V34;V35;V36;V37;V38;V39;V40;V41;V42;V43;V44;V45;V46;V47;V48;V49;V50

Graph Edges:

Graph Attributes:
BIC: 0.000000

data_n_100_p_50_avpar_2_1.txt

kvb2univpitt commented 4 years ago

@felixleopoldo I tested this on the latest version, causal-cmd-1.1.3, and it worked. Please use the latest version. Also, the structure prior parameter is set to zero by default, which it shouldn't be, so you will need to provide a value for it explicitly. Below is an example command that will work with the latest version:

java -jar causal-cmd-1.1.3-jar-with-dependencies.jar --algorithm fges --data-type discrete --dataset data_n_100_p_50_avpar_2_1.txt --delimiter comma --quote-char \" --score bdeu-score --structurePrior 1

Here's the output that I got:

================================================================================
FGES (Thu, July 16, 2020 11:46:16 AM)
================================================================================

Runtime Parameters
--------------------------------------------------------------------------------
number of threads: 11

Dataset
--------------------------------------------------------------------------------
file: data_n_100_p_50_avpar_2_1.txt
header: yes
delimiter: comma
quote char: "
missing marker: none
comment marker: none

Algorithm Run
--------------------------------------------------------------------------------
algorithm: FGES
score: BDeu Score

Algorithm Parameters
--------------------------------------------------------------------------------
faithfulnessAssumed: no
maxDegree: 100
samplePrior: 15.0
structurePrior: 1
symmetricFirstStep: no
verbose: no

Thu, July 16, 2020 11:46:16 AM: Start data validation on file data_n_100_p_50_avpar_2_1.txt.
Thu, July 16, 2020 11:46:16 AM: End data validation on file data_n_100_p_50_avpar_2_1.txt.
There are 100 cases and 50 variables.

Thu, July 16, 2020 11:46:16 AM: Start reading in file data_n_100_p_50_avpar_2_1.txt.
Thu, July 16, 2020 11:46:16 AM: Finished reading in file data_n_100_p_50_avpar_2_1.txt.
Thu, July 16, 2020 11:46:16 AM: File data_n_100_p_50_avpar_2_1.txt contains 100 cases, 50 variables.

Start search: Thu, July 16, 2020 11:46:16 AM
End search: Thu, July 16, 2020 11:46:16 AM

================================================================================
Graph Nodes:
V1;V2;V3;V4;V5;V6;V7;V8;V9;V10;V11;V12;V13;V14;V15;V16;V17;V18;V19;V20;V21;V22;V23;V24;V25;V26;V27;V28;V29;V30;V31;V32;V33;V34;V35;V36;V37;V38;V39;V40;V41;V42;V43;V44;V45;V46;V47;V48;V49;V50

Graph Edges:
1. V10 --- V35
2. V11 --- V10
3. V13 --> V17
4. V13 --> V29
5. V14 --- V12
6. V14 --- V8
7. V18 --> V25
8. V2 --> V7
9. V22 --> V25
10. V25 --> V7
11. V26 --- V44
12. V27 --> V37
13. V29 --> V31
14. V30 --- V14
15. V33 --- V19
16. V34 --> V38
17. V34 --- V4
18. V36 --- V8
19. V37 --> V21
20. V38 --> V1
21. V4 --> V1
22. V41 --- V15
23. V42 --> V13
24. V42 --- V6
25. V42 --- V9
26. V43 --- V4
27. V44 --- V24
28. V45 --- V23
29. V47 --- V18
30. V48 --- V34
31. V49 --- V28
32. V5 --> V13
33. V50 --> V37
34. V6 --> V38

Graph Attributes:
BIC: 0.000000
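
For anyone who wants to post-process this text output, the following is a minimal Python sketch that pulls the edge list out of the "Graph Edges:" section. It assumes the output looks exactly like the listing above, with numbered lines using only the `-->` and `---` edge marks. The function name `parse_edges` is hypothetical and not part of causal-cmd, and other edge marks (for example those in PAG output) are not handled.

```python
import re

# Matches lines such as "3. V13 --> V17" or "1. V10 --- V35" as they appear
# in the "Graph Edges:" section of the text output shown above.
EDGE_RE = re.compile(r"^\s*\d+\.\s+(\S+)\s+(-->|---)\s+(\S+)\s*$")


def parse_edges(output_text):
    """Return (node_a, edge_mark, node_b) tuples found under 'Graph Edges:'.

    Hypothetical helper: it relies only on the layout visible in this thread.
    """
    edges = []
    in_edges = False
    for line in output_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Graph Edges:"):
            in_edges = True
            continue
        if stripped.startswith("Graph Attributes:"):
            break  # the edge section ends where the attributes begin
        if in_edges:
            match = EDGE_RE.match(line)
            if match:
                edges.append(match.groups())
    return edges


sample = """Graph Edges:
1. V10 --- V35
2. V11 --- V10
3. V13 --> V17

Graph Attributes:
BIC: 0.000000
"""

edges = parse_edges(sample)
# Keep only the directed edges as (parent, child) pairs.
directed = [(a, b) for a, mark, b in edges if mark == "-->"]
print(len(edges), "edges,", len(directed), "directed")
```

An empty list from `parse_edges` corresponds to the empty-graph symptom reported at the top of this issue, so a check like this can flag the problem in scripted runs.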

By the way, if you would like to see all the options without running the program, just add the --help flag. For example:

java -jar causal-cmd-1.1.3-jar-with-dependencies.jar --algorithm fges --data-type discrete --dataset data/data_n_100_p_50_avpar_2_1.txt --delimiter comma --quote-char \" --score bdeu-score --help
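
If the command is launched from a script, passing the arguments as a list avoids the shell escaping needed for the quote character. The sketch below only assembles the argument list from the flags shown in this thread. The function `build_causal_cmd` and its parameter names are hypothetical, and the jar and dataset paths are the ones used above.

```python
import shlex


def build_causal_cmd(jar, dataset, algorithm="fges", score="bdeu-score",
                     structure_prior=1, show_help=False):
    """Assemble the argument list for a discrete, comma-delimited dataset.

    Hypothetical helper: it uses only flags that appear in this thread.
    """
    args = [
        "java", "-jar", jar,
        "--algorithm", algorithm,
        "--data-type", "discrete",
        "--dataset", dataset,
        "--delimiter", "comma",
        "--quote-char", '"',  # no backslash needed when no shell is involved
        "--score", score,
        "--structurePrior", str(structure_prior),
    ]
    if show_help:
        args.append("--help")  # list the options instead of running a search
    return args


cmd = build_causal_cmd(
    "causal-cmd-1.1.3-jar-with-dependencies.jar",
    "data_n_100_p_50_avpar_2_1.txt",
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
# To actually run it (requires Java and the jar on disk):
#   import subprocess; subprocess.run(cmd, check=True)
```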

felixleopoldo commented 4 years ago

Thank you! Now it works. I also tried the --help flag, but it does not seem to list the available parameters and their default values for each algorithm. Is there some way to get this information? For example, I'm trying to run fci at the moment and wonder which parameters it accepts.

Thanks in advance, Felix

kvb2univpitt commented 3 years ago

@felixleopoldo The --help flag should do the trick. The only parameters missing are the bootstrap parameters; I have no idea how they got excluded. That needs to be fixed.

I ran the following command:

java -jar causal-cmd-1.1.3-jar-with-dependencies.jar --help --dataset data/data_n_100_p_50_avpar_2_1.txt --delimiter comma --quote-char \" --data-type discrete --algorithm fci --test bdeu-test

I got the following output:

usage: java -jar causal-cmd-1.1.3.jar --algorithm fci --data-type discrete --dataset data/data_n_100_p_50_avpar_2_1.txt --delimiter comma --quote-char " --test bdeu-test [--choose-dag-in-pattern] [--choose-mag-in-pag] [--comment-marker <string>] [--completeRuleSetUsed] [--depth <integer>] [--exclude-var <file>] [--experimental] [--extract-struct-model] [--generate-complete-graph] [--genereate-pag-from-dag] [--genereate-pag-from-tsdag] [--genereate-pattern-from-dag] [--json-graph] [--knowledge <file>] [--make-all-edges-undirected] [--make-bidirected-undirected] [--make-undirected-bidirected] [--maxPathLength <integer>] [--metadata <file>] [--missing-marker <string>] [--no-header] [--out <directory>] [--prefix <string>] [--samplePrior <double>] [--skip-latest] [--skip-validation] [--structurePrior <double>] [--thread <string>] [--verbose]
    --choose-dag-in-pattern        Choose DAG in Pattern graph.
    --choose-mag-in-pag            Choose MAG in PAG.
    --comment-marker <string>      Comment marker.
    --completeRuleSetUsed          Yes if the complete FCI rule set should be used
    --depth <integer>              Maximum size of conditioning set (unlimited = -1)
    --exclude-var <file>           Variables to be excluded from run.
    --experimental                 Show experimental algorithms, tests, and scores.
    --extract-struct-model         Extract sturct model.
    --generate-complete-graph      Generate complete graph.
    --genereate-pag-from-dag       Generate PAG from DAG.
    --genereate-pag-from-tsdag     Generate PAG from TsDAG.
    --genereate-pattern-from-dag   Generate pattern graph from PAG.
    --json-graph                   Write out graph as json.
    --knowledge <file>             Prior knowledge file.
    --make-all-edges-undirected    Make all edges undirected.
    --make-bidirected-undirected   Make bidirected edges undirected.
    --make-undirected-bidirected   Make undirected edges bidirected.
    --maxPathLength <integer>      The maximum length for any discriminating path. -1 if unlimited (min = -1)
    --metadata <file>              Metadata file.  Cannot apply to dataset without header.
    --missing-marker <string>      Denotes missing value.
    --no-header                    Indicates tabular dataset has no header.
    --out <directory>              Output directory
    --prefix <string>              Output file name prefix.
    --samplePrior <double>         Sample prior (min = 1.0)
    --skip-latest                  Skip checking for latest software version.
    --skip-validation              Skip validation.
    --structurePrior <double>      Structure prior coefficient (min = 0.0)
    --thread <string>              Number threads.
    --verbose                      Yes if verbose output should be printed or logged

If you compare this with tetrad-gui, you will see all the parameters for the algorithm and test are there: [image: tetrad]

felixleopoldo commented 3 years ago

Thanks for your answers!

I also wonder whether there is a way to specify the number of categories for each variable. How is it handled if I don't specify it?

kvb2univpitt commented 3 years ago

@felixleopoldo Please see this closed issue https://github.com/bd2kccd/causal-cmd/issues/60 if you want to specify the number of categories for each variable.

If you don't specify it and the data type is discrete, Tetrad counts the distinct values of each variable and uses that count as the number of categories. If the data type is mixed, any variable with 4 or fewer distinct values is treated as discrete. For mixed data, use the approach in the issue linked above to prevent Tetrad from treating a discrete variable with 5 or more states as continuous, or a continuous variable with 4 or fewer distinct values as discrete.
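
To check ahead of time how a dataset's columns would be classified under the rule just described, something like the following Python sketch can help. It only illustrates the rule as stated in this comment and is not Tetrad's code. The function name `infer_columns` and the constant name are hypothetical, and the input is assumed to be comma-delimited text with a header row.

```python
import csv
import io

# Threshold quoted in the comment above for the mixed data type.
MAX_DISCRETE_LEVELS = 4


def infer_columns(csv_text, data_type="discrete"):
    """Map each column name to ("discrete", n_categories) or ("continuous", None).

    Hypothetical helper illustrating the described rule:
    - discrete data: every column is discrete, categories = distinct values
    - mixed data: discrete only if it has 4 or fewer distinct values
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    result = {}
    for index, name in enumerate(header):
        levels = {row[index] for row in body}
        if data_type == "discrete" or len(levels) <= MAX_DISCRETE_LEVELS:
            result[name] = ("discrete", len(levels))
        else:
            result[name] = ("continuous", None)
    return result


# Column A has 2 distinct values, column B has 5.
sample = "A,B\n0,1.5\n1,2.5\n0,3.5\n1,4.5\n0,5.5\n"
as_discrete = infer_columns(sample, data_type="discrete")
as_mixed = infer_columns(sample, data_type="mixed")
print(as_discrete)
print(as_mixed)
```

Note that with only 100 cases, a category that never occurs in the sample is not counted, so the inferred number of categories can be smaller than the true one in the generating network.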

jdramsey commented 2 years ago

@kvb2univpitt Can this be closed now?

felixleopoldo commented 2 years ago

Sure, thanks for your help!

felixleopoldo commented 2 years ago

Sorry, I missed that the question wasn't addressed to me.

kvb2univpitt commented 2 years ago

Yes, it can be closed. @felixleopoldo You're the right person to close this issue, since you're the owner of this. :)