bd2kccd / causal-cmd


Warning: pref file removed #71

Closed: Zarmas closed this issue 2 years ago

Zarmas commented 2 years ago

Hello, when I run the following command, I get this warning:

$ java -jar causal-cmd-1.4.0-SNAPSHOT-jar-with-dependencies.jar --algorithm fges --data-type continuous --dataset input.txt --delimiter tab --score sem-bic-score
Jul 08, 2022 1:29:42 PM java.util.prefs.FileSystemPreferences$6 run
WARNING: Prefs file removed in background /home//.java/.userPrefs/prefs.xml

I get my results in a .txt file as expected. The input.txt file contains 100 variables with 35,000 samples. The resulting file also reports that the number of threads is 7, but when I watch the top command, only one CPU is being used, and I wonder whether this is connected to the warning above. These are the Java, Ubuntu, and Maven versions on my system:

$ java -version
openjdk version "18-ea" 2022-03-22
OpenJDK Runtime Environment (build 18-ea+36-Ubuntu-1)
OpenJDK 64-Bit Server VM (build 18-ea+36-Ubuntu-1, mixed mode, sharing)

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04 LTS
Release: 22.04
Codename: jammy

$ mvn -version
Apache Maven 3.6.3
Maven home: /usr/share/maven
Java version: 18-ea, vendor: Private Build, runtime: /usr/lib/jvm/java-18-openjdk-amd64
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.10.102.1-microsoft-standard-wsl2", arch: "amd64", family: "unix"

Is there a way to fix this problem? If the threads problem is unrelated to the warning, is there a way to make sure all my CPUs are used so I can get my results faster?

Thank you in advance,

George

kvb2univpitt commented 2 years ago

@Zarmas The warning you got is a known bug in OpenJDK 8: https://bugs.openjdk.org/browse/JDK-8068373.

Is there a way to fix this problem?

The warning is harmless and has no effect on causal-cmd. If it bothers you, you can use Java 11.

If the threads problem is unrelated to the warning, is there a way to make sure all my CPUs are used so I can get my results faster?

Not all algorithms can run in parallel, and causal-cmd has no option to set the number of threads. Depending on the algorithm, certain parameter settings can make it run faster. I suggest using the --default switch, which applies default parameter values optimized for the chosen algorithm, test, and score.

kvb2univpitt commented 2 years ago

@Zarmas As an update to the discussion on using more than one CPU: fges has a --parallelized switch you can use to speed up certain calculations. This switch may not be available for other algorithms. To check whether the option is available for a specific algorithm, add the --help switch to the command. For example, add --help to the following command:

java -jar causal-cmd.jar --algorithm fges --data-type continuous --dataset continuous_data.txt --default --delimiter tab --score sem-bic-score --help

You will see the --parallelized option for fges:

usage: java -jar causal-cmd.jar --algorithm fges --data-type continuous --dataset continuous_data --default --delimiter tab --score sem-bic-score [--addOriginalDataset] [--choose-dag-in-pattern] [--choose-mag-in-pag] [--comment-marker <string>] [--exclude-var <file>] [--experimental] [--extract-struct-model] [--faithfulnessAssumed] [--generate-complete-graph] [--genereate-pag-from-dag] [--genereate-pag-from-tsdag] [--genereate-pattern-from-dag] [--json-graph] [--knowledge <file>] [--make-all-edges-undirected] [--make-bidirected-undirected] [--make-undirected-bidirected] [--maxDegree <integer>] [--meekVerbose] [--metadata <file>] [--missing-marker <string>] [--no-header] [--numberResampling <integer>] [--parallelized] [--penaltyDiscount <double>] [--percentResampleSize <integer>] [--precomputeCovariances] [--prefix <string>] [--quote-char <character>] [--resamplingEnsemble <integer>] [--resamplingWithReplacement] [--semBicRule <integer>] [--semBicStructurePrior <double>] [--skip-validation] [--symmetricFirstStep] [--timeLag <integer>] [--verbose]
    --addOriginalDataset              Yes, if adding the original dataset as another bootstrapping
    --choose-dag-in-pattern           Choose DAG in Pattern graph.
    --choose-mag-in-pag               Choose MAG in PAG.
    --comment-marker <string>         Comment marker.
    --exclude-var <file>              Variables to be excluded from run.
    --experimental                    Show experimental algorithms, tests, and scores.
    --extract-struct-model            Extract sturct model.
    --faithfulnessAssumed             Yes if (one edge) faithfulness should be assumed
    --generate-complete-graph         Generate complete graph.
    --genereate-pag-from-dag          Generate PAG from DAG.
    --genereate-pag-from-tsdag        Generate PAG from TsDAG.
    --genereate-pattern-from-dag      Generate pattern graph from PAG.
    --json-graph                      Write out graph as json.
    --knowledge <file>                Prior knowledge file.
    --make-all-edges-undirected       Make all edges undirected.
    --make-bidirected-undirected      Make bidirected edges undirected.
    --make-undirected-bidirected      Make undirected edges bidirected.
    --maxDegree <integer>             The maximum degree of the graph (min = -1)
    --meekVerbose                     Yes if verbose output for Meek rule applications should be printed or logged
    --metadata <file>                 Metadata file.  Cannot apply to dataset without header.
    --missing-marker <string>         Denotes missing value.
    --no-header                       Indicates tabular dataset has no header.
    --numberResampling <integer>      The number of bootstraps/resampling iterations (min = 0)
    --parallelized                    Yes if the search should be parallelized
    --penaltyDiscount <double>        Penalty discount (min = 0.0)
    --percentResampleSize <integer>   The percentage of resample size (min = 0.1)
    --precomputeCovariances           True if covariance matrix should be precomputed for tubular continuous data
    --prefix <string>                 Output file name prefix.
    --quote-char <character>          Single character denotes quote.
    --resamplingEnsemble <integer>    Ensemble method: Preserved (1), Highest (2), Majority (3)
    --resamplingWithReplacement       Yes, if sampling with replacement (bootstrapping)
    --semBicRule <integer>            Lambda: 1 = Chickering, 2 = Nandy
    --semBicStructurePrior <double>   Structure Prior for SEM BIC (default 0)
    --skip-validation                 Skip validation.
    --symmetricFirstStep              Yes if the first step step for FGES should do scoring for both X->Y and Y->X
    --timeLag <integer>               A time lag for time series data, automatically applied (zero if none)
    --verbose                         Yes if verbose output should be printed or logged

You won't see this option for gfci, for example.
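The --help check can be scripted. Below is a minimal shell sketch that assumes a grep supporting -q, -F and -w (GNU grep does). The function name has_option is hypothetical and not part of causal-cmd. It only searches the help text for the option name, so it is a convenience check and not a guarantee of what the algorithm does internally. The help layout may also differ between causal-cmd versions.

```shell
#!/usr/bin/env bash

# has_option (hypothetical helper, not part of causal-cmd):
# reads causal-cmd --help output on stdin and exits 0 if the
# given option name (e.g. --parallelized) appears in it.
has_option() {
  # -q: print nothing, report the result only through the exit status
  # -F: treat the option name as a fixed string, not a regex
  # -w: reject matches glued to letters/digits/underscores, so
  #     --parallelized does not match --parallelizedSearch
  #     (a longer hyphenated name like --foo-bar would still match --foo)
  # --: end of grep's own options, because the pattern starts with "--"
  grep -q -F -w -- "$1"
}
```

For example, using the fges command from this thread (adjust the jar and dataset names to your setup):

if java -jar causal-cmd.jar --algorithm fges --data-type continuous --dataset continuous_data.txt --default --delimiter tab --score sem-bic-score --help | has_option --parallelized; then echo "fges supports --parallelized"; fi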