SystemsGenetics / KINC

Knowledge Independent Network Construction
MIT License

Clarification on --minexpr and --nan values #174

Open JohnHadish opened 4 years ago

JohnHadish commented 4 years ago

It is not clear whether the user should treat -Inf values as a --nan value or as a --minexpr value. The documentation mentions -Inf in both places. It should state how each of these parameters affects the analysis.

From Step 1: Import the GEM

... In the example above, the --nan argument indicates that the file uses "NA" to represent missing values. This value should be set to whatever indicates missing values. This could be "0.0", "-Inf", etc. and the GEM file has a header describing each column so the number ...
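To make the --nan behaviour concrete, here is a minimal Python sketch of how a configurable missing-value token could be applied while reading one row of a GEM. The function name parse_expression_row is hypothetical, and the exact-string match is an assumption about how KINC compares tokens, not its actual implementation.

```python
import math


def parse_expression_row(fields, nan_token="NA"):
    """Hypothetical sketch: convert one GEM row's expression fields to floats.

    Assumes a field exactly equal to nan_token (the value given to --nan)
    becomes NaN; every other field is parsed as an ordinary float.
    """
    values = []
    for field in fields:
        if field == nan_token:
            values.append(math.nan)  # treated as a missing value
        else:
            values.append(float(field))  # "-Inf" parses as a real -inf here
    return values


row = ["5.2", "-Inf", "0.0"]
print(parse_expression_row(row, nan_token="NA"))    # [5.2, -inf, 0.0]
print(parse_expression_row(row, nan_token="-Inf"))  # [5.2, nan, 0.0]
```

Under this assumption, the same "-Inf" field is either kept as a numeric negative infinity or discarded as missing, depending only on what --nan is set to.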

From Step 2: Perform Correlation Analysis

... The --minexpr argument is set to negative infinity (-inf) to indicate there is no limit on the minimum expression value. If we wanted to exclude samples whose log2 expression values dipped below 0.2, for instance, we could do so with this argument. ...
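The interaction between --nan and --minexpr is easier to see in code. The sketch below is hypothetical (usable_samples is not a KINC function). It assumes a sample is kept for a gene pair only when neither value is NaN and both fall inside the inclusive range [minexpr, maxexpr]; whether KINC uses inclusive bounds or checks both genes this way is an assumption.

```python
import math


def usable_samples(x, y, minexpr=-math.inf, maxexpr=math.inf):
    """Hypothetical sketch: indices of samples kept for the gene pair (x, y).

    Assumed rule: drop a sample if either value is NaN, otherwise keep it only
    if both values lie within the inclusive range [minexpr, maxexpr].
    """
    keep = []
    for i, (a, b) in enumerate(zip(x, y)):
        if math.isnan(a) or math.isnan(b):
            continue  # value matched --nan, so the sample is always dropped
        if minexpr <= a <= maxexpr and minexpr <= b <= maxexpr:
            keep.append(i)
    return keep


gene_x = [3.1, -math.inf, 0.1, math.nan]
gene_y = [2.8, 1.5, 4.0, 2.2]
print(usable_samples(gene_x, gene_y))               # [0, 1, 2]
print(usable_samples(gene_x, gene_y, minexpr=0.2))  # [0]
```

Under these assumptions, a raw -inf that was not mapped through --nan survives the default --minexpr of -inf (since -inf <= -inf is true). It is only excluded once it is declared missing or the threshold is raised to a finite value.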

From the command-line documentation (kinc help run similarity), the value type is listed as "Floating Point", yet the default is shown as what looks like a string (-inf).

--minexpr <value>
Value Type: Floating Point
Minimum Value: -inf
Maximum Value: inf
Default Value: -inf
Minimum threshold for a sample to be included in a gene pair.

--maxexpr <value>
Value Type: Floating Point
Minimum Value: -inf
Maximum Value: inf
Default Value: inf
Maximum threshold for a sample to be included in a gene pair.

bentsherman commented 3 years ago

The notes on --nan are just saying that you can choose any value to be parsed as NaN, for example "NA", "-inf" or "0.0", depending on your situation.

As for --minexpr and --maxexpr, the IEEE floating-point standard reserves special values for infinity and NaN, so "-inf" and "inf" can be parsed as valid floating-point values. I hope that clears up your questions.
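As a quick illustration of the IEEE 754 point, the Python sketch below uses the built-in float() as a stand-in parser. It does not reproduce KINC's own argument parsing, which is not shown in this thread. It demonstrates that "-inf", "inf" and "nan" are ordinary floating-point inputs, and that NaN behaves differently from infinity in comparisons.

```python
import math

# Python's float() accepts the IEEE 754 special values by name
# (case-insensitive), so these strings parse to real floating-point values.
lo = float("-inf")
hi = float("inf")
missing = float("nan")

print(math.isinf(lo) and lo < 0)  # True: negative infinity
print(lo < -1e308 < hi)           # True: -inf and inf bound every finite value
print(missing == missing)         # False: NaN compares unequal even to itself
print(math.isnan(missing))        # True
```

This is why a default of -inf is a genuine "no lower limit" rather than a placeholder string. In contrast, NaN cannot serve as a threshold, because every comparison against it is false.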