SystemsGenetics / KINC

Knowledge Independent Network Construction
MIT License

Inconsistent method of separating network condition arguments #122

Closed: JohnHadish closed this issue 4 years ago

JohnHadish commented 4 years ago

The method for separating multiple conditions is inconsistent between commands.

For example, in run extract --filter-rsquare, multiple conditions are separated with two colons (::):

--filter-rsquare <value>
Value Type: String
Default Value: 0.3
This is only used if a Condition-Specific Matrix is provided above and applies
to quantitative and ordinal tests. This filters the network such that only edges
(clusters) with r-squared values from linear regression testing above the given
values are kept. Provide a single r-squared value to filter all features with
the same value. However, you can specify different r-squared values for
different features. For example, suppose you were testing a feature named
'Weight' and you wanted edges with an r-squared value > 0.5, you would input
"Weight,0.5". You can provide any number of filters but they must be separated
using two colons: "::".
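To make the "::" convention concrete, here is a minimal Python sketch of how a value for --filter-rsquare could be interpreted, based only on the help text above. It is not KINC's own parsing code; the function name parse_rsquare_filter and the second feature "Height" are hypothetical. A bare number is treated as one threshold for every feature; otherwise the string is split on "::" and each piece on its last comma.

```python
from typing import Dict, Optional, Tuple


def parse_rsquare_filter(arg: str) -> Tuple[Optional[float], Dict[str, float]]:
    """Hypothetical helper (not KINC's implementation).

    Returns (global_threshold, per_feature_thresholds).
    """
    arg = arg.strip()
    try:
        # A single number such as "0.3" filters all features with the same value.
        return float(arg), {}
    except ValueError:
        pass

    per_feature: Dict[str, float] = {}
    # Filters are separated by "::"; inside a filter, the last comma
    # separates the feature name from its r-squared threshold.
    for item in arg.split("::"):
        name, sep, value = item.rpartition(",")
        if not sep or not name:
            raise ValueError(f"expected 'Feature,value', got {item!r}")
        per_feature[name] = float(value)
    return None, per_feature


print(parse_rsquare_filter("0.3"))
# (0.3, {})
print(parse_rsquare_filter("Weight,0.5::Height,0.3"))
# (None, {'Weight': 0.5, 'Height': 0.3})
```

Because the comma already separates a name from its threshold, a second delimiter ("::") is needed between filters.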

Whereas in run cond-test --feat-types, multiple conditions are separated with a comma (,). In addition, the use of the comma as a separator is not explicitly stated in the documentation; it was determined through trial and error.

--feat-types <value>
Value Type: String
By default, this program will automatically detect the type of feature as
'categorical', 'quantitative', or 'ordinal'. You can override the default type
by listing the column name from the annotation matrix, followed by a colon and
then the desired type. You can list as many features by separating them with
commas, with no spaces around commas. For example, if a column is named
"Health_Status" and is numeric but should be treated as ordinal, enter: Health_Status:ordinal
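For comparison, the following Python sketch reads a --feat-types value as described in the help text: "Column:type" overrides separated by commas. Again, this is not KINC's code; the function name parse_feat_types and the extra column "Age" are hypothetical. The three allowed types come from the help text.

```python
from typing import Dict

# The three feature types named in the --feat-types help text.
VALID_TYPES = {"categorical", "quantitative", "ordinal"}


def parse_feat_types(arg: str) -> Dict[str, str]:
    """Hypothetical helper: parse 'Column:type' overrides separated by commas."""
    overrides: Dict[str, str] = {}
    for item in arg.split(","):
        # The last colon separates the column name from the requested type.
        name, sep, ftype = item.rpartition(":")
        if not sep or not name:
            raise ValueError(f"expected 'Column:type', got {item!r}")
        if ftype not in VALID_TYPES:
            raise ValueError(f"unknown feature type {ftype!r} for column {name!r}")
        overrides[name] = ftype
    return overrides


print(parse_feat_types("Health_Status:ordinal,Age:quantitative"))
# {'Health_Status': 'ordinal', 'Age': 'quantitative'}
```

Here the colon does the job that the comma does in --filter-rsquare, which leaves the comma free to separate entries.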

I suggest picking one standard separator for conditions across all commands to reduce confusion.

bentsherman commented 4 years ago

The issue is that in the first case the filters also have a threshold, and the comma is used to separate the filter name from the filter threshold. I would recommend something like this:

--filter-rsquare filter1=t1,filter2=t2, ....
--feat-types type1,type2, ...

But I see from #121 that the --filter-pvalue option could contain commas in the name, and you want to use commas to delineate different types of operators. So I think the way it is now is fine, especially if it matches the syntax from KINC.R (though I don't know whether it does).
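The trade-off can be shown with a small Python sketch comparing the current "::" style with the proposed "name=threshold, comma-separated" style when a filter name itself contains a comma. Both functions are hypothetical illustrations, not KINC code, and "Dose,High" is an invented name chosen only to contain a comma.

```python
from typing import List, Tuple


def parse_double_colon(arg: str) -> List[Tuple[str, float]]:
    """'::' between filters; the last comma in each filter separates name and value."""
    filters = []
    for item in arg.split("::"):
        name, _, value = item.rpartition(",")
        filters.append((name, float(value)))
    return filters


def parse_comma_equals(arg: str) -> List[Tuple[str, float]]:
    """',' between filters and '=' between name and value (the proposed syntax)."""
    filters = []
    for item in arg.split(","):
        name, sep, value = item.partition("=")
        if not sep:
            # A comma inside a name splits the name itself into fragments.
            raise ValueError(f"fragment {item!r} has no '='; a name probably contained a comma")
        filters.append((name, float(value)))
    return filters


# Works for simple names with either syntax.
print(parse_comma_equals("Weight=0.5,Height=0.3"))
# The "::" style keeps a comma-containing name intact.
print(parse_double_colon("Dose,High,0.01::Weight,0.5"))
# [('Dose,High', 0.01), ('Weight', 0.5)]
try:
    parse_comma_equals("Dose,High=0.01,Weight=0.5")
except ValueError as err:
    print("error:", err)
```

With a comma as the top-level separator, a comma inside a name cannot be distinguished from a separator without quoting or escaping. Splitting on "::" first avoids that ambiguity.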

Also the command-line help for --feat-types does tell you to use commas:

You can list as many features by separating them with commas, with no spaces around commas.

spficklin commented 4 years ago

I agree with @bentsherman that it's okay to have different delimiters in these different analytics. The only way we could unify them would be to switch cond-test --feat-types to two colons, and I'm leery of making such a change now that everything is working! I vote we leave it as is.