Closed agitter closed 2 years ago
This blog post gives a thorough overview of different file types and associated parsers that are often used for config files: https://hackersandslackers.com/simplify-your-python-projects-configuration/
Using YAML as an example, we could have a config file like:
tps:
  network: data/networks/input-network.tsv
  timeseries: data/timeseries/median-time-series.tsv
  firstscores: data/timeseries/p-values-first.tsv
  ...
cytoscape: /home/seluser/cytoscape/start.sh
Snakemake uses a YAML or JSON config file: https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html#standard-configuration
If we use YAML, it would make it easier to switch from a script-driven workflow to a Snakemake workflow later.
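As a rough sketch of what reading such a file could look like in Python: the `load_config` helper below is hypothetical, not existing project code. It accepts either of the two formats Snakemake supports and assumes the third-party PyYAML package is installed when a YAML file is used.

```python
import json
from pathlib import Path


def load_config(path):
    """Read a workflow config from a YAML or JSON file into a dict.

    Hypothetical helper for illustration only.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    with path.open() as handle:
        if suffix == ".json":
            return json.load(handle)
        if suffix in (".yaml", ".yml"):
            # PyYAML is a third-party package; it is imported lazily so that
            # JSON-only use works with the standard library alone.
            import yaml
            return yaml.safe_load(handle)
    raise ValueError(f"Unsupported config format: {path.name}")
```

If the TPS settings are nested under a `tps` key as in the example config, a script could then read `load_config("config.yaml")["tps"]["network"]`. The file name `config.yaml` is only a placeholder.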
Here is a list of all of the parameter files:
--network <file>: Input network file in TSV format, where each row defines an undirected edge.
--timeseries <file>: Input time series file in TSV format. The first line defines the time point labels, and each subsequent line corresponds to one time series profile.
--firstscores <file>: Input file that contains significance scores for each time point of a profile (except the first time point), with respect to the first time point of the profile.
--prevscores <file>: Similar to --firstscores, an input file that gives significance scores for each time point (except the first one), with respect to the previous time point.
--source <value>: Identifier for the network source node. Multiple source nodes can be provided by repeating the argument multiple times. For example, --source <node1> --source <node2> --source <node3>.
--threshold <value>: Threshold value for significance scores, above which measurements are considered non-significant.
output.sif
style file
cytoscape session file name
annotations data types file
Cytoscape path
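One way the flags listed above could be generated from a config mapping is sketched here. The `build_tps_args` function is a hypothetical helper. It assumes the config keys match the flag names without the leading dashes, and that a repeated flag such as --source is stored as a list.

```python
def build_tps_args(tps_config):
    """Turn a mapping of TPS settings into a flat list of command-line arguments.

    Hypothetical helper: assumes each key is a flag name without the leading
    dashes. List values produce one repeated flag per item, as with --source.
    """
    args = []
    for key, value in tps_config.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            args.extend([f"--{key}", str(item)])
    return args
```

For example, `build_tps_args({"source": ["node1", "node2"]})` yields `["--source", "node1", "--source", "node2"]`, matching the repeated-argument form described for --source.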
Thanks, it's very helpful to see all of these listed explicitly. The sheer number makes me prefer the config file option even more. That would be a lot of required arguments to supply at the command line.
Some of these are also redundant in the sense that the same input file is used in two different stages (e.g. timeSeriesFile) or the output of one stage is consumed as input by another stage.
We can also think more about setting reasonable defaults. For instance, most users won't need to specify a custom styleTemplateFile.
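A minimal sketch of how defaults might be layered under user settings follows. The default value shown for `styleTemplateFile` is a made-up placeholder, and `apply_defaults` is a hypothetical helper rather than existing code.

```python
# Hypothetical defaults; the file name below is a placeholder, not a real path.
DEFAULTS = {
    "styleTemplateFile": "default_style.xml",
}


def apply_defaults(user_config, defaults=DEFAULTS):
    """Return a new dict in which user settings override the defaults.

    Keys the user leaves out, or sets to None, fall back to the default value.
    """
    merged = dict(defaults)
    merged.update({k: v for k, v in user_config.items() if v is not None})
    return merged
```

With this approach a user who only sets the input files would still get a style template, while anyone who does need a custom one overrides a single key.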
One example from the Manubot project of using subprocess and passing arguments: https://github.com/manubot/manubot/blob/217e51473f1fd1c6427803676b3c70d44314bb93/manubot/pandoc/bibliography.py
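For reference, the general pattern of passing an argument list to `subprocess` might look like the sketch below. This is not the Manubot code; `run_stage` is a hypothetical wrapper, and the command to run is left to the caller.

```python
import subprocess


def run_stage(command, args):
    """Run one workflow stage as a child process and return the completed process.

    Hypothetical wrapper: `command` is the program and any fixed leading
    arguments as a list, and `args` is the list of options to append.
    """
    full_command = list(command) + [str(arg) for arg in args]
    # Passing a list avoids shell quoting issues. check=True raises
    # CalledProcessError if the stage exits with a non-zero status.
    return subprocess.run(full_command, check=True, capture_output=True, text=True)
```

The returned object exposes `stdout` and `stderr` as text, which could be logged or inspected before the next stage starts.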
We discussed using a config or properties file to track all of the input files and settings that a user needs to specify. That would greatly reduce the number of command line arguments needed.
A YAML file could be one option. We should look at what other modern software uses.