aCLImatise / CliHelpParser

Reads the output from CLI help commands, and generates machine readable schemas (CWL etc)
https://aclimatise.github.io/CliHelpParser/
GNU General Public License v3.0
14 stars 5 forks source link
bioinformatics cli cwl parser pipeline wdl workflow

aCLImatise


|DOI|

.. |DOI| image:: https://zenodo.org/badge/DOI/10.1093/bioinformatics/btaa1033.svg :target: https://doi.org/10.1093/bioinformatics/btaa1033

For the full documentation, refer to the Github Pages Website <https://aclimatise.github.io/CliHelpParser/>_.

======================================================================

aCLImatise is a Python library and command-line utility for parsing the help output of a command-line tool and then outputting a description of the tool in a more structured format, for example a Common Workflow Language tool definition <https://www.commonwl.org/v1.1/CommandLineTool.html>_.

Currently aCLImatise supports both CWL <https://www.commonwl.org/> and WDL <https://openwdl.org/> outputs, but other formats will be considered in the future, especially pull requests to support them.

Please also refer to The aCLImatise Base Camp <https://aclimatise.github.io/BaseCamp/>_, which is a database of pre-computed tool definitions generated by the aCLImatise parser. Most bioinformatics tools have a tool definition already generated in the Base Camp, so you may not need to run aCLImatise directly.

aCLImatise is now published in the journal Bioinformatics. You can read the application note here: https://doi.org/10.1093/bioinformatics/btaa1033. To cite aCLImatise, please use the citation generator provided by the journal.

Example

Lets say you want to create a CWL workflow containing the common Unix wc (word count) utility. Running wc --help returns:

.. code-block::

Usage: wc [OPTION]... [FILE]... or: wc [OPTION]... --files0-from=F Print newline, word, and byte counts for each FILE, and a total line if more than one FILE is specified. A word is a non-zero-length sequence of characters delimited by white space.

With no FILE, or when FILE is -, read standard input.

The options below may be used to select which counts are printed, always in the following order: newline, word, character, byte, maximum line length. -c, --bytes print the byte counts -m, --chars print the character counts -l, --lines print the newline counts --files0-from=F read input from the files specified by NUL-terminated names in file F; If F is - then read names from standard input -L, --max-line-length print the maximum display width -w, --words print the word counts --help display this help and exit --version output version information and exit

GNU coreutils online help: http://www.gnu.org/software/coreutils/ Full documentation at: http://www.gnu.org/software/coreutils/wc or available locally via: info '(coreutils) wc invocation'

If you run aclimatise explore wc, which means "parse the wc command and all subcommands", you'll end up with the following files in your current directory:

These are representations of the command wc in 3 different formats. If you look at wc.wdl, you'll see that it contains a WDL-compatible tool definition for wc:

.. code-block:: text

version 1.0
task Wc {
  input {
    Boolean bytes
    Boolean chars
    Boolean lines
    String files__from
    Boolean max_line_length
    Boolean words
  }
  command <<<
    wc \
      ~{true="--bytes" false="" bytes} \
      ~{true="--chars" false="" chars} \
      ~{true="--lines" false="" lines} \
      ~{if defined(files__from) then ("--files0-from " +  '"' + files__from + '"') else ""} \
      ~{true="--max-line-length" false="" max_line_length} \
      ~{true="--words" false="" words}
  >>>
}