exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0

Remove Settings and cli commands other than --analysis and --analysis-batch #241

Closed julesjacobsen closed 6 years ago

julesjacobsen commented 6 years ago

The cli options have become too numerous and only cover the original exomiser algorithm. They also use a code path parallel to the analysis API, so any new feature that needs user input takes more than twice as much coding and testing, creates more opportunities for bugs, and leads to inconsistent functionality depending on whether people use the analysis scripts or the cli switches.

For the next major version, 10.0.0, I'd like to propose dropping the cli commands completely, as this will significantly reduce the burden of dead code and legacy logic. We've supported them across three major versions (7.0.0 to 9.0.0), in which time the functionality of the analysis API has completely eclipsed the cli switches and now provides far more flexibility. It looks brutal, but it should simplify things for both users and maintainers, and it should aid reproducibility, as the script becomes the single point of entry to the analysis.

@damiansm, @pnrobinson - comments please.

So we'll keep these:

    --analysis <file>                    Path to analysis script file.
                                         This should be in yaml format.
    --analysis-batch <file>              Path to analysis batch file. This
                                         should be in plain text file with
                                         the path to a single analysis
                                         script file in yaml format on
                                         each line.
    -h,--help                               Shows this help
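
As a rough illustration of the batch file format described above (a plain text file with one analysis script path per line), reading such a file could be sketched as follows. This is only a sketch in Python, not how Exomiser (a Java application) implements it; the function name `read_analysis_batch` and the decision to skip blank lines are assumptions made for the example.

```python
from pathlib import Path
from typing import List


def read_analysis_batch(batch_file: str) -> List[Path]:
    """Return the analysis script paths listed in a batch file.

    Hypothetical helper: assumes one path per line, as the help text
    describes, and (as an extra assumption) ignores blank lines.
    """
    paths = []
    with open(batch_file, encoding="utf-8") as handle:
        for line in handle:
            entry = line.strip()  # drop the newline and surrounding spaces
            if entry:  # assumption: blank lines are skipped
                paths.append(Path(entry))
    return paths
```

Each returned path would then be handled the same way as a single file given to `--analysis`.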

And remove all these:

    --batch-file <file>                  Path to batch file. This should
                                         contain a list of fully qualified
                                         path names for the settings files
                                         you wish to process. There should
                                         be one file name on each line.
    --candidate-gene <arg>               Gene symbol of known or suspected
                                         gene association e.g. FGFR2
 -D,--disease-id <arg>                   OMIM ID for disease being
                                         sequenced. e.g. OMIM:101600
 -E,--hiphive-params <type>              Comma separated list of optional
                                         parameters for hiphive: human,
                                         mouse, fish, ppi. e.g.
                                         --hiphive-params=human or
                                         --hiphive-params=human,mouse,ppi
 -F,--max-freq <arg>                     Maximum frequency threshold for
                                         variants to be retained. e.g.
                                         100.00 will retain all variants.
                                         Default: 100.00
 -f,--out-format <type>                  Comma separated list of format
                                         options: HTML, VCF, TAB-GENE or
                                         TAB-VARIANT. Defaults to HTML if
                                         not specified. e.g.
                                         --out-format=TAB-VARIANT or
                                         --out-format=TAB-GENE,TAB-VARIANT
                                         ,HTML,VCF
    --full-analysis <true/false>         Run the analysis such that all
                                         variants are run through all
                                         filters. This will take longer,
                                         but give more complete results.
                                         Default is false
    --genes-to-keep <HGNC gene symbol>   Comma separated list of seed
                                         genes (HGNC gene symbols e.g.
                                         FGFR2) for filtering
 -h,--help                               Shows this help
 -H,--help                               Shows this help
    --hpo-ids <HPO ID>                   Comma separated list of HPO IDs
                                         for the sample being sequenced
                                         e.g.
                                         HP:0000407,HP:0009830,HP:0002858
 -I,--inheritance-mode <arg>             Filter variants for inheritance
                                         pattern (AR, AD, X)
    --num-genes <arg>                    Number of genes to show in output
 -o,--out-prefix <arg>                   Out file prefix. Will default to
                                         vcf-filename-exomiser-results
    --output-pass-variants-only          Only write out PASS variants in
                                         TSV and VCF files.
 -p,--ped <file>                         Path to pedigree (ped) file.
                                         Required if the vcf file is for a
                                         family.
 -P,--keep-non-pathogenic                Keep the predicted non-pathogenic
                                         variants that are normally
                                         removed by default. These are
                                         defined as synonymous, intergenic,
                                         intronic, upstream, downstream or
                                         intronic ncRNA variants. This
                                         setting can optionally take a
                                         true/false argument. Not
                                         including the argument is
                                         equivalent to specifying 'false'.
    --prioritiser <name>                 Name of the prioritiser used to
                                         score the genes. Can be one of:
                                         hiphive, phenix, phive,
                                         exomewalker, omim or none. e.g.
                                         --prioritiser=none
    --proband <arg>                      Sample name of the proband. This
                                         should be present in both the ped
                                         and vcf files. Required if the
                                         vcf file is for a family.
 -Q,--min-qual <arg>                     Minimum quality threshold for
                                         variants as specified in VCF
                                         'QUAL' column.  Default: 0
 -R,--restrict-interval <arg>            Restrict to region/interval
                                         (e.g., chr2:12345-67890)
    --remove-failed                      Calling this option will tell
                                         Exomiser to ignore any variants
                                         marked in the input VCF as having
                                         failed any previous filters from
                                         other upstream analyses. In other
                                         words, unless a variant has a
                                         'PASS' or '.' in the FILTER field
                                         of the input VCF, it will be
                                         excluded from the analysis by the
                                         Exomiser.
    --remove-known-variants              Filter out all variants with an
                                         entry in dbSNP/ESP/ExAC
                                         (regardless of frequency).
 -S,--seed-genes <Entrez geneId>         Comma separated list of seed
                                         genes (Entrez gene IDs) for
                                         random walk
    --settings-file <file>               Path to settings file. Any
                                         settings specified in the file
                                         will be overridden by parameters
                                         added on the command-line.
 -T,--keep-off-target                    Keep the off-target variants that
                                         are normally removed by default.
                                         These are defined as intergenic,
                                         intronic, upstream, downstream or
                                         intronic ncRNA variants. This
                                         setting can optionally take a
                                         true/false argument. Not
                                         including the argument is
                                         equivalent to specifying 'true'.
 -v,--vcf <file>                         Path to VCF file with mutations
                                         to be analyzed. Can be either for
                                         an individual or a family.

pnrobinson commented 6 years ago

This seems quite reasonable to me. I would also suggest we create a JavaFX app to run the Exomiser that would guide users through the development of a reasonable set of settings. I will profile something like that and we can see how it looks.

julesjacobsen commented 6 years ago

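Phew! Glad you agree. We could also add a couple of presets for people to choose from, based on the demos in the cli distribution, e.g. exome (https://github.com/exomiser/Exomiser/blob/master/exomiser-cli/src/main/resources/examples/test-analysis-exome.yml) and genome (https://github.com/exomiser/Exomiser/blob/master/exomiser-cli/src/main/resources/examples/test-analysis-genome.yml). These could take the form: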

    steps: [
        exomePreset: {}
    ]

or

    steps: [
        genomePreset: {}
    ]

damiansm commented 6 years ago

I vote yes as well. It has been a long time since I used them, and I think they are too complicated to use nowadays with all the new options.

julesjacobsen commented 6 years ago

Done in version 10.0.0