exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0

Enable specifying output directory on the CLI #469

Closed julesjacobsen closed 1 year ago

julesjacobsen commented 1 year ago

Background

The current v13.1.0 CLI output options are:

--output <string>          Path to outputOptions file. This should be
                           in JSON or YAML format.
--output-prefix <string>   Path/filename without an extension to be
                           prepended to the output file format
                           options.

The output-options.yml file contains a couple of options:

  outputPrefix: results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG
  #out-format options: HTML, JSON, TSV_GENE, TSV_VARIANT, VCF (default: HTML, JSON)
  outputFormats: [HTML, JSON, TSV_GENE, TSV_VARIANT, VCF]

outputPrefix is the full path plus base filename for the results. Exomiser combines it with each of the outputFormats values to create the output files.

So, given the command:

java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --preset exome --output pfeiffer-output-options.yml

These files are produced:

results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.genes.tsv
results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.html
results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.json
results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.variants.tsv
results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.vcf.gz
results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.vcf.gz.tbi
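The naming rule described above (prefix plus a per-format extension) can be sketched in Java. This is a minimal, hypothetical illustration, not Exomiser's code: the OutputPrefixNaming class and its OutputFormat enum are invented here, and the extensions are read off the example file listing above.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: combine an outputPrefix with a set of output formats.
 * The enum mirrors the outputFormats values in output-options.yml; the
 * extensions come from the example listing, not from Exomiser's source.
 */
class OutputPrefixNaming {

    enum OutputFormat {
        HTML(".html"),
        JSON(".json"),
        TSV_GENE(".genes.tsv"),
        TSV_VARIANT(".variants.tsv"),
        // the example listing also shows a .vcf.gz.tbi index, not modelled here
        VCF(".vcf.gz");

        final String extension;

        OutputFormat(String extension) {
            this.extension = extension;
        }
    }

    /** Appends each requested format's extension to the prefix, in enum declaration order. */
    static List<String> outputFiles(String outputPrefix, Set<OutputFormat> formats) {
        List<String> files = new ArrayList<>();
        for (OutputFormat format : formats) {
            files.add(outputPrefix + format.extension);
        }
        return files;
    }

    public static void main(String[] args) {
        String prefix = "results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG";
        outputFiles(prefix, EnumSet.allOf(OutputFormat.class)).forEach(System.out::println);
    }
}
```

Because EnumSet iterates in declaration order, the printed file list follows the order of the enum constants rather than the order in the YAML file.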

This is great if you want to use a particular output filename. But what if you want the default output file name (the input VCF filename with '-exomiser' appended) in a non-standard output directory? e.g.

/analysis/analysis-12345/Pfeiffer-exomiser.genes.tsv
/analysis/analysis-12345/Pfeiffer-exomiser.html
/analysis/analysis-12345/Pfeiffer-exomiser.json
/analysis/analysis-12345/Pfeiffer-exomiser.variants.tsv
/analysis/analysis-12345/Pfeiffer-exomiser.vcf.gz
/analysis/analysis-12345/Pfeiffer-exomiser.vcf.gz.tbi

In this case you need to specify the full path and filename in the output-options.yml file, which is irritating. Being able to specify just the output directory would be helpful, especially for large batches of analyses.
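The default naming described above, placed in an arbitrary directory, could look roughly like the following. It is a sketch under stated assumptions: DefaultOutputName and its methods are hypothetical names, and only the .vcf.gz and .vcf extensions are stripped from the input file name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical sketch: derive the default output base name (input VCF file
 * name with "-exomiser" appended) and place it in a chosen directory.
 * Handling only .vcf.gz and .vcf is an assumption made for illustration.
 */
class DefaultOutputName {

    static String defaultBaseName(Path vcfPath) {
        String fileName = vcfPath.getFileName().toString();
        // Check the longer extension first so "x.vcf.gz" does not become "x.vcf".
        for (String extension : new String[] {".vcf.gz", ".vcf"}) {
            if (fileName.endsWith(extension)) {
                fileName = fileName.substring(0, fileName.length() - extension.length());
                break;
            }
        }
        return fileName + "-exomiser";
    }

    /** Joins a directory and the default base name, e.g. /analysis/analysis-12345/Pfeiffer-exomiser */
    static Path outputPrefixIn(Path directory, Path vcfPath) {
        return directory.resolve(defaultBaseName(vcfPath));
    }

    public static void main(String[] args) {
        Path vcf = Paths.get("Pfeiffer.vcf.gz");
        System.out.println(outputPrefixIn(Paths.get("/analysis/analysis-12345"), vcf));
    }
}
```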

User story

As a CLI user, I wish to use the default Exomiser output file name (the input VCF filename with -exomiser appended), but I want to be able to specify a custom output directory directly via the CLI, without having to create an output-options.yml file for each sample.

Option 1 - new CLI option

--output <string>          Path to outputOptions file. This should be
                           in JSON or YAML format.
--output-prefix <string>   Path/filename without an extension to be
                           prepended to the output file format
                           options.
--output-directory <path>  Path to the desired output directory
                           where exomiser will write the output files. Using this
                           without the output-file-name option will result in a default
                           filename being used which will be output to the specified
                           directory.
--output-file-name <string> Filename prefix to be used for the
                           output files. Can be combined with the 
                           output-directory option to specify a custom location
                           and filename. Used alone will result in files with the
                           specified filename being written to the default results
                           directory.

java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-directory ~/exomiser-results

These files are produced:

~/exomiser-results/Pfeiffer-exomiser.html
~/exomiser-results/Pfeiffer-exomiser.json

This seems clean and simple, and the companion --output-file-name option means that neither, either, or both of the new options could be used. However, the existing --output-prefix option would need to be mutually exclusive with the new --output-directory and --output-file-name options:

 --output-prefix || (--output-directory & --output-file-name)

So, for an input VCF file Pfeiffer.vcf.gz:

java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml
 -> results/Pfeiffer-exomiser.html, results/Pfeiffer-exomiser.json
java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-directory ~/exomiser-results
 -> ~/exomiser-results/Pfeiffer-exomiser.html, ~/exomiser-results/Pfeiffer-exomiser.json
java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-directory ~/exomiser-results --output-file-name Pfeiffer-hiphive-exome-FULL-all-variant-ACMG
 -> ~/exomiser-results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.html, ~/exomiser-results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.json
java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-file-name Pfeiffer-hiphive-exome-FULL-all-variant-ACMG
 -> results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.html, results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG.json
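The four Option 1 examples above reduce to two independent fallbacks: one for the directory and one for the file name. A minimal sketch, assuming a default directory named results (as in the examples) and the VCF-derived default name; Option1OutputResolver and its methods are hypothetical names, not proposed Exomiser API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/**
 * Hypothetical sketch of Option 1: --output-directory and --output-file-name
 * are each optional, and whichever is missing falls back to its default.
 */
class Option1OutputResolver {

    // Assumed default, taken from the "results/..." paths in the examples.
    static final Path DEFAULT_DIRECTORY = Paths.get("results");

    static Path resolvePrefix(Optional<Path> outputDirectory,
                              Optional<String> outputFileName,
                              Path vcfPath) {
        Path directory = outputDirectory.orElse(DEFAULT_DIRECTORY);
        String fileName = outputFileName.orElseGet(() -> defaultFileName(vcfPath));
        return directory.resolve(fileName);
    }

    /** Input VCF file name minus a trailing .vcf.gz or .vcf, plus "-exomiser". */
    static String defaultFileName(Path vcfPath) {
        String name = vcfPath.getFileName().toString().replaceFirst("\\.vcf(\\.gz)?$", "");
        return name + "-exomiser";
    }

    public static void main(String[] args) {
        Path vcf = Paths.get("Pfeiffer.vcf.gz");
        Path custom = Paths.get(System.getProperty("user.home"), "exomiser-results");
        // neither option: results/Pfeiffer-exomiser
        System.out.println(resolvePrefix(Optional.empty(), Optional.empty(), vcf));
        // directory only: <home>/exomiser-results/Pfeiffer-exomiser
        System.out.println(resolvePrefix(Optional.of(custom), Optional.empty(), vcf));
        // file name only: results/Pfeiffer-hiphive-exome-FULL-all-variant-ACMG
        System.out.println(resolvePrefix(Optional.empty(),
                Optional.of("Pfeiffer-hiphive-exome-FULL-all-variant-ACMG"), vcf));
    }
}
```

The resolved path is still a prefix: the per-format extensions (.html, .json, ...) would be appended to it exactly as they are to outputPrefix today.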

Illegal option combinations:
java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-prefix ~/exomiser-results --output-directory ~/exomiser-results
 -> IllegalArgumentException
java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-prefix ~/exomiser-results --output-file-name Pfeiffer-hiphive-exome-FULL-all-variant-ACMG
 -> IllegalArgumentException
java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-prefix ~/exomiser-results --output-directory ~/exomiser-results --output-file-name Pfeiffer-hiphive-exome-FULL-all-variant-ACMG
 -> IllegalArgumentException
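The illegal combinations above amount to a single check. A hypothetical sketch, where null stands for an option that was not supplied on the command line; OutputOptionValidator and its message text are illustrative only. Many CLI parsing libraries can also express this declaratively as a mutually exclusive option group.

```java
/**
 * Hypothetical sketch of the Option 1 exclusivity rule: --output-prefix may
 * not be combined with --output-directory or --output-file-name.
 * A null argument means the option was not given.
 */
class OutputOptionValidator {

    static void validate(String outputPrefix, String outputDirectory, String outputFileName) {
        boolean usesPrefix = outputPrefix != null;
        boolean usesNewOptions = outputDirectory != null || outputFileName != null;
        if (usesPrefix && usesNewOptions) {
            throw new IllegalArgumentException(
                    "--output-prefix cannot be combined with --output-directory or --output-file-name");
        }
    }

    public static void main(String[] args) {
        validate(null, "/tmp/exomiser-results", null); // accepted: directory only
        try {
            validate("/tmp/exomiser-results", "/tmp/exomiser-results", null);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```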

Under the hood this will be a bit of a pain to implement, as it involves adding fields to the OutputOptions class and changing ResultsWriterUtils and, most likely, the ResultsWriter implementations to cater for the new options.

Option 2 - Use existing CLI --output-prefix option

Given there is already an --output-prefix option, this could trivially be changed so that Exomiser interprets the value (a String) either as a file path (current behaviour) or as a directory (new behaviour).

e.g.

java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-prefix ~/exomiser-results

produces

~/exomiser-results.html
~/exomiser-results.json

but appending the system file separator (e.g. / on Linux and macOS) to the --output-prefix argument indicates that it should be interpreted as a directory:

java -jar exomiser-cli-13.1.0.jar --sample pfeiffer-phenopacket.yaml --output-prefix ~/exomiser-results/

produces

~/exomiser-results/Pfeiffer-exomiser.html
~/exomiser-results/Pfeiffer-exomiser.json

In this case the value of --output-prefix is a directory path, and the file names are generated from the input VCF file name as before. Implementation is a simple change to the ResultsWriterUtils class to better specify how the output-prefix argument is interpreted.
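The Option 2 rule can be sketched as a single branch on the trailing separator. This is a hypothetical illustration, not a patch to ResultsWriterUtils: Option2PrefixInterpreter and its methods are invented names, and only .vcf.gz and .vcf extensions are stripped. Note that the shell expands ~ before the JVM ever sees the argument.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical sketch of Option 2: an --output-prefix value ending in the
 * system file separator is treated as a directory and the default file name
 * is used; otherwise the value is a path-plus-filename prefix, as today.
 */
class Option2PrefixInterpreter {

    static Path resolvePrefix(String outputPrefix, Path vcfPath) {
        if (outputPrefix.endsWith(File.separator)) {
            // Directory form: "/home/me/exomiser-results/" -> /home/me/exomiser-results/Pfeiffer-exomiser
            return Paths.get(outputPrefix).resolve(defaultFileName(vcfPath));
        }
        // Current behaviour: "/home/me/exomiser-results" -> /home/me/exomiser-results(.html, .json, ...)
        return Paths.get(outputPrefix);
    }

    /** Input VCF file name minus a trailing .vcf.gz or .vcf, plus "-exomiser". */
    static String defaultFileName(Path vcfPath) {
        String name = vcfPath.getFileName().toString().replaceFirst("\\.vcf(\\.gz)?$", "");
        return name + "-exomiser";
    }

    public static void main(String[] args) {
        Path vcf = Paths.get("Pfeiffer.vcf.gz");
        String dir = System.getProperty("user.home") + File.separator + "exomiser-results";
        System.out.println(resolvePrefix(dir, vcf));                  // file prefix
        System.out.println(resolvePrefix(dir + File.separator, vcf)); // directory
    }
}
```

Keying on the trailing separator, rather than on whether the path already exists as a directory, keeps the interpretation independent of filesystem state, at the cost of relying on users to remember the separator.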

Pros and Cons

Option 1 is more explicit and probably less likely to cause confusion, but it requires API changes and an additional set of CLI options. Option 2 is simpler to implement and requires no API or CLI changes, at the expense of some possible confusion about the meaning of --output-prefix, which would do double duty.

@yaseminbridges, have you got any preference?

yaseminbridges commented 1 year ago

No preference from me; I feel both options are clear enough in how to use the feature, especially if it is documented as it is here. I'm happy with whatever you feel is a good fit, and I would be able to work with either approach!

julesjacobsen commented 1 year ago

Went with the split option (Option 1): outputDirectory and outputFileName.