buske closed 9 years ago
I suspect that the vcf is also substituted, just that in this case vcf is the same as what's being substituted.
I can't replicate this from what you've given me.
In order to replicate the issue I created the following folder hierarchy:
C:\Users\jj8\Documents\test\vcf\pfeiffer\
C:\Users\jj8\Documents\test\vcf\pfeiffer\results\
The pfeiffer folder contains the file Pfeiffer.vcf.
I analyse the vcf using this command:
java -Xms3G -Xmx4G -jar .\exomiser-cli-6.0.0.jar --vcf C:\Users\jj8\Documents\test\vcf\pfeiffer\Pfeiffer.vcf --prioritiser phive --out-file=C:\Users\jj8\Documents\test\vcf\pfeiffer\results\pfeiffer --out-format=HTML,VCF,TSV-GENE,TSV-VARIANT
and lo, when the analysis has finished I have the following files:
$ ls C:\Users\jj8\Documents\test\vcf\pfeiffer\results\
Directory: C:\Users\jj8\Documents\test\vcf\pfeiffer\results
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 09/02/2015 12:02 338184 pfeiffer.genes.tsv
-a--- 09/02/2015 12:02 24857818 pfeiffer.html
-a--- 09/02/2015 12:02 4634501 pfeiffer.variants.tsv
-a--- 09/02/2015 12:02 11868447 pfeiffer.vcf
which is what was expected. Did you do something different?
The difference is that the --out-file I provided had a .vcf suffix. I have to provide some suffix here, because otherwise Exomiser deletes whatever is after the last period and replaces it with .vcf (e.g. --out-file my.output.prefix results in my.output.vcf). If I provide --out-file path/to/out/vcf/file.vcf, it tries to generate path/to/out/genes.tsv/file.genes.tsv.
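The two symptoms reported above are consistent with an unanchored string replacement being applied to the whole output path. The following is a minimal Python sketch of that hypothesis, not Exomiser's actual code (which is Java and is not shown in this thread). The function names `naive_output_path` and `suffix_only_output_path` are made up for illustration.

```python
from pathlib import PurePosixPath


def naive_output_path(out_file: str, extension: str) -> str:
    """Hypothetical buggy behaviour: replace every 'vcf' found in the path."""
    return out_file.replace("vcf", extension)


def suffix_only_output_path(out_file: str, extension: str) -> str:
    """Replace only the final suffix of the file name; directories are untouched."""
    return str(PurePosixPath(out_file).with_suffix("." + extension))


# The directory named 'vcf' is rewritten as well as the file suffix.
print(naive_output_path("path/to/out/vcf/file.vcf", "genes.tsv"))
# Only the file suffix changes.
print(suffix_only_output_path("path/to/out/vcf/file.vcf", "genes.tsv"))
# Suffix replacement still discards whatever follows the last period.
print(suffix_only_output_path("my.output.prefix", "vcf"))
```

Note that even the suffix-only variant drops the last dotted component of a name such as my.output.prefix, which is why appending to a prefix rather than replacing a suffix is the cleaner option.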
One potential solution that might clarify things would be to change --out-file to --out-prefix and not have any suffix-parsing/overwriting. In the meantime, I need to switch to specifying --out-file out.file.dummysuffix.
Indeed, in the interim you could always use hyphens instead of dots within the file name and use the dot to distinguish the file extension like a sane person would.
If you like I could look at implementing what they did for InterProScan5. Apparently no one ever complained about the file options for this.
These are the relevant options they have:
-b,--output-file-base <OUTPUT-FILE-BASE>
    Optional, base output filename (relative or absolute path). Note that this option, the --output-dir (-d) option and the --outfile (-o) option are mutually exclusive. The appropriate file extension for the output format(s) will be appended automatically. By default the input file path/name will be used.

-d,--output-dir <OUTPUT-DIR>
    Optional, output directory. Note that this option, the --outfile (-o) option and the --output-file-base (-b) option are mutually exclusive. The output filename(s) are the same as the input filename, with the appropriate file extension(s) for the output format(s) appended automatically.

-f,--formats <OUTPUT-FORMATS>
    Optional, case-insensitive, comma separated list of output formats. Supported formats are TSV, XML, GFF3, HTML and SVG. Default for protein sequences are TSV, XML and GFF3, or for nucleotide sequences GFF3 and XML.

-o,--outfile <EXPLICIT_OUTPUT_FILENAME>
    Optional explicit output file name (relative or absolute path). Note that this option, the --output-dir (-d) option and the --output-file-base (-b) option are mutually exclusive. If this option is given, you MUST specify a single output format using the -f option. The output file name will not be modified. Note that specifying an output file name using this option OVERWRITES ANY EXISTING FILE.
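For illustration only, the constraints in that help text (three mutually exclusive output options, and a single format required with -o) could be sketched with Python's argparse as below. This is not InterProScan's or Exomiser's implementation; `build_parser` and `parse` are hypothetical names, and only the option names are taken from the help text above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="example")
    # At most one of -b, -d and -o may be given.
    target = parser.add_mutually_exclusive_group()
    target.add_argument("-b", "--output-file-base")
    target.add_argument("-d", "--output-dir")
    target.add_argument("-o", "--outfile")
    # Comma separated list of output formats, e.g. "TSV,XML".
    parser.add_argument("-f", "--formats", type=lambda s: s.split(","), default=[])
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # An explicit output file name only makes sense for a single format.
    if args.outfile and len(args.formats) != 1:
        parser.error("--outfile requires exactly one format given with -f")
    return args


print(parse(["-b", "results/sample", "-f", "TSV,XML"]))
```

Passing two of -b, -d and -o together, or -o with several formats, makes argparse print an error and exit.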
It would still be helpful if you could give an example of the input settings you provided and the output you expected, so that I can add this to some tests to ensure the application behaves as expected.
Haha touché. That said, the VCF files I get are often named things like 2013.08.23.11.07.16_GenomeSub_mcgill_vcf_316_HCSl_Marshfield06.flt.vcf, and, as per custom, I would usually try to add another suffix to the end with each processing step (e.g. file.flt.annotated.subset.vcf).
After much consternation, I settled on a more sensible output filename, and ran the following:
java -Xms2G -Xmx5G -jar /filer/tools/exomiser/exomiser-cli-6.0.0/exomiser-cli-6.0.0.jar \
--min-qual 30 --max-freq 1.0 --out-format TAB-VARIANT \
--prioritiser hiphive --keep-off-target false --keep-non-pathogenic false \
--hpo-ids HP:0000047,HP:0000154,HP:0000219,HP:0000322,HP:0000325,HP:0000369,HP:0000445,HP:0000446,HP:0002474,HP:0002608,HP:0003189,HP:0005274,HP:0006889,HP:0009765 \
--vcf /dupa-filer/buske/phenomecentral/geno/vcf/F0000009/2012.07.05.09.38.07_GenomeSub_mcgill_vcf_KB_174_81272.vcf \
--out-file /dupa-filer/buske/phenomecentral/geno/vcf/F0000009/F0000009.vcf
(Technically, I ran it with --out-format TAB-GENE,TAB-VARIANT,VCF, but we'll ignore that for now)
I was hoping it would generate the specified output file: /dupa-filer/buske/phenomecentral/geno/vcf/F0000009/F0000009.vcf
Unfortunately, instead it tries to create: /dupa-filer/buske/phenomecentral/geno/variant.tsv/F0000009/F0000009.variant.tsv
IPS5's solution seems a bit like overkill to me. I'd still suggest the long-term solution be an --out-prefix PREFIX, where you then create PREFIX.vcf, PREFIX.variant.tsv, or any other suffixes that are specified by the out-formats. I dislike --out-file because at the end of the day, it doesn't really set the name of the output file, it's a suggestion that is only the actual output file if the suffix matches and only that out-format is specified. :)
OK so let's formalise this and I'll close this today. Basically, whatever the input name or output prefix is, exomiser will simply append the file extension for each specified output format. The exception is when no out-prefix is specified, in which case -exomiser-results is appended between the input filename and the output format file extension.
Given the exomiser settings with a specified out-prefix
--vcf /dupa-filer/buske/phenomecentral/geno/vcf/F0000009/2012.07.05.09.38.07_GenomeSub_mcgill_vcf_KB_174_81272.vcf
--out-prefix /dupa-filer/buske/phenomecentral/geno/vcf/F0000009/F0000009.vcf
--out-format TAB-GENE,TAB-VARIANT,VCF,HTML
When exomiser writes out the results files
Then they will be named:
/dupa-filer/buske/phenomecentral/geno/vcf/F0000009/F0000009.vcf.vcf
/dupa-filer/buske/phenomecentral/geno/vcf/F0000009/F0000009.vcf.genes.tsv
/dupa-filer/buske/phenomecentral/geno/vcf/F0000009/F0000009.vcf.variants.tsv
/dupa-filer/buske/phenomecentral/geno/vcf/F0000009/F0000009.vcf.html
Given the exomiser settings with no out-prefix specified
--vcf /dupa-filer/buske/phenomecentral/geno/vcf/F0000009/2012.07.05.09.38.07_GenomeSub_mcgill_vcf_KB_174_81272.vcf
--out-format TAB-GENE,TAB-VARIANT,VCF,HTML
When exomiser writes out the results files
Then they will be named:
/dupa-filer/buske/phenomecentral/geno/vcf/F0000009/2012.07.05.09.38.07_GenomeSub_mcgill_vcf_KB_174_81272.vcf-exomiser-results.vcf
/dupa-filer/buske/phenomecentral/geno/vcf/F0000009/2012.07.05.09.38.07_GenomeSub_mcgill_vcf_KB_174_81272.vcf-exomiser-results.genes.tsv
/dupa-filer/buske/phenomecentral/geno/vcf/F0000009/2012.07.05.09.38.07_GenomeSub_mcgill_vcf_KB_174_81272.vcf-exomiser-results.variants.tsv
/dupa-filer/buske/phenomecentral/geno/vcf/F0000009/2012.07.05.09.38.07_GenomeSub_mcgill_vcf_KB_174_81272.vcf-exomiser-results.html
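The naming rule formalised above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this thread, not the Exomiser implementation (which is Java); the function name `output_paths` and the `EXTENSIONS` table are assumptions, with the extensions copied from the file names listed above.

```python
# Extensions as they appear in the expected file names in this thread.
EXTENSIONS = {
    "VCF": "vcf",
    "HTML": "html",
    "TAB-GENE": "genes.tsv",
    "TAB-VARIANT": "variants.tsv",
}


def output_paths(vcf, out_formats, out_prefix=None):
    """Append the format extension to the prefix; never parse or replace a suffix."""
    # With no prefix, fall back to the input path plus '-exomiser-results'.
    base = out_prefix if out_prefix else f"{vcf}-exomiser-results"
    return [f"{base}.{EXTENSIONS[fmt]}" for fmt in out_formats]


for path in output_paths(
    "F0000009/input.vcf",
    ["TAB-GENE", "TAB-VARIANT", "VCF", "HTML"],
    out_prefix="F0000009/F0000009.vcf",
):
    print(path)
```

Because the prefix is never inspected, a prefix that already ends in .vcf yields names such as F0000009.vcf.vcf, exactly as in the first scenario above.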
@julesjacobsen This is great. Thanks, Jules!
It appears that anywhere the string vcf appears in the path of the output file, it is substituted with genes.tsv, even if it doesn't appear at the end (note that the vcf directory is changed to a genes.tsv directory which doesn't exist):