CPTR-ReSeqTB / UVP

Mycobacterium tuberculosis next generation sequence analysis
MIT License

Cannot generate GATK_Resilist.grp #3

hyunlee11037 closed this issue 5 years ago

hyunlee11037 commented 6 years ago

Hi, so we're experiencing a problem trying to run the "BaseRecalibrator" step in your pipeline.

self.__CallCommand('BaseRecalibrator', ['java', '-Xmx4g', '-jar', self.__gatk, '-T', 'BaseRecalibrator',
     '-I', GATKdir +'/GATK_sdrc.bam', '-R', self.reference, '--knownSites',
     self.snplist, '-o', GATKdir +'/GATK_Resilist.grp','-nct', '8'])

We tried typing the equivalent command in the terminal, and we get the following:

INFO  10:52:11,406 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  10:52:11,408 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-0-g7e26428, Compiled 2015/05/15 03:25:41 
INFO  10:52:11,408 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  10:52:11,408 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  10:52:11,411 HelpFormatter - Program Args: -T BaseRecalibrator -I Results/Test/tmp/GATK/GATK_sdrc.bam -R Results/Test/tmp/bwa/index/ref.fa --knownSites /uvp/bin/snps.vcf -o Results/Test/tmp/GATK/GATK_Resilist.grp -nct 8 
INFO  10:52:11,413 HelpFormatter - Executing as hwl9@cs-compbio-22 on Linux 4.4.0-127-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11. 
INFO  10:52:11,413 HelpFormatter - Date/Time: 2018/06/19 10:52:11 
INFO  10:52:11,413 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  10:52:11,413 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  10:52:12,302 GATKRunReport - Uploaded run statistics report to AWS S3 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.4-0-g7e26428): 
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Couldn't read file /uvp/bin/snps.vcf because file '/uvp/bin/snps.vcf' does not exist
##### ERROR ------------------------------------------------------------------------------------------

I believe the issue is with self.snplist: the configuration file sets snplist: /uvp/bin/snps.vcf, but snps.vcf is nowhere to be found in the bin directory! How could we go about generating or finding this file? Thank you.
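
For anyone hitting the same error, a pre-flight check along these lines might surface the missing file before GATK starts. This is only a minimal sketch: the function name build_bqsr_command and its parameters are hypothetical and not part of UVP; the argument list itself mirrors the BaseRecalibrator call quoted above.

```python
import os


def build_bqsr_command(gatk_jar, bam, reference, snplist, out_grp, threads=8):
    """Build the GATK 3 BaseRecalibrator argument list, failing early on missing inputs.

    Hypothetical helper for illustration; it is not part of the UVP code base.
    """
    required = (
        ("input BAM", bam),
        ("reference FASTA", reference),
        ("known-sites VCF (snplist)", snplist),
    )
    for label, path in required:
        # Raise a clear Python error instead of waiting for GATK's USER ERROR.
        if not os.path.isfile(path):
            raise FileNotFoundError(f"{label} not found: {path}")

    # Same arguments as the pipeline's BaseRecalibrator call.
    return ['java', '-Xmx4g', '-jar', gatk_jar, '-T', 'BaseRecalibrator',
            '-I', bam, '-R', reference, '--knownSites', snplist,
            '-o', out_grp, '-nct', str(threads)]
```

With snplist set to /uvp/bin/snps.vcf and no such file on disk, this raises FileNotFoundError naming the snplist path, which points at the configuration entry rather than at GATK.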

mezewudo commented 6 years ago

You can use an M. tuberculosis VCF file for that purpose.

hyunlee11037 commented 6 years ago

We're using TB data from the PATRIC database ( example link: ftp://ftp.patricbrc.org/genomes_by_species/Mycobacterium_tuberculosis/1010834.3 )

I found an H37Rv reference genome from NCBI, but PATRIC does not appear to provide VCF files for their WGS data. Could you please provide some help on how we can find or create an MTB VCF file?

mezewudo commented 6 years ago

Sure, you can use a variant caller such as Samtools or GATK to create VCFs from your sample files. You will need to follow the manuals for either of those tools. Once you have the VCF, you can then use it as the snplist.

Otherwise, we will find a way to get you a sample VCF file to get you through this step.
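
As a rough illustration of the first option, the sketch below pipes bcftools mpileup into bcftools call to produce a VCF from an aligned, sorted BAM. It assumes bcftools (the variant caller from the Samtools project) is installed and on PATH; the function names and file paths are hypothetical, and settings appropriate for M. tuberculosis (ploidy, quality filters) are left out and should be taken from the bcftools manual.

```python
import subprocess


def build_variant_call_commands(reference, bam, out_vcf):
    """Return the (mpileup, call) argument lists for a basic bcftools pipeline.

    Hypothetical helper for illustration; it is not part of UVP.
    """
    # -f: reference FASTA; -Ou: uncompressed BCF streamed to the next step.
    mpileup_cmd = ['bcftools', 'mpileup', '-Ou', '-f', reference, bam]
    # -m: multiallelic caller; -v: variant sites only; -Ov: plain-text VCF output.
    call_cmd = ['bcftools', 'call', '-m', '-v', '-Ov', '-o', out_vcf]
    return mpileup_cmd, call_cmd


def call_variants(reference, bam, out_vcf):
    """Run the two commands as a pipe and raise if either step fails."""
    mpileup_cmd, call_cmd = build_variant_call_commands(reference, bam, out_vcf)
    mpileup = subprocess.Popen(mpileup_cmd, stdout=subprocess.PIPE)
    try:
        subprocess.run(call_cmd, stdin=mpileup.stdout, check=True)
    finally:
        mpileup.stdout.close()
    if mpileup.wait() != 0:
        raise subprocess.CalledProcessError(mpileup.returncode, mpileup_cmd)
```

The resulting file path could then be set as snplist in the configuration file, in place of /uvp/bin/snps.vcf.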

alantsangmb commented 5 years ago

Does this mean we have to call the SNPs for each sample and generate the corresponding snps.vcf before running UVP? If so, bwa would run twice for each sample: once before UVP and once during the UVP pipeline. @mezewudo

dfornika commented 5 years ago

@brianlee99 Were you able to build a snps.vcf file for this analysis?