amplab / snap

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
https://www.microsoft.com/en-us/research/project/snap/
Apache License 2.0

Error - No overlapping contigs found during variant calling pre processing #115

Closed: ramcn closed this issue 6 years ago

ramcn commented 6 years ago

Hello,

I ran SNAP on the SRR622461 dataset and am trying to use the generated SAM file for variant calling with GATK, but I get a "No overlapping contigs found" error during the GATK RealignerTargetCreator command.

The attached sbatch-script contains the full pipeline commands, and sbatch-log shows the error from the RealignerTargetCreator command. Please let me know if you have any clue how to fix this error.

sbatch-log.txt sbatch-script.txt

ramcn commented 6 years ago

PS: The workflow steps in the script work fine when I use aligners like bwa mem and bowtie.

bolosky commented 6 years ago

That error means that the contig names that SNAP is generating don’t correspond to the ones that GATK expects.

SNAP gets its contig names from its index, so they're determined at index build time. There are a few options for dealing with things like spaces in the contig names of the FASTA reference used to build the index, and you might need to use them. (For example, the -b or -bSpace flags.)

I'd suggest looking at the header of the SAM file generated by SNAP to see what the names look like (look for the @SQ lines) and comparing them to the ones in the GRCh38 reference; you'll probably see what's happening.
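The comparison suggested above can be sketched in a few lines of Python: pull the SN: names out of the @SQ lines of SNAP's SAM header and diff them against the contig names in the reference FASTA. This is a minimal sketch assuming GATK takes each contig name as the text after '>' up to the first whitespace (the usual sequence-dictionary convention). The helper names sq_names, fasta_names and compare are hypothetical, and the example FASTA header line is an assumed illustration, not taken from the attached files.

```python
def sq_names(sam_lines):
    """Return the SN: values from @SQ lines, stopping at the end of the SAM header."""
    names = []
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header lines all start with '@'; alignments follow
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\r\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
                break
    return names


def fasta_names(fasta_lines):
    """Return contig names from FASTA headers: text after '>' up to the first whitespace."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            names.append(parts[0] if parts else "")
    return names


def compare(sam_lines, fasta_lines):
    """Report which contig names the SAM header and the reference share or lack."""
    sam = set(sq_names(sam_lines))
    ref = set(fasta_names(fasta_lines))
    return {
        "shared": sorted(sam & ref),
        "only_in_sam": sorted(sam - ref),
        "only_in_reference": sorted(ref - sam),
    }


# Illustrative inputs: the SAM name is the one reported later in this thread;
# the FASTA header line is an assumption about what the reference contains.
example_sam_header = [
    "@HD\tVN:1.4",
    "@SQ\tSN:1_dna:chromosome_chromosome:GRCh37:1:1:249250621:1\tLN:249250621",
]
example_fasta = [">1 dna:chromosome chromosome:GRCh37:1:1:249250621:1", "NNNNNNNN"]
print(compare(example_sam_header, example_fasta))
```

An empty "shared" list is the situation GATK describes as no overlapping contigs. For real files you could pass open file handles, e.g. compare(open("snap.sam"), open("reference.fasta")) (file names hypothetical); sq_names stops at the first alignment line, but fasta_names still scans the whole FASTA.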

If that doesn’t make sense to you, send me some examples of @SQ lines and I’ll figure out what’s up.

--Bill

From: ramcn | Sent: Thursday, May 24, 2018 9:43 AM | Subject: [amplab/snap] Error - No overlapping contigs found during variant calling pre processing (#115)

ramcn commented 6 years ago

OK, I am attaching the SNAP SAM file header: snap-header.txt

Please let me know if you can figure something out. I am using the human_g1k_v37.fasta reference file.

Thank you, Ram

bolosky commented 6 years ago

Your contig names are things like this:

1_dna:chromosome_chromosome:GRCh37:1:1:249250621:1

They should just be “1”. To fix this, you’ll need to rebuild SNAP’s index with a couple of extra options: -bSpace -B_

(The second one is dash, capital B, underscore.) Together they tell SNAP that a space or an underscore marks the end of the contig name, so it should keep only what it has seen up to that point.
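To make the effect of -bSpace -B_ concrete, here is a small Python sketch that cuts a contig name at the first space or underscore, as described above. It only models the described behavior and is not SNAP's code; truncate_contig_name and check_unique are hypothetical names, and the duplicate check is an extra safeguard added for illustration.

```python
def truncate_contig_name(raw_name, separators=(" ", "_")):
    """Keep only the characters before the first separator.

    Models the behavior described for SNAP's -bSpace (space) and -B_
    (underscore) index options; an illustration, not SNAP's source.
    """
    cut = len(raw_name)
    for sep in separators:
        index = raw_name.find(sep)
        if index != -1:
            cut = min(cut, index)
    return raw_name[:cut]


def check_unique(raw_names, separators=(" ", "_")):
    """Truncate every name and raise if two contigs collapse to the same name."""
    seen = {}
    for raw in raw_names:
        short = truncate_contig_name(raw, separators)
        if short in seen:
            raise ValueError(f"{raw!r} and {seen[short]!r} both truncate to {short!r}")
        seen[short] = raw
    return list(seen)


print(truncate_contig_name("1_dna:chromosome_chromosome:GRCh37:1:1:249250621:1"))  # prints "1"
```

In this sketch the underscore rule would also shorten names that legitimately contain underscores (hg19-style chrUn_gl000220, for example), which is why a duplicate check like check_unique may be worth running before relying on the truncated names.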

Once you’ve rebuilt the index, rerun SNAP and the error should go away.

--Bill

From: ramcn | Sent: Friday, May 25, 2018 11:59 AM | Subject: Re: [amplab/snap] Error - No overlapping contigs found during variant calling pre processing (#115)

ramcn commented 6 years ago

Hi Bill

Reindexing with these options worked and the error is gone.

Thank you, Ram