YuSugihara / QTL-seq

QTL-seq pipeline to identify causative mutations responsible for a phenotype

Lower delta SNP reported as delta SNP=1 #17

Closed Deeptirao closed 3 years ago

Deeptirao commented 3 years ago

The QTL-seq package reports a delta SNP index of 1 at some positions even though one of the bulks contains the other allele. When I checked the snp.tsv file, I found that the depth used to compute the delta SNP index is much lower than the depth available. The reads carrying the other allele have mapping and base qualities above the tool's thresholds (MQ > 40 and BQ > 18). Why are they being left out?

YuSugihara commented 3 years ago

Sorry, I could not follow your point.

If you have found unexpected behavior in QTL-seq, please report it along with the QTL-seq options you used and the part of the output that contains the problematic line.

Deeptirao commented 3 years ago

I ran QTL-seq with the default parameters. QTL-seq reports a delta SNP index of 1 at some positions even though one of the bulks contains the other allele. In the snp.tsv file, the depth used to compute the delta SNP index is much lower than the depth available, although the reads carrying the other allele have mapping and base qualities above the tool's thresholds (MQ > 40 and BQ > 18). Why are these reads being left out?

- High bulk: DP = 30 (mutant allele = 29; wild-type allele = 1, QV = 37 and MQ = 60)
- Low bulk: DP = 22 (mutant allele = 4; wild-type allele = 18 (QV = 37, 25, 27, 37 and MQ = 60 for all 4))
- High parent: 100% A
- Low parent: 100% T

Here is the corresponding line in the snp_index.tsv file:

| Chr | Position | Variant | Depth (low bulk) | Depth (high bulk) | CI 99 | CI 95 | Low bulk SNP index | High bulk SNP index | Delta SNP index |
|---|---|---|---|---|---|---|---|---|---|
| Chr03 | 29614815 | snp | 8 | 8 | 0.75 | 0.5 | 0 | 1 | 1 |

Please check the screenshots from IGV (attached).
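
For reference, here is a minimal sketch of how the numbers in this row relate. It assumes that the SNP index of a bulk is the fraction of its reads carrying the alternative allele and that the delta SNP index is the high-bulk value minus the low-bulk value. The function names are hypothetical and are not QTL-seq's API. The sketch contrasts the depths used in snp_index.tsv (8 reads per bulk) with the IGV counts reported above.

```python
def snp_index(alt_depth: int, total_depth: int) -> float:
    """Fraction of reads in one bulk that carry the alternative allele."""
    if total_depth <= 0:
        raise ValueError("total_depth must be positive")
    return alt_depth / total_depth


def delta_snp_index(high_alt: int, high_depth: int,
                    low_alt: int, low_depth: int) -> float:
    """High-bulk SNP index minus low-bulk SNP index."""
    return snp_index(high_alt, high_depth) - snp_index(low_alt, low_depth)


# Depths as used in snp_index.tsv: 8 reads per bulk, all ALT vs. all REF.
from_ad = delta_snp_index(high_alt=8, high_depth=8, low_alt=0, low_depth=8)

# Counts seen in IGV: high bulk 29 of 30 mutant, low bulk 4 of 22 mutant.
from_igv = delta_snp_index(high_alt=29, high_depth=30, low_alt=4, low_depth=22)

print(from_ad)             # 1.0
print(round(from_igv, 3))  # 0.785
```

With every read counted, the delta SNP index at this position would be about 0.785 rather than 1. The discrepancy comes entirely from which reads reach the depth counts.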


Regards,
Deepti
Graduate Student, CCMB

YuSugihara commented 3 years ago

I could not see your IGV screenshots on GitHub, but I think I understand your point.

QTL-seq/MutMap use the "AD" field of the VCF file (please check the VCF file generated by QTL-seq). "AD" gives the allelic depth counted from high-quality alignments only. If IGV shows reads carrying the alternative allele at a SNP position whose delta SNP index is 1, please check the mapping qualities of those reads in IGV.

Their mapping qualities are probably lower than those of the other reads.
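
To see exactly which depths were used, you can read the AD values directly from the VCF. The following is a minimal sketch. It assumes a standard VCF record whose FORMAT column contains an `AD` key with comma-separated per-allele counts (REF first, then each ALT). The record below is a hypothetical example whose AD totals mirror the 11, 8 and 8 mentioned in this thread. It is not real QTL-seq output, and the sample order is an assumption.

```python
from typing import List


def allele_depths(format_field: str, sample_field: str) -> List[int]:
    """Return per-allele depths (REF first, then each ALT) from the AD key."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    return [int(x) for x in values["AD"].split(",")]


# Hypothetical tab-separated VCF record; sample order assumed to be
# parent, low bulk, high bulk.
record = "\t".join([
    "Chr03", "29614815", ".", "T", "A", "100", ".", ".", "GT:AD",
    "0/0:11,0", "0/0:8,0", "1/1:0,8",
])
cols = record.split("\t")
fmt, samples = cols[8], cols[9:]

for name, sample in zip(["parent", "low bulk", "high bulk"], samples):
    ad = allele_depths(fmt, sample)
    # SNP index = ALT depth / total depth at this position
    print(name, ad, ad[1] / sum(ad))
```

A missing value (`.`) in AD would raise `ValueError` here. A fuller parser would need to handle that case as well as multi-allelic sites.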

Deeptirao commented 3 years ago

Hi, thank you very much for your reply. Please note the following:

  1. I checked the VCF file and did not find the other allele there, so the VCF is consistent with the SNP index (1 for the high bulk and 0 for the low bulk).
  2. However, the filtered BAM files do show a discrepancy. I am not sure what caused it, because the mapping and base qualities are above the QTL-seq thresholds.
  3. Consider one such position. If these reads were not filtered out of bulk1.filt.bam (depths A = 4, T = 18), bulk2.filt.bam (A = 26, T = 1) and parent.filt.bam (A = 0, T = 38), why are the AD values of bulk1, bulk2 and parent only 8, 8 and 11?

The base qualities and mapping qualities are good. I would like you to see the screenshots. Could you please share your email address, or write to me at deeptirao@csirccmb.org?


YuSugihara commented 3 years ago

The filtered BAM files are filtered not by mapping quality but by proper-pair alignment.

The mapping-quality threshold is applied when the VCF file is generated. Please check the detailed mapping quality of each read in IGV.
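
The two stages described above can be illustrated with a toy model. The first stage applies a proper-pair filter (SAM flag bit 0x2) when the filtered BAM is written. The second stage applies mapping- and base-quality thresholds when the pileup for the VCF is built. This is a simplified sketch, not QTL-seq's code. The `Read` record is hypothetical, and the threshold values (MQ >= 40, BQ >= 18, treated as inclusive minimums) are assumptions based on the thresholds quoted earlier in this thread. The real variant caller may apply further filters and quality adjustments that are not modelled here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

PROPER_PAIR = 0x2  # SAM flag bit: each segment properly aligned


@dataclass
class Read:
    flag: int        # SAM FLAG value
    mapq: int        # mapping quality of the read
    base: str        # base observed at the position of interest
    base_qual: int   # Phred base quality of that base


def filtered_bam(reads: Iterable[Read]) -> list:
    """Stage 1: keep only properly paired reads (as in *.filt.bam)."""
    return [r for r in reads if r.flag & PROPER_PAIR]


def pileup_counts(reads: Iterable[Read], min_mq: int = 40,
                  min_bq: int = 18) -> Counter:
    """Stage 2: count alleles whose read MQ and base BQ meet the minimums."""
    return Counter(r.base for r in reads
                   if r.mapq >= min_mq and r.base_qual >= min_bq)


reads = [
    Read(flag=99, mapq=60, base="T", base_qual=37),
    Read(flag=147, mapq=60, base="A", base_qual=37),
    Read(flag=97, mapq=60, base="A", base_qual=37),   # paired, not a proper pair
    Read(flag=99, mapq=20, base="T", base_qual=37),   # low mapping quality
    Read(flag=147, mapq=60, base="T", base_qual=10),  # low base quality
]

stage1 = filtered_bam(reads)
print(len(stage1))                   # 4
print(dict(pileup_counts(stage1)))   # {'T': 1, 'A': 1}
```

In this model, a read with MQ 60 and a high base quality can still be excluded if it fails some other check, such as the pair flags. That is why the read flags are worth checking alongside the qualities.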


Deeptirao commented 3 years ago

The mapping quality is 60 for all of these reads.


YuSugihara commented 3 years ago

Please check the flags of the problematic reads.

- https://ppotato.wordpress.com/2010/08/25/samtool-bitwise-flag-paired-reads/
- https://broadinstitute.github.io/picard/explain-flags.html
- https://www.slideshare.net/lindenb/ngsformats?ref=http://plindenbaum.blogspot.com/2013/09/presentation-file-formats-for-next.html?m=1

These websites will help you understand the flags in SAM files.
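
The decoding these pages explain can also be done programmatically. The following is a minimal sketch that lists the bits set in a SAM FLAG value and reports whether the proper-pair bit (0x2) is present. The bit values come from the SAM specification. The bit names follow the short names that `samtools flags` prints, and the example flag values are illustrative only.

```python
# Bit values defined by the SAM specification (FLAG field).
SAM_FLAGS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAP",
    0x8: "MUNMAP",
    0x10: "REVERSE",
    0x20: "MREVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QCFAIL",
    0x400: "DUP",
    0x800: "SUPPLEMENTARY",
}


def decode_flag(flag: int) -> list:
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


for flag in (99, 147, 97):
    names = decode_flag(flag)
    status = "proper pair" if "PROPER_PAIR" in names else "NOT proper pair"
    print(flag, names, status)
```

For example, 99 decodes to PAIRED, PROPER_PAIR, MREVERSE and READ1. In contrast, 97 lacks PROPER_PAIR, so a read with that flag would not survive a proper-pair filter.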