genome / pindel

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications, and other structural variants at single-base resolution from next-gen sequencing data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.
GNU General Public License v3.0

The format of pindel output file #54

Open zykong8 opened 8 years ago

zykong8 commented 8 years ago

Dear Kai and other Pindel authors

In the variant reported, what is the meaning of `NGS160714_01 2 2 10 10 5 5`? It appears at the end of the header line, for example:

```
####################################################################################################
18 D 1 NT 0 "" ChrID NC_000001.11 BP 2437459 2437461 BP_range 2437459 2437463 Supports 15 15 + 10 10 - 5 5 S1 66 SUM_MS 900 1 NumSupSamples 1 1 NGS160714_01 2 2 10 10 5 5
CTAGGGTTGCTCCATGCAGTGCCCAGCTCCTACTCCTGTCCAAGACTGACTTAGACCTCCTCTGGCCAGCTGGACAGCTCTGCCCAAATCTCAAATTCATCATCCCTGAGGACTCAACCTCAGACCCTGACTCCAGGCCTCCCTGCTGGGCaAATAGCACCCACCGCAACTAGGGGGCCCAGATCCTGGGAACACCCTCCCGCCCACCATCCGACTCAGCCTGGGGGTTTCTCTCCTGTCCCATGTCCCACTCTTTGCTCCACCCCTGCAGCCTCCACCCCTTCAGAACCACCCTCAAGTCA
ATCATCCCTGAGGACTCAACCTCAGACCCTGACTCCAGGCCTCCCTGCTGGGC AATAGCACCCACCGCAACTAGGGGGCCCAGATCCTGGGAACACCCTCCCGCCCACCATCCGACTCAGCCTGGGGGTTTCTCTCCTGTCCCATGTCCCA + 2437258 60 NGS160714_01 @M04057:26:000000000-ARA6H:1:2113:26461:10237/1
AATTCATCATCCCTGAGGACTCAACCTCAGACCCTGACTCCAGGCCTCCCTGCTGGGC AATAGCACCCACCGCAACTAGGGGGCCCAGATCCTGGGAACACCCTCCCGCCCACCATCCGACTCAGCCTGGGGGTTTCTCTCCTGTCCCATG + 2437378 60 NGS160714_01 @M04057:26:000000000-ARA6H:1:2110:2867:17132/1
.....
```

Thanks!

liangkaiye commented 8 years ago

They are read counts. If you convert the output to VCF, their meaning becomes clear.


zykong8 commented 8 years ago

Dear Kai

In `NGS160714_01 2 2 10 10 5 5`, what do the first two numbers (`2 2`) mean? The remaining four numbers represent the total number of supporting reads whose anchors are upstream, the total number of unique supporting reads whose anchors are upstream, the total number of supporting reads whose anchors are downstream, and finally the total number of unique supporting reads whose anchors are downstream.

Thanks!

liangkaiye commented 8 years ago

The two numbers `2 2` are the numbers of reads supporting the reference allele at the left and right breakpoints, respectively.
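Putting this answer together with the four fields described in the question, each per-sample block in this version of the output has seven whitespace-separated tokens. The Python sketch below parses one block into named fields. `SampleSupport` and `parse_sample_block` are hypothetical names, and the 7-token layout is assumed from the example in this thread rather than taken from Pindel's source.

```python
from dataclasses import dataclass


@dataclass
class SampleSupport:
    """Per-sample fields at the end of a Pindel header line (assumed 7-token layout)."""
    name: str
    ref_left: int           # reads supporting the reference allele at the left breakpoint
    ref_right: int          # reads supporting the reference allele at the right breakpoint
    upstream: int           # supporting reads whose anchors are upstream
    upstream_unique: int    # unique supporting reads whose anchors are upstream
    downstream: int         # supporting reads whose anchors are downstream
    downstream_unique: int  # unique supporting reads whose anchors are downstream


def parse_sample_block(block: str) -> SampleSupport:
    """Parse one block such as 'NGS160714_01 2 2 10 10 5 5'."""
    tokens = block.split()
    if len(tokens) != 7:
        raise ValueError(f"expected 7 tokens, got {len(tokens)}: {block!r}")
    name, *counts = tokens
    return SampleSupport(name, *map(int, counts))


s = parse_sample_block("NGS160714_01 2 2 10 10 5 5")
print(s.ref_left, s.ref_right, s.upstream + s.downstream)  # 2 2 15
```

In the example, upstream plus downstream supporting reads (10 + 5) equals the `Supports 15` total in the same header line, which is a quick sanity check on the field order.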


woodoo46 commented 7 years ago

Hi Kai,

I have the same question. I used pindel2vcf to convert the raw Pindel output to VCF format, and it seems the program interprets the `2 2` as 2 rather than 4?

`Sample1 4 12 10 10 0 0`

In the generated VCF file, the last two columns show: `GT:AD 0/1:12,10`

Any idea on this?

Thanks!!!

liangkaiye commented 7 years ago

The two values (`2 2`) are the numbers of reads spanning each breakpoint, so they should not be added together.
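For `Sample1 4 12 10 10 0 0`, the VCF value `0/1:12,10` is consistent with this answer: the alternate depth matches upstream plus downstream supporting reads (10 + 0), while the reference depth is not the sum of the two breakpoint counts (4 + 12 = 16). The Python sketch below reproduces that single example by taking the larger of the two reference counts. This rule is an assumption inferred from one data point, not confirmed pindel2vcf behavior (always taking the right-breakpoint count would fit equally well), and `allele_depths` is a hypothetical helper.

```python
def allele_depths(block: str) -> tuple[int, int]:
    """Return a hypothetical (ref_depth, alt_depth) pair for one per-sample block.

    ASSUMPTIONS (inferred from a single example in this thread, not from
    pindel2vcf's source):
      * ref depth = the larger of the left/right reference counts; the two
        are NOT added, because they count reads across different breakpoints;
      * alt depth = upstream-anchored + downstream-anchored supporting reads.
    """
    tokens = block.split()
    ref_left, ref_right, upstream, _up_unique, downstream, _down_unique = map(int, tokens[1:7])
    ref_depth = max(ref_left, ref_right)
    alt_depth = upstream + downstream
    return ref_depth, alt_depth


print(allele_depths("Sample1 4 12 10 10 0 0"))      # (12, 10)
print(allele_depths("NGS160714_01 2 2 10 10 5 5"))  # (2, 15)
```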


woodoo46 commented 7 years ago

In my example, `Sample1 4 12 10 10 0 0`

In the generated VCF file, the last two columns show:

`GT:AD 0/1:12,10`

So the reference allele has 12 supporting reads?

jmarshall commented 7 years ago

See also the discussion that @EWLameijer added to the FAQ in b706fba61c64a11fb1d3716d501fd2f4d8992e29.

The format description on the gmt.genome.wustl.edu user manual web page has not been updated with these two newer reference-depth fields:

32+) Per sample: the sample name, followed by the total number of supporting reads whose anchors are upstream, the total number of unique supporting reads whose anchors are upstream, the total number of supporting reads whose anchors are downstream, and finally the total number of unique supporting reads whose anchors are downstream.

It would be great if this web page could be updated to describe the current format. Or has Pindel's web presence possibly moved elsewhere?
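Counting whitespace-separated fields in the header line from the original question (starting at the variant index `18`, with the `####` separator on its own line) puts the sample name `NGS160714_01` at field 32, matching the manual's "32+)" entry. The Python sketch below splits a header line into per-sample blocks from field 32 onward. The block width is a parameter because the manual describes 5 fields per sample, while current output adds the two reference-depth fields for 7. `split_sample_blocks` is a hypothetical helper, and the fixed field positions are assumed from the deletion record shown in this thread.

```python
def split_sample_blocks(header: str, fields_per_sample: int = 7) -> list[list[str]]:
    """Split the per-sample part of a Pindel header line into blocks.

    Assumes the per-sample fields start at field 32 (1-based), as in the
    manual's "32+)" entry, and that every sample contributes the same number
    of fields: 5 in the documented layout, 7 once the two reference-depth
    fields are included.
    """
    tokens = header.split()
    per_sample = tokens[31:]  # field 32 in 1-based numbering
    if len(per_sample) % fields_per_sample != 0:
        raise ValueError(
            f"{len(per_sample)} trailing fields is not a multiple of {fields_per_sample}"
        )
    return [
        per_sample[i:i + fields_per_sample]
        for i in range(0, len(per_sample), fields_per_sample)
    ]


header = ('18 D 1 NT 0 "" ChrID NC_000001.11 BP 2437459 2437461 '
          'BP_range 2437459 2437463 Supports 15 15 + 10 10 - 5 5 '
          'S1 66 SUM_MS 900 1 NumSupSamples 1 1 NGS160714_01 2 2 10 10 5 5')
for block in split_sample_blocks(header):
    print(block[0], block[1:])
# NGS160714_01 ['2', '2', '10', '10', '5', '5']
```

Passing `fields_per_sample=5` would handle output in the older documented layout; a length mismatch raises an error instead of silently misaligning fields.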