genome / pindel

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.
GNU General Public License v3.0

Running pindel individually per sample (1500 samples high coverage) #67

Open MehmetGoktay opened 7 years ago

MehmetGoktay commented 7 years ago

Hi everyone,

I would like to identify SVs in 1.5k whole-genome samples using Pindel. It was impossible to run all of them at once; I even tried running individual chromosomes, but that did not work either.

After getting the output files for each sample, I combined them into one file and then merged the SVs on start, stop, chrID, Svtype and LengthOfSV, keeping track of the supporting samples.

For example:
chrId1 start1 stop1 length1 {sample x}
chrId2 start2 stop2 length2 {sample x}
chrId1 start1 stop1 length1 {sample y}

I merge them into:
chrId1 start1 stop1 length1 {sample x} {sample y} (merged)
chrId2 start2 stop2 length2 {sample x}
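The exact-match merge described above could be sketched in Python roughly as follows. This is only an illustration: the tuple layout, the coordinates and the sample names are made up for the example and are not Pindel's actual output format.

```python
from collections import defaultdict

def merge_exact(records):
    """Group SV calls whose (chrom, start, stop, svtype, length) key is identical."""
    merged = defaultdict(set)
    for chrom, start, stop, svtype, length, sample in records:
        merged[(chrom, start, stop, svtype, length)].add(sample)
    # Return the supporting samples per key as sorted lists
    return {key: sorted(samples) for key, samples in merged.items()}

# Hypothetical per-sample calls, already parsed into tuples
records = [
    ("chr1", 1000, 1500, "DEL", 500, "sample_x"),
    ("chr2", 2000, 2300, "DEL", 300, "sample_x"),
    ("chr1", 1000, 1500, "DEL", 500, "sample_y"),
]

for key, samples in merge_exact(records).items():
    print(*key, *samples, sep="\t")
```

Because the key is compared exactly, two calls that differ by even one base end up in separate rows.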

So far so good, it kind of works. But what I have noticed is that several SVs that I think should be the same differ by a few nucleotides (for example in start, stop or length of SV), so they come out as if they were different structural variations.

Can anyone give me some advice on what I should do? Does my approach make sense?

Best, Mehmet

wjaratlerdsiri commented 7 years ago

Why not use the pindel2vcf script? I think VCF files are easier to handle than Pindel files. Try "bcftools merge" to merge the files, but I'm not sure it will work.

James
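
For the near-duplicate calls raised in the question (same SV, breakpoints off by a few nucleotides), one possible approach is to merge with a tolerance instead of an exact key. The sketch below is an assumption-laden illustration, not something Pindel or bcftools provides: the function names, the record layout and the `tol` default of 10 bp are all hypothetical, and a sensible tolerance depends on the data. Each cluster is anchored on the first call that opened it, so matches are not chained transitively.

```python
from collections import defaultdict

def _find_cluster(group, start, stop, tol):
    """Return a cluster whose anchor start and stop are both within tol, else None."""
    # Clusters in a group are ordered by anchor start, so scan from the newest back
    for cluster in reversed(group):
        if start - cluster["start"] > tol:
            return None  # every remaining cluster starts even further away
        if abs(stop - cluster["stop"]) <= tol:
            return cluster
    return None

def merge_fuzzy(records, tol=10):
    """Merge calls of the same type on the same chromosome with nearby breakpoints."""
    by_group = defaultdict(list)  # (chrom, svtype) -> list of clusters
    ordered = sorted(records, key=lambda r: (r[0], r[3], r[1], r[2]))
    for chrom, start, stop, svtype, sample in ordered:
        group = by_group[(chrom, svtype)]
        cluster = _find_cluster(group, start, stop, tol)
        if cluster is None:
            # No close match: this call becomes the anchor of a new cluster
            group.append({"chrom": chrom, "start": start, "stop": stop,
                          "svtype": svtype, "samples": {sample}})
        else:
            cluster["samples"].add(sample)
    return [c for group in by_group.values() for c in group]

# Hypothetical calls: sample_y's deletion differs from sample_x's by a few bases
calls = [
    ("chr1", 1000, 1500, "DEL", "sample_x"),
    ("chr1", 1003, 1498, "DEL", "sample_y"),
    ("chr1", 1000, 3000, "DEL", "sample_z"),
    ("chr2", 2000, 2300, "DEL", "sample_x"),
]

for c in merge_fuzzy(calls, tol=10):
    print(c["chrom"], c["start"], c["stop"], c["svtype"], *sorted(c["samples"]), sep="\t")
```

Here the calls from sample_x and sample_y collapse into one row, while the much longer deletion in sample_z that shares the same start stays separate because its stop is far outside the tolerance.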