Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.
GNU General Public License v3.0
Running pindel individually per sample (1500 samples high coverage) #67
Hi everyone,
I would like to identify SVs in about 1,500 high-coverage whole-genome samples using Pindel. Running all of them at once was impossible; I also tried running one chromosome at a time, but that did not work either.
So I ran Pindel on each sample individually. After getting the output files for every sample, I combined them into one file and merged the SVs that share the same chrID, start, stop, SVtype and LengthOfSV, keeping track of the supporting samples.
For example, given:
chrId1 start1 stop1 length1 {sample x}
chrId2 start2 stop2 length2 {sample x}
chrId1 start1 stop1 length1 {sample y}
I merge them into:
chrId1 start1 stop1 length1 {sample x} {sample y} (merged)
chrId2 start2 stop2 length2 {sample x}
So far so good, it kind of works. But I have noticed that several SVs which I suspect are the same event are reported as different structural variants, because their start, stop or length differ by a few nucleotides.
Can anyone advise me on what to do? Does my approach make sense?
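The exact-match merging described above can be sketched roughly as follows. This is only an illustration: the tuple layout, coordinates and sample names are made up and are not Pindel's actual output format. The last call shows the problem, since it differs from the first by a few bases and therefore ends up as a separate SV.

```python
from collections import defaultdict

# Hypothetical per-sample calls: (chrom, start, stop, sv_type, length, sample).
# This layout is an assumption for illustration, not Pindel's output format.
calls = [
    ("chr1", 1000, 1500, "D", 500, "sample_x"),
    ("chr2", 2000, 2300, "D", 300, "sample_x"),
    ("chr1", 1000, 1500, "D", 500, "sample_y"),
    ("chr1", 1002, 1501, "D", 499, "sample_z"),  # near-duplicate of the first call
]

def merge_exact(calls):
    """Group calls whose chrom, start, stop, type and length are all identical."""
    merged = defaultdict(set)
    for chrom, start, stop, sv_type, length, sample in calls:
        merged[(chrom, start, stop, sv_type, length)].add(sample)
    return {key: sorted(samples) for key, samples in merged.items()}

exact = merge_exact(calls)
for key, samples in sorted(exact.items()):
    # One line per distinct SV, followed by its supporting samples.
    print(*key, *samples)
```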
Best, Mehmet
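One possible way to handle the near-identical calls is to merge with a breakpoint tolerance instead of requiring exact equality. The sketch below is a minimal illustration under assumptions, not a Pindel feature: the tuple layout, the `merge_with_tolerance` helper and the 10 bp `TOLERANCE` are all hypothetical and would need tuning. Each call is compared against the first call of every existing cluster with the same chromosome and SV type, so clusters cannot drift through chains of small differences.

```python
from collections import defaultdict

TOLERANCE = 10  # max breakpoint difference in bp; an arbitrary value to tune

def merge_with_tolerance(calls, tol=TOLERANCE):
    """Cluster calls of the same chromosome and SV type whose start and stop
    both lie within `tol` bp of the first call placed in a cluster."""
    clusters = defaultdict(list)  # (chrom, sv_type) -> list of cluster dicts
    for chrom, start, stop, sv_type, sample in sorted(calls):
        for cluster in clusters[(chrom, sv_type)]:
            if (abs(start - cluster["start"]) <= tol
                    and abs(stop - cluster["stop"]) <= tol):
                cluster["samples"].add(sample)
                break
        else:
            # No existing cluster is close enough: this call starts a new one.
            clusters[(chrom, sv_type)].append(
                {"start": start, "stop": stop, "samples": {sample}}
            )
    return [
        (chrom, c["start"], c["stop"], sv_type, sorted(c["samples"]))
        for (chrom, sv_type), group in sorted(clusters.items())
        for c in group
    ]

# Hypothetical calls: (chrom, start, stop, sv_type, sample).
calls = [
    ("chr1", 1000, 1500, "D", "sample_x"),
    ("chr1", 1002, 1501, "D", "sample_y"),   # a few bp off the first call
    ("chr1", 1000, 1500, "TD", "sample_x"),  # same coordinates, different type
    ("chr2", 2000, 2300, "D", "sample_x"),
]
merged = merge_with_tolerance(calls)
for row in merged:
    print(row)
```

A fixed tolerance suits small breakpoint shifts; for large SVs, a reciprocal-overlap criterion (both calls covering a set fraction of each other) is a common alternative. Either way, the merged coordinates here are simply those of the first call in each cluster, which is a simplification.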