Running pindel individually per sample (1500 samples high coverage)

Hi everyone,

I would like to identify SVs in 1.5k high-coverage whole-genome samples using Pindel. It was impossible to run all of them at once; I even tried running individual chromosomes, but that did not work either, so I ran Pindel on each sample individually.

After getting the output files for each sample, I combined them into one file and merged identical SVs based on chrID, start, stop, SVtype and LengthOfSV, keeping track of the supporting samples for each SV.

For example, given these records:

chrId1 start1 stop1 length1 {sample x}
chrId2 start2 stop2 length2 {sample x}
chrId1 start1 stop1 length1 {sample y}

I merge them into:

chrId1 start1 stop1 length1 {sample x} {sample y} (merged)
chrId2 start2 stop2 length2 {sample x}
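To make the merging step concrete, here is a minimal Python sketch of the exact-match merge described above. The tuple layout, the function name `merge_exact` and the example coordinates are made up for illustration; they are not Pindel's output format.

```python
from collections import defaultdict

def merge_exact(calls):
    """Group calls that share exactly the same
    (chrom, start, stop, svtype, length) and collect their samples."""
    merged = defaultdict(set)
    for chrom, start, stop, svtype, length, sample in calls:
        merged[(chrom, start, stop, svtype, length)].add(sample)
    # Sort the sample names so the output is stable between runs.
    return {key: sorted(samples) for key, samples in merged.items()}

# Hypothetical per-sample calls, already parsed into tuples.
calls = [
    ("chr1", 1000, 1500, "D", 500, "sample_x"),
    ("chr2", 2000, 2300, "D", 300, "sample_x"),
    ("chr1", 1000, 1500, "D", 500, "sample_y"),
]

merged = merge_exact(calls)
for key, samples in merged.items():
    print(*key, *samples)
```

Because the key is the full coordinate tuple, two calls only end up in the same record when every field is identical.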

So far so good, it kind of works. However, I have noticed that several SVs that I suppose should be the same are reported as different structural variations because they differ by a few nucleotides (for example in start, stop, or SV length).
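To illustrate the kind of near-duplicate merging I am asking about, here is a rough Python sketch that treats two calls as the same SV when the chromosome and SV type match and both breakpoints lie within a small window. The 10 bp tolerance, the tuple layout and the function name `merge_with_tolerance` are assumptions for illustration only; this is not something Pindel provides, and I do not know whether it is the right criterion.

```python
def merge_with_tolerance(calls, tol=10):
    """Greedy clustering: a call joins an existing cluster when chrom and
    svtype match and its start and stop are both within `tol` bp of the
    cluster's first (representative) call; otherwise it starts a new one."""
    clusters = []
    for chrom, start, stop, svtype, sample in sorted(calls):
        for cluster in clusters:
            if (cluster["chrom"] == chrom
                    and cluster["svtype"] == svtype
                    and abs(cluster["start"] - start) <= tol
                    and abs(cluster["stop"] - stop) <= tol):
                cluster["samples"].add(sample)
                break
        else:
            # No matching cluster found: this call becomes a new representative.
            clusters.append({"chrom": chrom, "start": start, "stop": stop,
                             "svtype": svtype, "samples": {sample}})
    return clusters

# Hypothetical calls: the first two differ by only a few nucleotides.
near_calls = [
    ("chr1", 1000, 1500, "D", "sample_x"),
    ("chr1", 1003, 1498, "D", "sample_y"),
    ("chr1", 5000, 5200, "D", "sample_x"),
]

clusters = merge_with_tolerance(near_calls, tol=10)
for c in clusters:
    print(c["chrom"], c["start"], c["stop"], c["svtype"], *sorted(c["samples"]))
```

Comparing against the first call of each cluster, rather than the most recent one, keeps a chain of slightly shifted calls from growing into one very wide cluster. The inner loop is quadratic in the worst case, so this is only meant to show the idea, not to scale to 1.5k samples.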

Can anyone give me some advice on what I should do? Does my approach make sense?

Best, Mehmet

genome / pindel
