genome / pindel

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.
GNU General Public License v3.0

How about different libraries in the same sample #27

Open billzt opened 8 years ago

billzt commented 8 years ago

Pindel requires a configuration file as input, with one line per BAM file:

file.bam insert-size sample-name

What should I do if my sample has multiple libraries with different insert sizes, all in a single BAM file (produced by samtools merge)?

EWLameijer commented 8 years ago

Hmmm... Kai may be able to correct me on this, but here is my understanding.

The optimal solution would usually be to make separate BAM files for the separate libraries, and create a config file like this:

library1.bam 500 sample1
library2.bam 200 sample1
library3.bam 400 sample2

(as you can see, different bam files can share the same sample)
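
If you have many libraries, a small script can write such a config file for you. The following is only a sketch of the layout described above (BAM path, insert size, sample name, one line per file). The function name `build_pindel_config` and the space separator are my own assumptions for illustration, not part of Pindel.

```python
from typing import Iterable, Tuple


def build_pindel_config(entries: Iterable[Tuple[str, int, str]]) -> str:
    """Return config text with one 'bam insert-size sample-name' line per entry.

    Hypothetical helper: it only mirrors the three-column layout shown above.
    """
    lines = []
    for bam_path, insert_size, sample in entries:
        # An insert size of zero or less cannot be a real library value.
        if insert_size <= 0:
            raise ValueError(f"insert size must be positive: {bam_path}")
        lines.append(f"{bam_path} {insert_size} {sample}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    config_text = build_pindel_config([
        ("library1.bam", 500, "sample1"),
        ("library2.bam", 200, "sample1"),
        ("library3.bam", 400, "sample2"),
    ])
    print(config_text, end="")
```

Several BAM files may carry the same sample name, so the libraries of one sample stay grouped in the output.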

Perhaps it's even possible to split a BAM file by library, though I haven't done so myself yet.
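
Whether a merged BAM can be split depends on whether its reads still carry read-group tags. In the SAM format, each `@RG` header line has an `ID` and, optionally, a library (`LB`) field. As a rough sketch, the snippet below groups read-group IDs by library from header text, such as the output of `samtools view -H merged.bam`. The function name `read_groups_by_library` and the `"unknown"` fallback label are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def read_groups_by_library(header_text: str) -> Dict[str, List[str]]:
    """Map each library (LB) to the read-group IDs found in @RG header lines."""
    libraries: Dict[str, List[str]] = defaultdict(list)
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        read_group_id = fields.get("ID")
        if read_group_id is None:
            continue  # malformed @RG line without an ID; skip it
        # "unknown" is an illustrative label for read groups lacking an LB field.
        libraries[fields.get("LB", "unknown")].append(read_group_id)
    return dict(libraries)


if __name__ == "__main__":
    example_header = (
        "@HD\tVN:1.6\tSO:coordinate\n"
        "@RG\tID:rg1\tLB:libA\tSM:sample1\n"
        "@RG\tID:rg2\tLB:libB\tSM:sample1\n"
    )
    for library, ids in read_groups_by_library(example_header).items():
        print(library, ids)
```

If read groups are present, `samtools split` can write one file per read group, and each resulting file could then get its own config line with the matching insert size. If the header has no `@RG` lines, there is probably nothing to split on.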

At the moment, I think the easiest option for you would be to use the smallest insert size. Pindel may run a (little) bit slower than if each library came with its correct insert size, and there could be a few more false positives (especially in repetitive regions).

I hope this helps!

Eric-Wubbo