What steps will reproduce the problem?
1. bedtools bamtobed -bedpe convert bam to bedpe
2. bedpe always put end with smaller start1 on col 1-3 and larger start on col
4-6
3. _1 and _2 might be on col 1-3 and 4-6, or the other way around
4. thus the last two cols of bedpe are always the same in a bedpe of one library
5. thus the strand information is lost, if user doesn't reverse the strand of
_2 (if dUTP) or _1 (if ligation) in bam/sam
6*. (more detail) I have this concern because of the multiple mappers.
When I sort bam file with secondary alignments by name (samtools sort -n),
samtools tend to put all the alignments of _1 together, then all the alignments
of _2. The lastest version of bedtools bamtobed (or any other tools that
produce bedpe as intermediates) only check the names of two lines when it
produces bedpe (correct me if I am wrong). Obviously the two adjacent bams will
have the same name (they are the same sequence). Then bedpe will have lines
with two _1 mapping, then lines with two _2 mapping. I tried to bypass this
issue by using the direct output of the aligner (STAR here), which put paired
alignments together. But sometimes it produces _2 first then _1, depending on
the orientation. Since bedtools bamtobed does not retain the _1/_2 information
but put whichever with smaller start first, the strand information is lost.
>What is the expected output? What do you see instead?
I personally would like bedpe to retained the _1 _2 information.
If the author does not agree, then please let me know if my following
modification is safe or not :-)
>What version of the product are you using? On what operating system?
2.17.0 and Mac 10.8.3. But I think this is OS independent, according to the
code in bamToBed.cpp:
///////
if ( chrom1 > chrom2 || ((chrom1 == chrom2) && (start1 > start2)) ) {
swap(chrom1, chrom2);
swap(start1, start2);
swap(end1, end2);
swap(strand1, strand2);
}
///////
>Please provide any additional information below.
Please let me know if this modification is safe: (sorry to change the source
code, but I need a solution badly!)
1. delete the swap lines (above) in PrintBedPE in bamToBed.cpp
2. change ConvertBamToBedpe in bamToBed.cpp line 319 from
////////////
else if (bam1.IsPaired() && bam2.IsPaired()) {
PrintBedPE(bam1, bam2, refs, useEditDistance);
}
////////////
to
////////////
else if (bam1.IsPaired() && bam2.IsPaired()) {
/**! begin change**/
if (bam1.IsFirstMate() == true && bam2.IsFirstMate() == false)
PrintBedPE (bam1, bam2, refs, useEditDistance);
else if (bam1.IsFirstMate() == false && bam2.IsFirstMate() == true)
PrintBedPE (bam2, bam1, refs, useEditDistance);
else
cerr << "*****WARNING: Query " << bam1.Name
<< "Missorted Bam Error occurs modified bamToBed" << endl;
/**! end of change**/
}
////////////
According to limited testes, this works and doesn't throw any error. But I am
not sure whether this is gonna influence other functions. I checked pairtobed,
it seems to have a protection already in
BedIntersectPE::FindSpanningOverlaps:
////////////
if ((type == "ispan") || (type == "notispan")) {
spanStart = a.end1;
spanEnd = a.start2;
if (a.end1 > a.start2) {
spanStart = a.end2;
spanEnd = a.start1;
}
}
else if ((type == "ospan") || (type == "notospan")) {
spanStart = a.start1;
spanEnd = a.end2;
if (a.start1 > a.start2) {
spanStart = a.start2;
spanEnd = a.end1;
}
////////////
But I am not sure about others.
Please instruct.
Thank you very much in advance,
BH
Original issue reported on code.google.com by boha...@gmail.com on 4 Jun 2013 at 4:15
Original issue reported on code.google.com by
boha...@gmail.com
on 4 Jun 2013 at 4:15