Genomicus / bedtools

Automatically exported from code.google.com/p/bedtools
0 stars 0 forks source link

retain _1 & _2 strand information in bedpe #152

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. bedtools bamtobed -bedpe convert bam to bedpe
2. bedpe always put end with smaller start1 on col 1-3 and larger start on col 
4-6
3. _1 and _2 might be on col 1-3 and 4-6, or the other way around
4. thus the last two cols of bedpe are always the same in a bedpe of one library
5. thus the strand information is lost, if user doesn't reverse the strand of 
_2 (if dUTP) or _1 (if ligation) in bam/sam
6*. (more detail) I have this concern because of the multiple mappers. 
When I sort bam file with secondary alignments by name (samtools sort -n), 
samtools tend to put all the alignments of _1 together, then all the alignments 
of _2. The lastest version of bedtools bamtobed (or any other tools that 
produce bedpe as intermediates) only check the names of two lines when it 
produces bedpe (correct me if I am wrong). Obviously the two adjacent bams will 
have the same name (they are the same sequence). Then bedpe will have lines 
with two _1 mapping, then lines with two _2 mapping. I tried to bypass this 
issue by using the direct output of the aligner (STAR here), which put paired 
alignments together. But sometimes it produces _2 first then _1, depending on 
the orientation. Since bedtools bamtobed does not retain the _1/_2 information 
but put whichever with smaller start first, the strand information is lost. 

>What is the expected output? What do you see instead?
I personally would like bedpe to retained the _1 _2 information. 
If the author does not agree, then please let me know if my following 
modification is safe or not :-)

>What version of the product are you using? On what operating system?
2.17.0 and Mac 10.8.3. But I think this is OS independent, according to the 
code in bamToBed.cpp:
///////
    if ( chrom1 > chrom2 || ((chrom1 == chrom2) && (start1 > start2)) ) {
        swap(chrom1, chrom2);
        swap(start1, start2);
        swap(end1, end2);
        swap(strand1, strand2);
    }
///////
>Please provide any additional information below.
Please let me know if this modification is safe: (sorry to change the source 
code, but I need a solution badly!)
1. delete the swap lines (above) in PrintBedPE in bamToBed.cpp
2. change ConvertBamToBedpe in bamToBed.cpp line 319 from
////////////
else if (bam1.IsPaired() && bam2.IsPaired()) {
            PrintBedPE(bam1, bam2, refs, useEditDistance);
        }
////////////
to 
////////////
        else if (bam1.IsPaired() && bam2.IsPaired()) {
                /**! begin change**/
                if (bam1.IsFirstMate() == true && bam2.IsFirstMate() == false)
                        PrintBedPE (bam1, bam2, refs, useEditDistance);
                else if (bam1.IsFirstMate() == false && bam2.IsFirstMate() == true)
                        PrintBedPE (bam2, bam1, refs, useEditDistance);
                else
                cerr << "*****WARNING: Query " << bam1.Name
                     << "Missorted Bam Error occurs modified bamToBed" << endl;
            /**! end of change**/
        }
////////////

According to limited testes, this works and doesn't throw any error. But I am 
not sure whether this is gonna influence other functions. I checked pairtobed,  
it seems to have a protection already in 
BedIntersectPE::FindSpanningOverlaps:
////////////
    if ((type == "ispan") || (type == "notispan")) {
        spanStart = a.end1;
        spanEnd = a.start2;
        if (a.end1 > a.start2) {
            spanStart = a.end2;
            spanEnd = a.start1;
        }
    }
    else if ((type == "ospan") || (type == "notospan")) {
        spanStart = a.start1;
        spanEnd = a.end2;
        if (a.start1 > a.start2) {
            spanStart = a.start2;
            spanEnd = a.end1;
        }
////////////
But I am not sure about others. 
Please instruct. 
Thank you very much in advance, 
BH

Original issue reported on code.google.com by boha...@gmail.com on 4 Jun 2013 at 4:15