arq5x / bedtools

A powerful toolset for genome arithmetic.
http://code.google.com/p/bedtools/
GNU General Public License v2.0

IntersectBed between a BAM and a BED file gives different results with and without -sorted #135

Open qqwang-berkeley opened 6 years ago

qqwang-berkeley commented 6 years ago

I have a sorted bam and a bed file and tried to find reads in the bam file that overlap with entries in the bed file. I tried two ways:

1) intersectBed -abam sorted.bam -b sample.bed -f 1 > intersect.bam

2) First sort the bed file using the chromosome order in the bam file header: bedtools sort -faidx chromo_order_in_sam_header.txt -i sample.bed > sample_sorted.bed

Then run intersectBed in sorted mode (the -g genome file was fetched with UCSC's fetchChromSizes):

intersectBed -abam sorted.bam -b sample_sorted.bed -sorted -g genome_file.txt -f 1 > intersect_sorted.bam

These two approaches gave different results. A check in a genome browser shows that 1) gives the right answer, while 2) is missing a lot of reads. Have you observed this before, and do you have any recommendations to make the results match?
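
For the sorted mode in 2), one thing that may be worth checking is whether the file passed to -g lists chromosomes in the same order as the file passed to -faidx, since with -sorted the -g file tells bedtools which chromosome order to expect. This has not been confirmed as the cause of the missing reads here. The sketch below is only a rough check under that assumption: `same_chrom_order` is a hypothetical helper name, and it assumes both files are tab-delimited with the chromosome name in the first column.

```shell
# Hypothetical helper (not part of bedtools): succeeds only if two
# tab-delimited files list the same chromosome names, in the same order,
# in their first column.
same_chrom_order() {
    # Return 2 if either file is missing or unreadable.
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # Keep only column 1 (the chromosome name) of each file.
    a=$(cut -f1 "$1")
    b=$(cut -f1 "$2")
    # Equal strings mean the same names in the same order.
    [ "$a" = "$b" ]
}
```

With the file names from the commands above, `same_chrom_order chromo_order_in_sam_header.txt genome_file.txt` would return 0 when the two orders agree and non-zero otherwise. A genome file that contains extra chromosomes would also be reported as a mismatch by this simple comparison.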

Qingqing

arq5x commented 6 years ago

Could you please check whether this result persists with version 2.27.1 from the bedtools2 repository (https://github.com/arq5x/bedtools2/releases)? If the issue persists, could you post it in that repo (bedtools2, not bedtools)?