arq5x / bedtools

A powerful toolset for genome arithmetic.
http://code.google.com/p/bedtools/
GNU General Public License v2.0

Issue with overlap length in intersect -split? #101

Open ghost opened 10 years ago

ghost commented 10 years ago

Hi-

I am trying to intersect some transcripts in BED12 format with GFF regions. The problem is that the reported overlap is longer than the query transcript itself. Here's a minimal example:

Query (test.bed):

    chr1 1139223 1141951 CCDS10.1 0 - 1139223 1141951 0 5 125,203,88,123,187, 0,190,555,1526,2541,

Reference (test.gff):

    chr1 . 0 1140106 1141472 . . . .

Command used:

    bedtools intersect -b test.bed -a test.gff -wo -split

Output:

    chr1 . 0 1140106 1141472 . . . . chr1 1139223 1141951 CCDS10.1 0 - 1139223 1141951 0 5 125,203,88,123,187, 0,190,555,1526,2541, 1367

The reported overlap of 1367 bp is substantially longer than the transcript's spliced length of 726 bp (the sum of its five block sizes). Am I doing something silly?
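
The numbers in the example can be checked with a short Python sketch. It is not bedtools code: the helper names are hypothetical, and it assumes that with -split only bases inside the BED12 blocks should count toward the overlap. It also assumes the usual conventions that BED coordinates are 0-based and half-open while GFF coordinates are 1-based and inclusive.

```python
# Illustrative check only; helper names are hypothetical, not bedtools APIs.

def bed12_blocks(chrom_start, block_sizes, block_starts):
    """Return BED12 blocks as 0-based, half-open (start, end) intervals."""
    return [(chrom_start + offset, chrom_start + offset + size)
            for size, offset in zip(block_sizes, block_starts)]

def blockwise_overlap(blocks, feat_start, feat_end):
    """Sum the overlap of each block with a 0-based, half-open feature."""
    return sum(max(0, min(end, feat_end) - max(start, feat_start))
               for start, end in blocks)

# Values copied from test.bed in the example.
blocks = bed12_blocks(1139223,
                      [125, 203, 88, 123, 187],
                      [0, 190, 555, 1526, 2541])

# Assumption: GFF 1140106..1141472 (1-based, inclusive) corresponds to
# [1140105, 1141472) in BED-style coordinates.
gff_start, gff_end = 1140106 - 1, 1141472

transcript_length = sum(end - start for start, end in blocks)
feature_length = gff_end - gff_start
expected_overlap = blockwise_overlap(blocks, gff_start, gff_end)

print(transcript_length)  # 726
print(feature_length)     # 1367
print(expected_overlap)   # 123
```

Under these assumptions only the fourth block falls inside the GFF feature, so a block-aware overlap would be 123 bp. The reported 1367 matches the full length of the GFF feature, which is what one would get by comparing it against the whole transcript span rather than the individual blocks.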

Thanks for your help!

Best,

-Cyrus