arq5x / bedtools

A powerful toolset for genome arithmetic.
http://code.google.com/p/bedtools/
GNU General Public License v2.0

bedtools coverage -d off-by-one? #108

Closed rolfschr closed 9 years ago

rolfschr commented 9 years ago

Hi, I was running bedtools coverage on a BAM file. See the command line (cmdline). I have trouble determining what the 4th column (the incremental number) in the output means. My understanding is that the BED format is right-open, i.e. in the interval chr1:112524191-112525399, the base 112524191 is included but 112525399 is excluded. However, the 4th column starts with '1'. This would mean that in order to get the exact position for the subsequent features (lines), I would calculate start_of_interval + val_col_4 - 1. For example, in the coverage.bed from above, the line

chr1 112524191 112525399 4 0

means that the base 112524191 + 4 - 1 = 112524194 is covered by zero reads. Again, I assume I have to decrement by 1 because the first line

chr1 112524191 112525399 1 0

(which, according to my understanding of the BED format, deals with the base 112524191) has a 1 in column 4.
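The calculation described above can be sketched in Python. This is only an illustration: the function name `zero_based_position` is hypothetical, and the row layout (chrom, start, end, offset within the interval, depth) is assumed from the example lines quoted in this post.

```python
from typing import Tuple


def zero_based_position(row: str) -> Tuple[str, int, int]:
    """Return (chrom, 0-based genomic position, depth) for one coverage row.

    Assumes whitespace-separated columns: chrom, start, end,
    offset within the interval (counting from 1), depth.
    """
    chrom, start, _end, offset, depth = row.split()
    # Column 4 starts at 1 for the first base of the interval,
    # so subtract 1 to land on the 0-based BED coordinate.
    return chrom, int(start) + int(offset) - 1, int(depth)


print(zero_based_position("chr1 112524191 112525399 4 0"))
# ('chr1', 112524194, 0)
```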

Now, when checking the BAM file in IGV at position 112524818:

[Screenshot: igv_coverage]

I can see exactly one read. Looking at the command line output, this position should correspond to line

chr1 112524191 112525399 627 1

from the coverage.bed (see above). But 112524191 + 627 - 1 = 112524817, not 112524818 as I would deduce from IGV.

I hope the above is understandable. I feel there is a discrepancy between the numbers I can see in IGV, the numbers I get from bedtools coverage, and what I expect. Maybe I just got something wrong here. To make a long story short: why is the 4th column of the first row of a given interval reported as 1 (and not 0) in the bedtools coverage output, given that in the BED format the start position is included?

rolfschr commented 9 years ago

Sorry, this was my bad. BED format starts counting at '0', whereas IGV starts counting at '1' for the first base of a chromosome. Hence, when comparing the numbers between bedtools coverage output and IGV, one would expect the bases in IGV to be "right-shifted by one": the 0-based position 112524817 computed above is shown as 112524818 in IGV.
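As a rough sketch of that shift, the snippet below converts the 0-based position from the example row into the 1-based coordinate that IGV displays. The helper name `bed_to_igv` is made up for illustration and is not part of bedtools or IGV.

```python
def bed_to_igv(chrom: str, zero_based_pos: int) -> str:
    """Format a 0-based BED position as a 1-based locus string."""
    # IGV counts the first base of a chromosome as 1, BED as 0.
    return f"{chrom}:{zero_based_pos + 1}"


start, offset = 112524191, 627     # interval start and column 4 of the example row
zero_based = start + offset - 1    # column 4 counts from 1 within the interval
print(zero_based)                  # 112524817
print(bed_to_igv("chr1", zero_based))  # chr1:112524818
```

In other words, for comparison with IGV the two corrections cancel out: start_of_interval + val_col_4 already gives the 1-based position.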