Closed rolfschr closed 9 years ago
Sorry, this was my bad. BED format is 0-based (the first base of a chromosome is position '0'), whereas IGV is 1-based (the first base is position '1'). Hence, when comparing positions between the bedtools coverage output and IGV, one would expect the bases in IGV to be "right-shifted by one".
Hi, I was running bedtools coverage on a bam file. See command line. I have trouble determining what the 4th column (the incremental number) in the output means. My understanding is that the bed format is right-open, i.e. in the interval chr1:112524191-112525399, the base 112524191 is included but 112525399 is excluded. However, the 4th column starts with '1'. This would mean that in order to get the exact position for the subsequent features (lines), I would calculate start_of_interval + val_col_4 - 1. For example, in the coverage.bed from above, the line
chr1 112524191 112525399 4 0
means that the base 112524191 + 4 - 1 = 112524194 is covered by zero reads. Again, I assume I have to decrement by 1 because the first line
chr1 112524191 112525399 1 0
(which according to my understanding of the bed format deals with the base 112524191) has a 1 in column 4.
Now, when checking the bam file in IGV at position 112524818:
I can see exactly one read. Looking at the command line output, this position should correspond to line
chr1 112524191 112525399 627 1
from the coverage.bed (see above). But 112524191 + 627 - 1 = 112524817, not 112524818 as I would deduce from IGV.
I hope the above is understandable. I feel there is a discrepancy between the numbers I can see in IGV, the numbers I get from bedtools coverage, and what I expect. Maybe I am just getting something wrong here. To make a long story short: why is the 4th column of the first row of a given interval reported as 1 (and not 0) in the bedtools coverage output, given that in the bed format the start position is included?
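For anyone who lands here with the same confusion, the arithmetic can be sketched in a few lines of Python. This is only an illustration, assuming per-base output lines with the five whitespace-separated columns shown in this issue (chromosome, interval start, interval end, 1-based offset within the interval, depth). The function name `per_base_positions` is made up for this example and is not part of bedtools.

```python
def per_base_positions(line):
    """Convert one per-base coverage line into genomic coordinates.

    Assumed columns (as in the lines quoted in this issue):
    chrom, BED start (0-based), BED end (exclusive), offset (1-based), depth.
    """
    chrom, start, _end, offset, depth = line.split()
    start, offset, depth = int(start), int(offset), int(depth)

    # Offset 1 refers to the first base of the interval, i.e. the BED start.
    zero_based = start + offset - 1
    # Viewers that count from 1 (such as IGV) show the same base one higher.
    one_based = zero_based + 1
    return chrom, zero_based, one_based, depth


print(per_base_positions("chr1 112524191 112525399 627 1"))
# ('chr1', 112524817, 112524818, 1)
```

So the line with 627 in column 4 describes the base at 0-based position 112524817, which is the same base that a 1-based viewer labels 112524818.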