arq5x / bedtools

A powerful toolset for genome arithmetic.
http://code.google.com/p/bedtools/
GNU General Public License v2.0

getfasta coordinates error? #150

Closed jfass closed 5 years ago

jfass commented 5 years ago

It seems to me that there's an off-by-1 error in getfasta. If I have a reference like:

>foo
AACCGGTTAA

... and a GFF file like:

foo fake    exon    6   8   .   +   .   ID=exon1

Then, using getfasta:

bedtools getfasta -fi foo.fa -bed bar.gff3 -fo -
index file foo.fa.fai not found, generating...
>foo:5-8
GTT

Note that the left coordinate in the header is 5, yet the sequence itself is correct: it consists of bases 6-8. For comparison, samtools faidx gives:

samtools faidx foo.fa foo:6-8
>foo:6-8
GTT

So the error seems to be only in the output FASTA header, not in the sequence output (unless I'm unaware of some coordinate convention?).

arq5x commented 5 years ago

bedtools currently reports the output intervals in 0-based, half-open format (BED format).
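
To make the convention concrete, here is a minimal Python sketch of the coordinate arithmetic, using the example from the report. GFF coordinates are 1-based and closed, BED coordinates are 0-based and half-open, so the GFF interval 6-8 corresponds to the BED interval 5-8 and both describe the same three bases. The helper name `gff_to_bed` is hypothetical and only for illustration. It is not part of bedtools and does not show how bedtools implements this.

```python
def gff_to_bed(start, end):
    """Convert a 1-based, closed interval (GFF style) to a
    0-based, half-open interval (BED style).

    Hypothetical helper for illustration; not a bedtools function.
    """
    return start - 1, end


# Reference sequence "foo" from the report above.
seq = "AACCGGTTAA"

# GFF feature: exon at 6..8 (1-based, both ends included).
bed_start, bed_end = gff_to_bed(6, 8)

# Python slices are also 0-based and half-open, so BED coordinates
# can be used directly as slice bounds.
bases = seq[bed_start:bed_end]

# Header written with the BED-style coordinates.
header = f">foo:{bed_start}-{bed_end}"

print(header)
print(bases)
```

The interval length is `end - start` in BED coordinates (8 - 5 = 3) and `end - start + 1` in GFF coordinates (8 - 6 + 1 = 3), which is a quick way to check which convention a header uses.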