arq5x / bedtools2

bedtools - the swiss army knife for genome arithmetic
MIT License

something wrong in bedtools getfasta -name #805

Open StayHungryStayFool opened 4 years ago

StayHungryStayFool commented 4 years ago

BEDtools version 2.26.0. Example as follows:

mm10-NlaIII.bed:
chr1 0 3000185 HIC_chr1_1@chr1:0-3000185
chr1 3000185 3000316 HIC_chr1_2@chr1:3000185-3000316
chr1 3000316 3000850 HIC_chr1_3@chr1:3000316-3000850
chr1 3000850 3001659 HIC_chr1_4@chr1:3000850-3001659

Command with the -name parameter:

bedtools getfasta -fi mm10.fasta -bed mm10-NlaIII.bed -name | fold -w 70

Resulting FASTA headers:

HIC_chr1_7512::chr1:4664567-4666090
HIC_chr1_7511::chr1:4664466-4664567

Command without the -name parameter:

bedtools getfasta -fi mm10.fasta -bed mm10-NlaIII.bed | fold -w 70

Resulting FASTA headers:

chr1:5006220-5006371
chr1:5006142-5006220

Can you help me with this question? Best wishes.
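The two outputs above differ only in whether the BED name is prepended to the region with a "::" separator. If you want to keep both pieces but in separate columns, a small awk filter can split the headers into a tab-separated table. This is a sketch that assumes the BED name never contains "::"; headers_to_tsv is a hypothetical helper name, not part of bedtools.

```shell
#!/bin/sh
# Split FASTA headers into a tab-separated table: NAME<TAB>CHROM:START-END.
# Headers written with -name look like ">NAME::CHROM:START-END";
# headers written without -name are just ">CHROM:START-END".
# Assumes the BED name itself never contains "::".
headers_to_tsv() {
    awk '/^>/ {
        h = substr($0, 2)                  # drop the leading ">"
        i = index(h, "::")
        if (i) { print substr(h, 1, i - 1) "\t" substr(h, i + 2) }
        else   { print "\t" h }            # no name: the whole header is the region
    }'
}

printf '>HIC_chr1_7512::chr1:4664567-4666090\nACGT\n>chr1:5006220-5006371\nACGT\n' | headers_to_tsv
```

Sequence lines are ignored, so the filter can be run directly on getfasta output.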

Acribbs commented 4 years ago

I see exactly the same thing with bedtools v2.29.2; it seems the behaviour of -name has changed. Is it now expected behaviour that the FASTA header should be, e.g.:

chr1.tRNA1-ValCAC-::chr1:16725515-16725688(+)

instead of what it used to be (I can't remember which version of the old software I was using):

chr1.tRNA1-ValCAC-(+)

Zhuxitong commented 4 years ago

It's true that the behaviour of -name has changed since 2.26.0. The output no longer matches the example in the getfasta documentation:

$ cat test.fa
>chr1
AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG

$ cat test.bed
chr1 5 10 myseq

$ bedtools getfasta -fi test.fa -bed test.bed -name
>myseq
AAACC
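In this example the BED interval "chr1 5 10" is read as 0-based and half-open (start included, end excluded), which is why the output is the 6th through 10th bases, AAACC. The snippet below reproduces that slice with awk on the test.fa sequence. It only illustrates the coordinate convention and is not how bedtools itself extracts sequences; bed_slice is a hypothetical helper.

```shell
#!/bin/sh
# Sequence of chr1 from test.fa above (single line, header removed).
seq='AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG'

# Extract a BED interval [start, end): 0-based start, exclusive end.
# awk's substr() is 1-based, hence start + 1; the length is end - start.
bed_slice() {
    awk -v s="$1" -v st="$2" -v en="$3" 'BEGIN { print substr(s, st + 1, en - st) }'
}

bed_slice "$seq" 5 10    # prints AAACC
```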

I am using 2.25.0, where it works as shown above. I really think the old behaviour is exactly what we need. However, I am not sure whether this is a bug or intentional; if it is a bug, please fix it.

Sincere thanks!

NielInfante commented 3 years ago

It can be fixed by piping through sed:

$ bedtools getfasta -fi test.fa -bed test.bed -name | sed 's/::.*//'

but I would prefer not to have to do the extra step.
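A slightly more defensive variant of the same sed step, sketched below, only edits header lines and stops at an opening parenthesis, so a strand tag such as "(+)" (as in the -s example earlier in the thread) is kept and the header comes out in the older "name(+)" style. It assumes the BED name contains no "::" and that the region part contains no "("; strip_coords is a hypothetical helper name.

```shell
#!/bin/sh
# Keep only the BED name in getfasta -name headers.
# "/^>/" restricts the substitution to header lines, so sequence lines are never touched.
# "::[^(]*" removes "::CHROM:START-END" but stops before a "(+)" or "(-)" strand tag.
strip_coords() {
    sed '/^>/s/::[^(]*//'
}

printf '>myseq::chr1:5-10\nAAACC\n>chr1.tRNA1-ValCAC-::chr1:16725515-16725688(+)\nGGTT\n' | strip_coords
```

Usage would be the same as the one-liner above, e.g. piping bedtools getfasta -fi test.fa -bed test.bed -name into strip_coords.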

DKoenemann commented 4 months ago

It seems that, at least as of v2.30.0, the flags work as follows:

-name now gives the name and coordinates together

-nameOnly does what -name used to do and gives just the name from the BED file

See the man page: https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html
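For scripts that must run against both older and newer releases, one option is to choose the flag based on the installed version. The sketch below assumes that bedtools --version prints a line of the form "bedtools vX.Y.Z" and uses 2.30.0 as the cut-off only because that is the version cited in this comment; -nameOnly may well exist in earlier releases. Both function names are hypothetical.

```shell
#!/bin/sh
# Succeed (exit status 0) if a "bedtools vX.Y.Z" string reports v2.30.0 or later.
version_at_least_2_30() {
    printf '%s\n' "$1" | awk '{
        v = $NF; sub(/^v/, "", v)          # "bedtools v2.30.0" -> "2.30.0"
        split(v, p, ".")
        major = p[1] + 0; minor = p[2] + 0
        exit !(major > 2 || (major == 2 && minor >= 30))
    }'
}

# Name-only headers on either side of the change.
# Arguments are passed straight to getfasta (e.g. -fi test.fa -bed test.bed).
getfasta_name_only() {
    if version_at_least_2_30 "$(bedtools --version)"; then
        bedtools getfasta "$@" -nameOnly
    else
        # Older releases: strip any "::CHROM:START-END" suffix from header lines.
        bedtools getfasta "$@" -name | sed '/^>/s/::.*//'
    fi
}
```

On 2.25.0 and earlier, where -name already produced bare names, the sed in the fallback branch simply changes nothing.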