arq5x / bedtools

A powerful toolset for genome arithmetic.
http://code.google.com/p/bedtools/
GNU General Public License v2.0

getfasta -s not working #140

Open malj390 opened 5 years ago

malj390 commented 5 years ago

Hello everyone,

I have a test.bed file with this structure:

chr4    74445406    74446534    AREG1   +
chr4    74446782    74449047    AREG2   +
chr20   57228229    57266628    BMP71   -
chr20   57202475    57228421    BMP72   -

I would like to get the FASTA file, so I am using bedtools getfasta like this:

bedtools getfasta -name -s -fullHeader -fi hg38.fa -bed test.bed -fo test.fasta

-s should give the sequence for the proper strand, but it doesn't.

bedtools doesn't allow start and end to be swapped (end before start), so I put the coordinates in ascending order but kept the strand sign, expecting to get the reverse-complement sequence for the genes on the negative strand.

Does anyone know what I am doing wrong, or how to get the reverse complement according to the strand?

I know how to make the reverse complement in Python and I could solve it that way, but I don't want to add unnecessary steps to the code; I would rather use the function that bedtools already has.

Thank you, Miguel

adrech commented 5 years ago

I was also running into this problem.

The sequence seems to get reversed correctly with a 6-column BED file (your example has no 'score' column, so the strand sits in column 5 instead of column 6).

Still, even with 6-column files, when -name and -s are used together, getfasta appends (-) or (+) to the name, which is not the behaviour shown in the docs (there should be no change to the name?):

https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html#s-forcing-the-extracted-sequence-to-reflect-the-requested-strand

maximus-sci commented 4 years ago

I had this problem and it ended up being a problem with my bed file.

First, I'd recommend you insert a "score" column (column 5, so that the strand ends up in column 6) and just put a 1 on every row. Next, make sure the end-of-line character is formatted for Unix (it should be just \n, not \r\n). If the file has Windows EOL characters, bedtools will not read the strand correctly.
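
A minimal sketch of those two fixes in Python, in case it helps. It assumes the input has exactly the five columns from the original post (chrom, start, end, name, strand); the function name `to_bed6` and the placeholder score are made up for illustration and are not part of bedtools.

```python
def to_bed6(text: str, score: str = "1") -> str:
    """Return BED text with 6 tab-separated columns and Unix line endings.

    Assumption: 5-column rows are chrom, start, end, name, strand.
    Rows that do not match that shape are passed through unchanged
    (apart from being re-joined with tabs).
    """
    out = []
    # splitlines() treats \n, \r\n and \r as line breaks and drops them,
    # so Windows line endings disappear here.
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == 5 and fields[4] in {"+", "-"}:
            # Insert a placeholder score so the strand moves to column 6.
            fields.insert(4, score)
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = (
        "chr4\t74445406\t74446534\tAREG1\t+\r\n"
        "chr20\t57228229\t57266628\tBMP71\t-\r\n"
    )
    print(to_bed6(example), end="")
```

When saving the result, opening the output with `open(path, "w", newline="\n")` keeps Python from turning the line endings back into \r\n on Windows.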