BenLangmead / bowtie2

A fast and sensitive gapped read aligner
GNU General Public License v3.0

How to use -F option? #454

Open bede opened 6 months ago

bede commented 6 months ago

I am struggling to understand the useful looking -F option, which allows one to pass a fasta file from which k-mers are extracted and used as reads. I suspect I have misunderstood the manual:

-F k:<int>,i:<int>

    Reads are substrings (k-mers) extracted from a FASTA file <s>. Specifically, for every reference sequence in FASTA file <s>, Bowtie 2 aligns the k-mers at offsets 1, 1+i, 1+2i, ... until reaching the end of the reference. Each k-mer is aligned as a separate read. Quality values are set to all Is (40 on Phred scale). Each k-mer (read) is given a name like <sequence>_<offset>, where <sequence> is the name of the FASTA sequence it was drawn from and <offset> is its 0-based offset of origin with respect to the sequence. Only single k-mers, i.e. unpaired reads, can be aligned in this way. 
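As a rough illustration of what this manual entry describes, here is a minimal Python sketch of the k-mer sampling. It is not Bowtie 2's actual code. The function name `fasta_kmers` is hypothetical, and two behaviours are assumptions rather than anything the manual states: only full-length k-mers are emitted, and the read name uses the FASTA header up to the first whitespace. Offsets in the read names are 0-based, as in the manual text.

```python
def fasta_kmers(fasta_text, k, i):
    """Yield (read_name, kmer, quality) for k-mers at 0-based offsets 0, i, 2i, ..."""
    if k < 1 or i < 1:
        raise ValueError("k and i must be positive integers")

    # Parse the FASTA text into (name, sequence) records.
    records = []
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Assumption: the name is the header text up to the first whitespace.
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))

    for seq_name, seq in records:
        # Assumption: a trailing window shorter than k is skipped.
        for offset in range(0, len(seq) - k + 1, i):
            # Quality is all 'I' (Phred 40), one character per base.
            yield f"{seq_name}_{offset}", seq[offset:offset + k], "I" * k


if __name__ == "__main__":
    for read_name, kmer, qual in fasta_kmers(">ref1 example\nACGTAC\nGT\n", k=4, i=2):
        print(read_name, kmer, qual)
```

With k=4 and i=2 on an 8-base sequence, this yields three reads, named ref1_0, ref1_2 and ref1_4.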

I have unsuccessfully tried, for example, the following:

$ bowtie2 -x NC_029549.1 -f NC_029549.1.fa -F k:150,i:1
FASTA and FASTA sampling formats are mutually exclusive.
(ERR): bowtie2-align exited with value 1

Might someone be able to provide an example of how this feature should be used?

Thank you!

BenLangmead commented 6 months ago

Thank you -- we are looking into this. I suspect the mention of <s> is a spurious hold-over from the Bowtie 1 manual, and that we should have said <r> -- which is the variable we use in the Bowtie 2 manual to refer to the unpaired reads file specified with -U. We'll get a more definitive answer soon.

ch4rr0 commented 6 months ago

Hello,

Your command line was fine, with the exception that the k and i should be left out. I have pushed a fix to the bug_fixes branch that should resolve the mutually exclusive error thrown when -f was specified with -F.

@BenLangmead -- we updated the -f option to behave like -q in that it is simply a flag that specifies the format of the input files to follow. That way a user can do something like this:

bowtie2 -x index -f -1 mate1.fa -2 mate2.fa
bowtie2 -x index -q -1 mate1.fq -2 mate2.fq
bowtie2 -x index -f --interleaved input.fa
bowtie2 -x index -q --interleaved input.fa

In the case of FASTA-continuous, this allows any one of the following to be parsed the same way:

N.B. unpaired reads, -U, are default in bowtie2

bowtie2 -x index -f -F 10,2 input.fa # fasta explicit, unpaired inferred
bowtie2 -x index -F 10,2 input.fa # fasta and unpaired are inferred
bowtie2 -x index -F 10,2 -U input.fa # fasta is inferred, unpaired explicit
bowtie2 -x index -f -F 10,2 -U input.fa # all explicitly specified
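
For anyone scripting this, the four equivalent forms above can be produced by one small helper. This is only a sketch: `fasta_sampling_args` and its parameters are hypothetical names, and it assumes the `-F <k>,<i>` syntax and the flag behaviour of the bug_fixes branch described in this comment. It builds the argument list but does not run bowtie2.

```python
def fasta_sampling_args(index, fasta, k, i, explicit_format=False, explicit_unpaired=False):
    """Build a bowtie2 argument list for FASTA-continuous (-F) input."""
    if k < 1 or i < 1:
        raise ValueError("k and i must be positive integers")
    args = ["bowtie2", "-x", index]
    if explicit_format:
        args.append("-f")           # optional: FASTA is inferred from -F
    args += ["-F", f"{k},{i}"]      # bare "k,i", without the "k:" / "i:" prefixes
    if explicit_unpaired:
        args.append("-U")           # optional: unpaired is the default
    args.append(fasta)
    return args


if __name__ == "__main__":
    # Prints the "fasta and unpaired are inferred" form.
    print(" ".join(fasta_sampling_args("index", "input.fa", 10, 2)))
```

The resulting list could be passed to `subprocess.run` once a bowtie2 build containing the fix is on the PATH.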

I hope this makes sense.

bede commented 6 months ago

Speedy! Thank you both 🙏

Thanks for fixing the mutual exclusivity issue and for correcting how I was using -F. The bug_fixes branch is now working as expected with bowtie2 -x NC_029549.1 -f NC_029549.1.fa -F 150,1