Thank you for developing wonderful tool. I read the paper (https://www.nature.com/articles/s41467-021-22720-0) using bowtie2 for ATAC-seq read alignment, and find the question about pre-alignments. Does anyone know how to use pre-alignments? I don't know how to use the output of pre-alignments by bowtie2 as an input file for alignments.
Following sentence is their protocol.
Bowtie2 was applied for pre-alignments to filter out reads that align to repetitive regions using “-k 1 -D 20 -R 3 -N 1 -L 20 -i S,1,0.50 -X 2000 –rg-id” parameters. For the remaining reads, Bowtie2 was used to map to GRCh38 with “–very-sensitive -X 2000–rg-id” options.
I already tried following script, but 2nd step did not work.
Bowtie 2 version 2.4.2 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage:
bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r> | --interleaved <i> | -b <bam>} [-S <sam>]
<bt2-idx> Index filename prefix (minus trailing .X.bt2).
NOTE: Bowtie 1 and Bowtie 2 indexes are not compatible.
<m1> Files with #1 mates, paired with files in <m2>.
Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
<m2> Files with #2 mates, paired with files in <m1>.
Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
<r> Files with unpaired reads.
Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
<i> Files with interleaved paired-end FASTQ/FASTA reads
Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
<bam> Files are unaligned BAM sorted by read name.
<sam> File for SAM output (default: stdout)
<m1>, <m2>, <r> can be comma-separated lists (no whitespace) and can be
specified many times. E.g. '-U file1.fq,file2.fq -U file3.fq'.
Options (defaults in parentheses):
Input:
-q query input files are FASTQ .fq/.fastq (default)
--tab5 query input files are TAB5 .tab5
--tab6 query input files are TAB6 .tab6
--qseq query input files are in Illumina's qseq format
-f query input files are (multi-)FASTA .fa/.mfa
-r query input files are raw one-sequence-per-line
-F k:<int>,i:<int> query input files are continuous FASTA where reads
are substrings (k-mers) extracted from a FASTA file <s>
and aligned at offsets 1, 1+i, 1+2i ... end of reference
-c <m1>, <m2>, <r> are sequences themselves, not files
-s/--skip <int> skip the first <int> reads/pairs in the input (none)
-u/--upto <int> stop after first <int> reads/pairs (no limit)
-5/--trim5 <int> trim <int> bases from 5'/left end of reads (0)
-3/--trim3 <int> trim <int> bases from 3'/right end of reads (0)
--trim-to [3:|5:]<int> trim reads exceeding <int> bases from either 3' or 5' end
If the read end is not specified then it defaults to 3 (0)
--phred33 qualities are Phred+33 (default)
--phred64 qualities are Phred+64
--int-quals qualities encoded as space-delimited integers
Presets: Same as:
For --end-to-end:
--very-fast -D 5 -R 1 -N 0 -L 22 -i S,0,2.50
--fast -D 10 -R 2 -N 0 -L 22 -i S,0,2.50
--sensitive -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default)
--very-sensitive -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
For --local:
--very-fast-local -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
--fast-local -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
--sensitive-local -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default)
--very-sensitive-local -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
Alignment:
-N <int> max # mismatches in seed alignment; can be 0 or 1 (0)
-L <int> length of seed substrings; must be >3, <32 (22)
-i <func> interval between seed substrings w/r/t read len (S,1,1.15)
--n-ceil <func> func for max # non-A/C/G/Ts permitted in aln (L,0,0.15)
--dpad <int> include <int> extra ref chars on sides of DP table (15)
--gbar <int> disallow gaps within <int> nucs of read extremes (4)
--ignore-quals treat all quality values as 30 on Phred scale (off)
--nofw do not align forward (original) version of read (off)
--norc do not align reverse-complement version of read (off)
--no-1mm-upfront do not allow 1 mismatch alignments before attempting to
scan for the optimal seeded alignments
--end-to-end entire read must align; no clipping (on)
OR
--local local alignment; ends might be soft clipped (off)
Scoring:
--ma <int> match bonus (0 for --end-to-end, 2 for --local)
--mp <int> max penalty for mismatch; lower qual = lower penalty (6)
--np <int> penalty for non-A/C/G/Ts in read/ref (1)
--rdg <int>,<int> read gap open, extend penalties (5,3)
--rfg <int>,<int> reference gap open, extend penalties (5,3)
--score-min <func> min acceptable alignment score w/r/t read length
(G,20,8 for local, L,-0.6,-0.6 for end-to-end)
Reporting:
(default) look for multiple alignments, report best, with MAPQ
OR
-k <int> report up to <int> alns per read; MAPQ not meaningful
OR
-a/--all report all alignments; very slow, MAPQ not meaningful
Effort:
-D <int> give up extending after <int> failed extends in a row (15)
-R <int> for reads w/ repetitive seeds, try <int> sets of seeds (2)
Paired-end:
-I/--minins <int> minimum fragment length (0)
-X/--maxins <int> maximum fragment length (500)
--fr/--rf/--ff -1, -2 mates align fw/rev, rev/fw, fw/fw (--fr)
--no-mixed suppress unpaired alignments for paired reads
--no-discordant suppress discordant alignments for paired reads
--dovetail concordant when mates extend past each other
--no-contain not concordant when one mate alignment contains other
--no-overlap not concordant when mates overlap at all
BAM:
--align-paired-reads
Bowtie2 will, by default, attempt to align unpaired BAM reads.
Use this option to align paired-end reads instead.
--preserve-tags Preserve tags from the original BAM record by
appending them to the end of the corresponding SAM output.
Output:
-t/--time print wall-clock time taken by search phases
--un <path> write unpaired reads that didn't align to <path>
--al <path> write unpaired reads that aligned at least once to <path>
--un-conc <path> write pairs that didn't align concordantly to <path>
--al-conc <path> write pairs that aligned concordantly at least once to <path>
(Note: for --un, --al, --un-conc, or --al-conc, add '-gz' to the option name, e.g.
--un-gz <path>, to gzip compress output, or add '-bz2' to bzip2 compress output.)
--quiet print nothing to stderr except serious errors
--met-file <path> send metrics to file at <path> (off)
--met-stderr send metrics to stderr (off)
--met <int> report internal counters & metrics every <int> secs (1)
--no-unal suppress SAM records for unaligned reads
--no-head suppress header lines, i.e. lines starting with @
--no-sq suppress @SQ header lines
--rg-id <text> set read group id, reflected in @RG line and RG:Z: opt field
--rg <text> add <text> ("lab:value") to @RG line of SAM header.
Note: @RG line only printed when --rg-id is set.
--omit-sec-seq put '*' in SEQ and QUAL fields for secondary alignments.
--sam-no-qname-trunc
Suppress standard behavior of truncating readname at first whitespace
at the expense of generating non-standard SAM.
--xeq Use '='/'X', instead of 'M,' to specify matches/mismatches in SAM record.
--soft-clipped-unmapped-tlen
Exclude soft-clipped bases when reporting TLEN
--sam-append-comment
Append FASTA/FASTQ comment to SAM record
Performance:
-p/--threads <int> number of alignment threads to launch (1)
--reorder force SAM output order to match order of input reads
--mm use memory-mapped I/O for index; many 'bowtie's can share
Other:
--qc-filter filter out reads that are bad according to QSEQ filter
--seed <int> seed for random number generator (0)
--non-deterministic
seed rand. gen. arbitrarily instead of using read attributes
--version print version information and quit
-h/--help print this usage message
***
Error: Must specify at least one read input with -U/-1/-2
(ERR): bowtie2-align exited with value 1
Thank you for developing wonderful tool. I read the paper (https://www.nature.com/articles/s41467-021-22720-0) using bowtie2 for ATAC-seq read alignment, and find the question about pre-alignments. Does anyone know how to use pre-alignments? I don't know how to use the output of pre-alignments by bowtie2 as an input file for alignments. Following sentence is their protocol.
Bowtie2 was applied for pre-alignments to filter out reads that align to repetitive regions using “-k 1 -D 20 -R 3 -N 1 -L 20 -i S,1,0.50 -X 2000 –rg-id” parameters. For the remaining reads, Bowtie2 was used to map to GRCh38 with “–very-sensitive -X 2000–rg-id” options.
I already tried following script, but 2nd step did not work.
Thanks in advance