Closed D1Pan closed 1 year ago
Hello,
Here's my way of extracting the unique concordant reads. My command line is similar to yours but with the addition of the --no-unal
flag to filter out unaligned reads and --sam-nohead
to get rid of the SAM header.
$ ./bowtie2 -x zebra_fish -1 reads_1.fq -2 reads_2.fq --threads 4 --upto 100 --no-discordant --no-mixed --sam-nohead --no-unal -S out.sam
100 reads; of these:
100 (100.00%) were paired; of these:
64 (64.00%) aligned concordantly 0 times
27 (27.00%) aligned concordantly exactly 1 time
9 (9.00%) aligned concordantly >1 times
36.00% overall alignment rate
$ cat out.sam | grep -v XS:i | wc -l
54
Hello,
Here's my way of extracting the unique concordant reads. My command line is similar to yours but with the addition of the
--no-unal
flag to filter out unaligned reads and--sam-nohead
to get rid of the SAM header.$ ./bowtie2 -x zebra_fish -1 reads_1.fq -2 reads_2.fq --threads 4 --upto 100 --no-discordant --no-mixed --sam-nohead --no-unal -S out.sam 100 reads; of these: 100 (100.00%) were paired; of these: 64 (64.00%) aligned concordantly 0 times 27 (27.00%) aligned concordantly exactly 1 time 9 (9.00%) aligned concordantly >1 times 36.00% overall alignment rate $ cat out.sam | grep -v XS:i | wc -l 54
Hi, Thanks for your reply. I followed your suggestion and did this:
bowtie2 -x ./index -p 50 -t -1 ./R1.trimmed.fastq -2 ./R2.trimmed.fastq -S example.k1.2.sam --no-discordant --no-mixed --sam-nohead --no-unal
Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Multiseed full-index search: 00:06:15
20826672 reads; of these:
20826672 (100.00%) were paired; of these:
15275909 (73.35%) aligned concordantly 0 times
5048396 (24.24%) aligned concordantly exactly 1 time
502367 (2.41%) aligned concordantly >1 times
26.65% overall alignment rate
Time searching: 00:06:15
Overall time: 00:06:15
And:
cat example.k1.2.sam | grep -v XS:i | wc -l
10119065
The end result is not the expected number, even odd. Any further suggestions?
Thanks again.
Regards, Sugar
We pushed a change to the bug_fixes
branch that would make the "unique concordant pair" count compatible with filters on the SAM output. Example:
{
m1=$0
getline
m2=$0
if (m1 ~ /YT:Z:CP/ && m2 ~ /YT:Z:CP/ && m1 !~ /XS:i/ && m2 !~ /XS:i/)
print m1"\n"m2
}
If I were to save the above AWK
script to a file unique.awk
and run the command-line below the output should now be twice "the unique concordant pairs" in the output summary, as expected.
./bowtie2-align-s -x index -1 reads_1.fq -2 reads_2.fq | awk -f unique.awk | wc -l
Please let us know your thoughts on these changes.
I appreciate your help. I understand the principle you mentioned, but it does not solve the problem perfectly. Perhaps the reason for the problem is in the printed output of the Bowtie2 visualization pipeline, rather than the reads filtering self. I've written an alternative solution, but it doesn't perfectly replicate the 'Aligned Concordantly Exactly 1 Time' value shown on Bowtie2's output display, like your suggestion; but the almost same principles/parameters.
I tried to extract reads-pairs aligned concordantly exactly 1 time from Bowtie2 (version 2.3.4.1 or high). This was fixed in Bowtie2's version a few years ago (the previous solution and another solution), but now the problem has resurfaced.
When I run:
The results would look like this:
What I want to separate and extract is the '5048169 (24.24%) aligned concordantly exactly 1 time' reads-pairs.
But when I followed the previous workarounds, I found:
1)
or
and result(s):
10196512
, which is not equal to 5,048,169*2 = 10,096,338 reads that I want.2)
or
and result(s):
10193259
, which is not equal to 5,048,169*2 = 10,096,338 reads that I want.foo.py
is (from Mr. Devon Ryan's solution):3)
or
and result(s):
12107870
, which is not equal to 5,048,169*2 = 10,096,338 reads that I want.4)
or samtools view -hf 0x2 example.sam | grep -v "XS:i:" | grep YT:Z:CP | wc -l or samtools view -hf 0x2 example.bam | grep -v "XS:i:" | grep YT:Z:CP | wc -l
and result(s):
10159819
, which is not equal to 5,048,169*2 = 10,096,338 reads that I want.I've tried almost every existing solution but to no avail. Urgently ask for your help.