Weeks-UNC / shapemapper2

Public repository for ShapeMapper 2 releases

mate pairs not detected as paired. #32

Closed yoonsquared closed 1 year ago

yoonsquared commented 1 year ago

Hello, we love your program and wish to use it! I have run my files, which are named _R1.fq.gz and _R2.fq.gz, and am getting this message:

|BowtieAligner (RNA: LinearDesign1) (sample: Modified) output message: 
  |--------------------------------------------------------------------- 
  | 
  | 56405044 reads; of these:
  |   56405044 (100.00%) were unpaired; of these:
  |     3111176 (5.52%) aligned 0 times
  |     44743326 (79.33%) aligned exactly 1 time
  |     8550542 (15.16%) aligned >1 times
  | 94.48% overall alignment rate

You can see that the log states that 100.00% of the reads were unpaired. However, when we run Bowtie2 separately, we successfully get mate pairs.

Please let us know if there is a step in the trimming & interleaving that has a problem. We are in need of help and appreciate it greatly.

FYI: This happens even with the test data provided in the github.

Started ShapeMapper v2.1.3 at 2021-12-17 14:32:58
Running from directory: /home/user/shapemapper-2.1.3/test/data
args:  --name example --target TPP.fa --out TPP_shapemap --modified --R1 ./TPPplus/TPPplus_R1_aa.fastq.gz --R2 ./TPPplus/TPPplus_R2_aa.fastq.gz --untreated --R1 ./TPPminus/TPPminus_R1_aa.fastq.gz --R2 ./TPPminus/TPPminus_R2_aa.fastq.gz --denatured --R1 TPPdenat_R1_aa.fastq.gz --R2 TPPdenat_R2_aa.fastq.gz
Created pipeline at 2021-12-17 14:32:58
Running FastaFormatChecker at 2021-12-17 14:32:58 . . .
. . . done at 2021-12-17 14:32:59
Running BowtieIndexBuilder at 2021-12-17 14:32:59 . . .
. . . done at 2021-12-17 14:32:59
Running process group 3 at 2021-12-17 14:32:59 . . .
  Including these components:
    ProgressMonitor . . . started at 2021-12-17 14:32:59
    QualityTrimmer1 . . . started at 2021-12-17 14:32:59
    QualityTrimmer2 . . . started at 2021-12-17 14:32:59
    Interleaver . . . started at 2021-12-17 14:32:59
    Merger . . . started at 2021-12-17 14:32:59
    LengthFilter . . . started at 2021-12-17 14:32:59
    BowtieAligner . . . started at 2021-12-17 14:32:59
    MutationParser_Modified . . . started at 2021-12-17 14:32:59
    MutationCounter_Modified . . . started at 2021-12-17 14:32:59
  /``````````````````````````````````````````````````````````````````````````````````````````````````
  |BowtieAligner (RNA: TPP) (sample: Modified) output message: 
  |----------------------------------------------------------- 
  | 
  | 44 reads; of these:
  |   44 (100.00%) were unpaired; of these:
  |     5 (11.36%) aligned 0 times
  |     39 (88.64%) aligned exactly 1 time
  |     0 (0.00%) aligned >1 times
  | 88.64% overall alignment rate

Thanks!

Best regards, Joon

yoonsquared commented 1 year ago

[Image: bowtie2_paired_checking_non-processed]

Here is the output when I run the files separately through bowtie2.

Psirving commented 1 year ago

Hi, I don't get the same result with the example data. One thing I noticed is that your source directory is shapemapper-2.1.3. Are you running the most recent version? It should be shapemapper-2.1.5.

Psirving commented 1 year ago

By the way, there is a quality trimming step which can affect the percentage of reads paired. This is expected. With example data, you should see about 72% paired.

There is a shortcut to run the test data from the shapemapper-2.1.5 directory. Just type run-example.sh.

yoonsquared commented 1 year ago

Hi Patrick, thanks for the reply. I ran it both with run-example.sh and manually. However, I think I am using shapemapper-2.1.3, as you mentioned. I will update to the most recent version and let you know if that solves the problem! Thanks for the input. Be back soon.

Best, Joon

dinktnwo commented 1 year ago

@Psirving Hello Patrick, our team has exactly the same problem while using shapemapper-2.1.5. With the default settings, we got only 14% paired reads in shapemapper2, but 100% paired from bowtie2.

I am not sure whether the following information is relevant, but I include it here for clarity. Our experiment (IVT) used a mix of genes, so the fastq files are mixed, and we ran shapemapper2 with just one transcript ID. So the mapping rate is expected to be lower. We trimmed the adapters before running shapemapper2 and did not add the --amplicon option. By the way, we did get the 72% paired rate from the example data.

Here is the result from bowtie2:

5330268 reads; of these:
  5330268 (100.00%) were paired; of these:
    5211986 (97.78%) aligned concordantly 0 times
    118282 (2.22%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    5211986 pairs aligned concordantly 0 times; of these:
      371 (0.01%) aligned discordantly 1 time
    ----
    5211615 pairs aligned 0 times concordantly or discordantly; of these:
      10423230 mates make up the pairs; of these:
        10411488 (99.89%) aligned 0 times
        11742 (0.11%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
2.34% overall alignment rate

And here is the result from shapemapper2:

Set INTERLEAVED to true
Set threads to 4
Writing mergable reads merged.
Unspecified format for output stdout; defaulting to fastq.
Unspecified format for output stdout; defaulting to fastq.
Started output threads.
Unspecified format for input stdin; defaulting to fastq.
Total time: 87.470 seconds.
Pairs: 5330267
Joined: 4565427 85.651%
Ambiguous: 756670 14.196%
No Solution: 8170 0.153%
Too Short: 0 0.000%
Avg Insert: 163.9
Standard Deviation: 51.1
Mode: 144
Insert range: 35 - 289
90th percentile: 234
75th percentile: 199
50th percentile: 162
25th percentile: 128
10th percentile: 99
BowtieAligner (sample: Untreated) output message:
5330267 reads; of these:
  764840 (14.35%) were paired; of these:
    755138 (98.73%) aligned concordantly 0 times
    9016 (1.18%) aligned concordantly exactly 1 time
    686 (0.09%) aligned concordantly >1 times
    ----
    755138 pairs aligned concordantly 0 times; of these:
      350 (0.05%) aligned discordantly 1 time
    ----
    754788 pairs aligned 0 times concordantly or discordantly; of these:
      1509576 mates make up the pairs; of these:
        1504681 (99.68%) aligned 0 times
        4781 (0.32%) aligned exactly 1 time
        114 (0.01%) aligned >1 times
  4565427 (85.65%) were unpaired; of these:
    4420639 (96.83%) aligned 0 times
    142018 (3.11%) aligned exactly 1 time
    2770 (0.06%) aligned >1 times
2.79% overall alignment rate

. . . done at 2022-07-28 11:47:17
Running process group 5 at 2022-07-28 11:47:17 . . .
  Including these components:
    ProgressMonitor . . . started at 2022-07-28 11:47:17
    QualityTrimmer1 . . . started at 2022-07-28 11:47:17
    QualityTrimmer2 . . . started at 2022-07-28 11:47:17
    Interleaver . . . started at 2022-07-28 11:47:17
    Merger . . . started at 2022-07-28 11:47:17
    Tab6Interleaver . . . started at 2022-07-28 11:47:17
    BowtieAligner . . . started at 2022-07-28 11:47:17
    SplitToFile1 . . . started at 2022-07-28 11:47:17
    MutationParser_Denatured . . . started at 2022-07-28 11:47:17
    MutationCounter_Denatured . . . started at 2022-07-28 11:47:17

Any ideas on how to solve this problem would be appreciated.

Best, Lee

Psirving commented 1 year ago

@dinktnwo, @yoonsquared I misunderstood this issue earlier, and misspoke about it. This is the intended behavior of ShapeMapper. The relevant difference between running ShapeMapper and running Bowtie is ShapeMapper's read merging step. In the log dinktnwo provided, this is the message from the merging step:

| Pairs: 5330267
| Joined: 4565427 85.651%
| Ambiguous: 756670 14.196%
| No Solution: 8170 0.153%
| Too Short: 0 0.000%

The "Joined" reads are no longer pairs: each joined pair is combined into a single, unpaired read. The "Ambiguous" and "No Solution" reads remain as paired reads. If you add up these two categories (14.196% + 0.153%), you get 14.35% paired, which is what is reported in the alignment step.
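The arithmetic can be checked with a few lines of Python. This is only an illustrative sketch, not part of ShapeMapper: the function name `paired_after_merging` is made up for this example, the counts are copied from the merger message quoted above, and it assumes the "Too Short" category is zero, as it is in that log.

```python
def paired_after_merging(joined, ambiguous, no_solution):
    """Return (pairs_left, total_pairs, percent_paired) from merger counts.

    Hypothetical helper for illustration; assumes "Too Short" is 0.
    """
    total_pairs = joined + ambiguous + no_solution
    # Joined pairs become single reads; the other two categories stay paired.
    pairs_left = ambiguous + no_solution
    return pairs_left, total_pairs, 100.0 * pairs_left / total_pairs


# Counts from the merger message in dinktnwo's log.
pairs_left, total_pairs, percent_paired = paired_after_merging(
    joined=4565427, ambiguous=756670, no_solution=8170
)
print(pairs_left, total_pairs, f"{percent_paired:.2f}%")
# prints: 764840 5330267 14.35%
```

The 764840 pairs and 14.35% match the "were paired" line in the BowtieAligner message from the same log, and the 4565427 joined reads match its "were unpaired" line.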

AMA-cs commented 1 year ago

Hi @Psirving, I'm a bit confused about the results I got. Is this expected? Is there a typical range for a good alignment rate when running the recent shapemapper2?

Merger (sample: Untreated) output message:
Pairs: 28505243
Joined: 20401056 71.569%
Ambiguous: 8082995 28.356%
No Solution: 21192 0.074%
Too Short: 0 0.000%
------------------------------------------
Avg Insert: 149.7
Standard Deviation: 61.7
Mode: 118
------------------------------------------
Insert range: 35 - 289
90th percentile: 242
75th percentile: 196
50th percentile: 141
25th percentile: 101
10th percentile: 73
BowtieAligner (sample: Untreated) output message:
28505243 reads; of these:
  8104187 (28.43%) were paired; of these:
    5102648 (62.96%) aligned concordantly 0 times
    2991715 (36.92%) aligned concordantly exactly 1 time
    9824 (0.12%) aligned concordantly >1 times
    ----
    5102648 pairs aligned concordantly 0 times; of these:
      127370 (2.50%) aligned discordantly 1 time
    ----
    4975278 pairs aligned 0 times concordantly or discordantly; of these:
      9950556 mates make up the pairs; of these:
        9410018 (94.57%) aligned 0 times
        513764 (5.16%) aligned exactly 1 time
        26774 (0.27%) aligned >1 times
  20401056 (71.57%) were unpaired; of these:
    10328060 (50.63%) aligned 0 times
    9919863 (48.62%) aligned exactly 1 time
    153133 (0.75%) aligned >1 times
46.08% overall alignment rate
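
For reference, the numbers in this log are internally consistent with the merging explanation earlier in the thread: Ambiguous plus No Solution (8082995 + 21192) equals the 8104187 reads reported as paired. The overall alignment rate can also be reproduced from the summary. The sketch below assumes the rate is counted per mate, with each read of a remaining pair counted separately and each merged read counted once. The function name `overall_alignment_rate` is hypothetical and not part of ShapeMapper or Bowtie2, and the sketch says nothing about what rate counts as good.

```python
def overall_alignment_rate(n_pairs, n_unpaired, concordant, discordant,
                           single_mates, unpaired_aligned):
    """Percent of mates aligned, under a per-mate counting assumption."""
    # Each remaining pair contributes two mates; each merged read counts once.
    total_mates = 2 * n_pairs + n_unpaired
    aligned_mates = (2 * (concordant + discordant)
                     + single_mates + unpaired_aligned)
    return 100.0 * aligned_mates / total_mates


# Counts copied from the BowtieAligner message above.
rate = overall_alignment_rate(
    n_pairs=8104187,
    n_unpaired=20401056,
    concordant=2991715 + 9824,          # exactly 1 time + >1 times
    discordant=127370,
    single_mates=513764 + 26774,        # mates of otherwise unaligned pairs
    unpaired_aligned=9919863 + 153133,  # merged reads that aligned
)
print(f"{rate:.2f}%")
# prints: 46.08%
```

The same calculation applied to dinktnwo's log gives 2.79%, matching the rate reported there.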