gimelbrantlab / ASEReadCounter_star

Preprocessing sequencing data for allele-specific analysis
GNU General Public License v3.0

STAR alignment with paired-end reads #10

Open dstern opened 1 year ago

dstern commented 1 year ago

In the instructions, you illustrate mapping SE reads. Are SE reads required for the pipeline? I have tried mapping PE reads with these settings, and the two reads map to "different" locations, so they are not considered uniquely mapping. I guess these are filtered out by --outFilterMultimapNmax 1. Do you recommend increasing this parameter to 2, or will that break downstream analysis?

azurillandfriend commented 3 months ago

Hi, not the repository author here, but I have used it with paired-end reads, so maybe this will help: when you get to the stage of using alleleseq_merge_stream_v2.py, you will need to specify that you are using paired-end reads with --paired 1 (the Wiki tutorial uses --paired 0 for single-end reads).
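For the alignment step itself, a paired-end STAR call passes both mate files to a single --readFilesIn, mate 1 first. Below is a minimal Python sketch that only builds and prints such a command. The function name, the paths, the thread count, and the gzipped-input assumption are all hypothetical placeholders, and any other settings from the Wiki tutorial would still need to be carried over unchanged.

```python
import shlex


def star_paired_command(genome_dir, read1, read2, prefix, threads=4):
    """Build a STAR argument list for one paired-end sample.

    All paths are hypothetical placeholders; this sketch does not
    reproduce the full set of options used in the pipeline's tutorial.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        # Both mate files go to one --readFilesIn, mate 1 first.
        "--readFilesIn", read1, read2,
        # Assumption: the FASTQ files are gzipped.
        "--readFilesCommand", "zcat",
        # Report a read only if it maps to a single locus.
        "--outFilterMultimapNmax", "1",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]


cmd = star_paired_command(
    "genome_index/", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample_"
)
# Print a copy-pasteable shell command instead of running STAR here.
print(shlex.join(cmd))
```

Building the command as a list keeps each flag and its value separate, so it can be passed to subprocess.run(cmd, check=True) without shell quoting problems once the placeholders are replaced with real paths.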

I did not have the issue of everything getting filtered out with --outFilterMultimapNmax 1 with my reads. Here is an example of the Log.progress.out file from the STAR alignment step done with paired reads, showing over 90% uniquely mapped:

                             Started job on |       Jan 13 09:39:45
                         Started mapping on |       Jan 13 09:41:43
                                Finished on |       Jan 13 10:03:38
   Mapping speed, Million of reads per hour |       90.47

                      Number of input reads |       33046509
                  Average input read length |       202
                                UNIQUE READS:
               Uniquely mapped reads number |       29832098
                    Uniquely mapped reads % |       90.27%
                      Average mapped length |       199.62
                   Number of splices: Total |       7112743
        Number of splices: Annotated (sjdb) |       6760665
                   Number of splices: GT/AG |       7027236
                   Number of splices: GC/AG |       15432
                   Number of splices: AT/AC |       983
           Number of splices: Non-canonical |       69092
                  Mismatch rate per base, % |       0.60%
                     Deletion rate per base |       0.12%
                    Deletion average length |       2.64
                    Insertion rate per base |       0.10%
                   Insertion average length |       2.29
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |       0
         % of reads mapped to multiple loci |       0.00%
    Number of reads mapped to too many loci |       1579121
         % of reads mapped to too many loci |       4.78%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches |       0.00%
             % of reads unmapped: too short |       4.93%
                 % of reads unmapped: other |       0.02%
                              CHIMERIC READS:
                   Number of chimeric reads |       0
                        % of chimeric reads |       0.00%

How does the log file look in your case? Are you sure it is the program and not the data? How does it look if you map with something other than STAR?
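To compare logs quickly, here is a minimal Python sketch that pulls the "label | value" lines out of a STAR summary log in the format shown above and checks how the reads split between unique, multi-mapping, and unmapped. The function names are my own, and the sample text is just a subset of the lines from my log; in practice you would read your own log file instead.

```python
def parse_star_log(text):
    """Return {label: value string} for every 'label | value' line."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip blank lines and section headers like 'UNIQUE READS:'
        label, value = line.split("|", 1)
        stats[label.strip()] = value.strip()
    return stats


def percent(stats, label):
    """Convert a value such as '90.27%' to the float 90.27."""
    return float(stats[label].rstrip("%"))


# Subset of the log posted above, used as sample input.
SAMPLE = """\
                      Number of input reads |       33046509
                                UNIQUE READS:
               Uniquely mapped reads number |       29832098
                    Uniquely mapped reads % |       90.27%
                         MULTI-MAPPING READS:
         % of reads mapped to multiple loci |       0.00%
         % of reads mapped to too many loci |       4.78%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches |       0.00%
             % of reads unmapped: too short |       4.93%
                 % of reads unmapped: other |       0.02%
"""

FATE_LABELS = [
    "Uniquely mapped reads %",
    "% of reads mapped to multiple loci",
    "% of reads mapped to too many loci",
    "% of reads unmapped: too many mismatches",
    "% of reads unmapped: too short",
    "% of reads unmapped: other",
]

stats = parse_star_log(SAMPLE)
for label in FATE_LABELS:
    print(f"{label:45s} {percent(stats, label):6.2f}")

# The categories should account for roughly all input reads.
total = sum(percent(stats, label) for label in FATE_LABELS)
print(f"{'sum of categories':45s} {total:6.2f}")
```

If most of your reads land in "mapped to too many loci" or "unmapped: too short" rather than in the unique category, that would point at where the pairs are being lost.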