alexdobin / STAR

RNA-seq aligner
MIT License

Lots of "too short" (~70%) reads, with an example of a read found among the unaligned reads #297

Closed (Hoohm closed this issue 7 years ago)

Hoohm commented 7 years ago

Hello,

I've been having problems with a few runs of our single-cell RNA-seq data: we get a high percentage of unmapped reads. I've checked a couple of potential issues, but neither change solved the problem:

1) I've changed --sjdbOverhang to 48 (our reads are 49 bp): no change.
2) I'm now using --twopassMode Basic.

I might be missing some options here. Here is an example of a sequence that I found among the unaligned reads, tagged uT:A:1:

CTTTAATTTTATTTGCTTATGTTACCATGCACAAAGGTTTCAGTTACTA

This sequence does map to the mouse genome; here is the BLAST result:

# blastn
# Iteration: 0
# Query: 
# RID: RF0ZUBWK015
# Database: GPIPE/10090/current/all_top_level GPIPE/10090/current/rna
# Fields: query id, subject ids, query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 69 hits found
Query_136677    gi|372099099|ref|NC_000077.6|   Query_136677    NC_000077.6 100.000 22  0   0   2   23  4703227 4703206 0.031   41.0
Query_136677    gi|372099099|ref|NC_000077.6|   Query_136677    NC_000077.6 91.667  24  2   0   10  33  54738131    54738108    1.3 35.6
Query_136677    gi|372099099|ref|NC_000077.6|   Query_136677    NC_000077.6 91.304  23  2   0   8   30  6317732 6317754 4.6 33.7
Query_136677    gi|372099102|ref|NC_000074.6|   Query_136677    NC_000074.6 89.655  29  3   0   7   35  112917515   112917543   0.11    39.2
Query_136677    gi|372099102|ref|NC_000074.6|   Query_136677    NC_000074.6 91.667  24  2   0   4   27  114636584   114636561   1.3 35.6
Query_136677    gi|372099102|ref|NC_000074.6|   Query_136677    NC_000074.6 95.455  22  1   0   2   23  123303056   123303035   1.3 35.6
Query_136677    gi|372099102|ref|NC_000074.6|   Query_136677    NC_000074.6 91.304  23  2   0   2   24  5077940 5077918 4.6 33.7
Query_136677    gi|372099102|ref|NC_000074.6|   Query_136677    NC_000074.6 85.714  28  4   0   19  46  44761139    44761112    4.6 33.7
Query_136677    gi|372099102|ref|NC_000074.6|   Query_136677    NC_000074.6 91.304  23  2   0   1   23  98154135    98154157    4.6 33.7
Query_136677    gi|372099102|ref|NC_000074.6|   Query_136677    NC_000074.6 95.238  21  1   0   4   24  103330588   103330608   4.6 33.7
Query_136677    gi|372099100|ref|NC_000076.6|   Query_136677    NC_000076.6 93.333  30  0   2   7   34  104272523   104272552   0.11    39.2
Query_136677    gi|372099098|ref|NC_000078.6|   Query_136677    NC_000078.6 100.000 21  0   0   4   24  96560177    96560157    0.11    39.2
Query_136677    gi|372099098|ref|NC_000078.6|   Query_136677    NC_000078.6 91.304  23  2   0   2   24  19753249    19753271    4.6 33.7
Query_136677    gi|372099098|ref|NC_000078.6|   Query_136677    NC_000078.6 88.462  26  3   0   3   28  67795781    67795806    4.6 33.7
Query_136677    gi|372099105|ref|NC_000071.6|   Query_136677    NC_000071.6 95.652  23  1   0   2   24  147054695   147054673   0.38    37.4
Query_136677    gi|372099105|ref|NC_000071.6|   Query_136677    NC_000071.6 95.455  22  1   0   20  41  24209795    24209816    1.3 35.6
Query_136677    gi|372099090|ref|NC_000086.7|   Query_136677    NC_000086.7 95.652  23  1   0   1   23  111034174   111034152   0.38    37.4
Query_136677    gi|372099090|ref|NC_000086.7|   Query_136677    NC_000086.7 95.455  22  1   0   2   23  6113949 6113970 1.3 35.6
Query_136677    gi|372099090|ref|NC_000086.7|   Query_136677    NC_000086.7 88.462  26  3   0   5   30  92741350    92741325    4.6 33.7
Query_136677    gi|372099090|ref|NC_000086.7|   Query_136677    NC_000086.7 100.000 18  0   0   3   20  118049102   118049085   4.6 33.7
Query_136677    gi|372099109|ref|NC_000067.6|   Query_136677    NC_000067.6 91.667  24  2   0   2   25  86940434    86940457    1.3 35.6
Query_136677    gi|372099109|ref|NC_000067.6|   Query_136677    NC_000067.6 95.238  21  1   0   2   22  114593814   114593834   4.6 33.7
Query_136677    gi|372099104|ref|NC_000072.6|   Query_136677    NC_000072.6 87.097  31  3   1   7   37  72026617    72026588    1.3 35.6
Query_136677    gi|372099104|ref|NC_000072.6|   Query_136677    NC_000072.6 100.000 19  0   0   6   24  148160888   148160870   1.3 35.6
Query_136677    gi|372099104|ref|NC_000072.6|   Query_136677    NC_000072.6 95.238  21  1   0   6   26  80637277    80637257    4.6 33.7
Query_136677    gi|372099101|ref|NC_000075.6|   Query_136677    NC_000075.6 95.455  22  1   0   2   23  79699898    79699877    1.3 35.6
Query_136677    gi|372099101|ref|NC_000075.6|   Query_136677    NC_000075.6 88.462  26  3   0   3   28  10806376    10806401    4.6 33.7
Query_136677    gi|372099101|ref|NC_000075.6|   Query_136677    NC_000075.6 91.304  23  2   0   23  45  21428682    21428660    4.6 33.7
Query_136677    gi|372099101|ref|NC_000075.6|   Query_136677    NC_000075.6 100.000 18  0   0   11  28  41278064    41278081    4.6 33.7
Query_136677    gi|372099101|ref|NC_000075.6|   Query_136677    NC_000075.6 91.304  23  2   0   1   23  46088007    46087985    4.6 33.7
Query_136677    gi|372099094|ref|NC_000082.6|   Query_136677    NC_000082.6 100.000 19  0   0   2   20  36601982    36602000    1.3 35.6
Query_136677    gi|372099094|ref|NC_000082.6|   Query_136677    NC_000082.6 91.304  23  2   0   2   24  8018005 8017983 4.6 33.7
Query_136677    gi|372099094|ref|NC_000082.6|   Query_136677    NC_000082.6 91.304  23  2   0   2   24  79194752    79194730    4.6 33.7
Query_136677    gi|372099094|ref|NC_000082.6|   Query_136677    NC_000082.6 95.238  21  1   0   2   22  82550136    82550156    4.6 33.7
Query_136677    gi|372099093|ref|NC_000083.6|   Query_136677    NC_000083.6 91.667  24  2   0   4   27  18396283    18396260    1.3 35.6
Query_136677    gi|372099093|ref|NC_000083.6|   Query_136677    NC_000083.6 88.462  26  3   0   2   27  72235372    72235347    4.6 33.7
Query_136677    gi|372099092|ref|NC_000084.6|   Query_136677    NC_000084.6 88.889  27  3   0   2   28  16248779    16248805    1.3 35.6
Query_136677    gi|372099092|ref|NC_000084.6|   Query_136677    NC_000084.6 95.455  22  1   0   2   23  50749570    50749591    1.3 35.6
Query_136677    gi|372099092|ref|NC_000084.6|   Query_136677    NC_000084.6 91.304  23  2   0   18  40  17804901    17804879    4.6 33.7
Query_136677    gi|372099092|ref|NC_000084.6|   Query_136677    NC_000084.6 95.238  21  1   0   2   22  32696993    32697013    4.6 33.7
Query_136677    gi|253510046|ref|NT_166305.2|   Query_136677    NT_166305.2 100.000 19  0   0   6   24  4670329 4670311 1.3 35.6
Query_136677    gi|1039765473|ref|XM_017319815.1|   Query_136677    XM_017319815.1  91.304  23  2   0   21  43  8993    9015    4.6 33.7
Query_136677    gi|1039765471|ref|XM_017319814.1|   Query_136677    XM_017319814.1  91.304  23  2   0   21  43  7483    7505    4.6 33.7
Query_136677    gi|1039765470|ref|XM_011240303.2|   Query_136677    XM_011240303.2  91.304  23  2   0   21  43  9662    9684    4.6 33.7
Query_136677    gi|1039765469|ref|XM_011240304.2|   Query_136677    XM_011240304.2  91.304  23  2   0   21  43  7037    7059    4.6 33.7
Query_136677    gi|1039765468|ref|XM_011240301.2|   Query_136677    XM_011240301.2  91.304  23  2   0   21  43  7498    7520    4.6 33.7
Query_136677    gi|1039765467|ref|XM_011240300.2|   Query_136677    XM_011240300.2  91.304  23  2   0   21  43  7390    7412    4.6 33.7
Query_136677    gi|1039765466|ref|XM_011240299.2|   Query_136677    XM_011240299.2  91.304  23  2   0   21  43  8447    8469    4.6 33.7
Query_136677    gi|1039765465|ref|XM_011240298.2|   Query_136677    XM_011240298.2  91.304  23  2   0   21  43  7686    7708    4.6 33.7
Query_136677    gi|1039765463|ref|XM_017319813.1|   Query_136677    XM_017319813.1  91.304  23  2   0   21  43  7591    7613    4.6 33.7
Query_136677    gi|1039765462|ref|XM_011240297.2|   Query_136677    XM_011240297.2  91.304  23  2   0   21  43  7169    7191    4.6 33.7
Query_136677    gi|755504782|ref|XM_011240302.1|    Query_136677    XM_011240302.1  91.304  23  2   0   21  43  7112    7134    4.6 33.7
Query_136677    gi|755504770|ref|XM_011240296.1|    Query_136677    XM_011240296.1  91.304  23  2   0   21  43  7050    7072    4.6 33.7
Query_136677    gi|411147444|ref|NM_001271728.1|    Query_136677    NM_001271728.1  91.304  23  2   0   21  43  7138    7160    4.6 33.7
Query_136677    gi|411147442|ref|NM_030706.3|   Query_136677    NM_030706.3 91.304  23  2   0   21  43  7134    7156    4.6 33.7
Query_136677    gi|411147440|ref|NM_001271727.1|    Query_136677    NM_001271727.1  91.304  23  2   0   21  43  7340    7362    4.6 33.7
Query_136677    gi|411147438|ref|NM_001271726.1|    Query_136677    NM_001271726.1  91.304  23  2   0   21  43  7071    7093    4.6 33.7
Query_136677    gi|411147436|ref|NM_001271725.1|    Query_136677    NM_001271725.1  91.304  23  2   0   21  43  7035    7057    4.6 33.7
Query_136677    gi|372099108|ref|NC_000068.7|   Query_136677    NC_000068.7 100.000 18  0   0   1   18  11545364    11545347    4.6 33.7
Query_136677    gi|372099108|ref|NC_000068.7|   Query_136677    NC_000068.7 100.000 18  0   0   24  41  35919576    35919559    4.6 33.7
Query_136677    gi|372099108|ref|NC_000068.7|   Query_136677    NC_000068.7 95.238  21  1   0   4   24  71631129    71631109    4.6 33.7
Query_136677    gi|372099108|ref|NC_000068.7|   Query_136677    NC_000068.7 95.238  21  1   0   4   24  77819858    77819838    4.6 33.7
Query_136677    gi|372099108|ref|NC_000068.7|   Query_136677    NC_000068.7 100.000 18  0   0   6   23  135720516   135720499   4.6 33.7
Query_136677    gi|372099107|ref|NC_000069.6|   Query_136677    NC_000069.6 91.304  23  2   0   21  43  84160495    84160473    4.6 33.7
Query_136677    gi|372099106|ref|NC_000070.6|   Query_136677    NC_000070.6 100.000 18  0   0   2   19  13338877    13338860    4.6 33.7
Query_136677    gi|372099106|ref|NC_000070.6|   Query_136677    NC_000070.6 95.238  21  1   0   2   22  59219311    59219291    4.6 33.7
Query_136677    gi|372099103|ref|NC_000073.6|   Query_136677    NC_000073.6 100.000 18  0   0   9   26  44684211    44684194    4.6 33.7
Query_136677    gi|372099103|ref|NC_000073.6|   Query_136677    NC_000073.6 91.304  23  2   0   1   23  109353189   109353167   4.6 33.7
Query_136677    gi|372099096|ref|NC_000080.6|   Query_136677    NC_000080.6 91.304  23  2   0   2   24  115880197   115880175   4.6 33.7
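To check how long the exact matches in a table like this actually are, one option is to filter the tabular hits by percent identity and sort by alignment length. A minimal sketch, assuming the 14 whitespace-separated columns listed in the `# Fields:` comment above; the function name `longest_exact_hits` is hypothetical.

```python
def longest_exact_hits(blast_lines, min_identity=100.0):
    """Return (subject, aln_length, q_start, q_end) tuples for hits at or
    above min_identity, longest first.

    Assumes the column order from the '# Fields:' line of the tabular
    BLAST output; comment lines ('#') and blank lines are skipped.
    """
    hits = []
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()
        # cols[3] subject acc.ver, cols[4] % identity, cols[5] alignment
        # length, cols[8] q. start, cols[9] q. end
        if float(cols[4]) >= min_identity:
            hits.append((cols[3], int(cols[5]), int(cols[8]), int(cols[9])))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

Applied to the full table above, the longest 100%-identity hit is 22 bases (query positions 2-23 on NC_000077.6) out of the 49-base read.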

Here is the Log.out:

STAR version=STAR_2.5.3a_modified
STAR compilation time,server,dir=Thu Jun 29 12:15:36 EDT 2017 florence.cshl.edu:/sonas-hs/gingeras/nlsas_norepl/user/dobin/STAR/STAR.sandbox/source
##### DEFAULT parameters:
versionSTAR                       20201
versionGenome                     20101   20200   
parametersFiles                   -   
sysShell                          -
runMode                           alignReads
runThreadN                        1
runDirPerm                        User_RWX
runRNGseed                        777
genomeDir                         ./GenomeDir/
genomeLoad                        NoSharedMemory
genomeFastaFiles                  -   
genomeChainFiles                  -   
genomeSAindexNbases               14
genomeChrBinNbits                 18
genomeSAsparseD                   1
genomeSuffixLengthMax             18446744073709551615
genomeFileSizes                   0   
readFilesIn                       Read1   Read2   
readFilesCommand                  -   
readMatesLengthsIn                NotEqual
readMapNumber                     18446744073709551615
readNameSeparator                 /   
inputBAMfile                      -
bamRemoveDuplicatesType           -
bamRemoveDuplicatesMate2basesN    0
limitGenomeGenerateRAM            31000000000
limitIObufferSize                 150000000
limitOutSAMoneReadBytes           100000
limitOutSJcollapsed               1000000
limitOutSJoneRead                 1000
limitBAMsortRAM                   0
limitSjdbInsertNsj                1000000
outTmpDir                         -
outTmpKeep                        None
outStd                            Log
outReadsUnmapped                  None
outQSconversionAdd                0
outMultimapperOrder               Old_2.4
outSAMtype                        SAM   
outSAMmode                        Full
outSAMstrandField                 None
outSAMattributes                  Standard   
outSAMunmapped                    None   
outSAMorder                       Paired
outSAMprimaryFlag                 OneBestScore
outSAMreadID                      Standard
outSAMmapqUnique                  255
outSAMflagOR                      0
outSAMflagAND                     65535
outSAMattrRGline                  -   
outSAMheaderHD                    -   
outSAMheaderPG                    -   
outSAMheaderCommentFile           -
outBAMcompression                 1
outBAMsortingThreadN              0
outSAMfilter                      None   
outSAMmultNmax                    18446744073709551615
outSAMattrIHstart                 1
outSJfilterReads                  All
outSJfilterCountUniqueMin         3   1   1   1   
outSJfilterCountTotalMin          3   1   1   1   
outSJfilterOverhangMin            30   12   12   12   
outSJfilterDistToOtherSJmin       10   0   5   10   
outSJfilterIntronMaxVsReadN       50000   100000   200000   
outWigType                        None   
outWigStrand                      Stranded   
outWigReferencesPrefix            -
outWigNorm                        RPM   
outFilterType                     Normal
outFilterMultimapNmax             10
outFilterMultimapScoreRange       1
outFilterScoreMin                 0
outFilterScoreMinOverLread        0.66
outFilterMatchNmin                0
outFilterMatchNminOverLread       0.66
outFilterMismatchNmax             10
outFilterMismatchNoverLmax        0.3
outFilterMismatchNoverReadLmax    1
outFilterIntronMotifs             None
outFilterIntronStrands            RemoveInconsistentStrands
clip5pNbases                      0   
clip3pNbases                      0   
clip3pAfterAdapterNbases          0   
clip3pAdapterSeq                  -   
clip3pAdapterMMp                  0.1   
winBinNbits                       16
winAnchorDistNbins                9
winFlankNbins                     4
winAnchorMultimapNmax             50
winReadCoverageRelativeMin        0.5
winReadCoverageBasesMin           0
scoreGap                          0
scoreGapNoncan                    -8
scoreGapGCAG                      -4
scoreGapATAC                      -8
scoreStitchSJshift                1
scoreGenomicLengthLog2scale       -0.25
scoreDelBase                      -2
scoreDelOpen                      -2
scoreInsOpen                      -2
scoreInsBase                      -2
seedSearchLmax                    0
seedSearchStartLmax               50
seedSearchStartLmaxOverLread      1
seedPerReadNmax                   1000
seedPerWindowNmax                 50
seedNoneLociPerWindow             10
seedMultimapNmax                  10000
seedSplitMin                      12
alignIntronMin                    21
alignIntronMax                    0
alignMatesGapMax                  0
alignTranscriptsPerReadNmax       10000
alignSJoverhangMin                5
alignSJDBoverhangMin              3
alignSJstitchMismatchNmax         0   -1   0   0   
alignSplicedMateMapLmin           0
alignSplicedMateMapLminOverLmate    0.66
alignWindowsPerReadNmax           10000
alignTranscriptsPerWindowNmax     100
alignEndsType                     Local
alignSoftClipAtReferenceEnds      Yes
alignEndsProtrude                 0   ConcordantPair   
chimSegmentMin                    0
chimScoreMin                      0
chimScoreDropMax                  20
chimScoreSeparation               10
chimScoreJunctionNonGTAG          -1
chimMainSegmentMultNmax           10
chimJunctionOverhangMin           20
chimOutType                       SeparateSAMold   
chimFilter                        banGenomicN   
chimSegmentReadGapMax             0
sjdbFileChrStartEnd               -   
sjdbGTFfile                       -
sjdbGTFchrPrefix                  -
sjdbGTFfeatureExon                exon
sjdbGTFtagExonParentTranscript    transcript_id
sjdbGTFtagExonParentGene          gene_id
sjdbOverhang                      100
sjdbScore                         2
sjdbInsertSave                    Basic
quantMode                         -   
quantTranscriptomeBAMcompression    1
quantTranscriptomeBan             IndelSoftclipSingleend
twopass1readsN                    18446744073709551615
twopassMode                       None
##### Command Line:
/home/patrick/programs/STAR/bin/Linux_x86_64/STAR --genomeDir /home/patrick/big/references/mouse_noGTF/STAR_INDEX --sjdbGTFfile /home/patrick/big/references/mouse_88/Mus_musculus.GRCm38.89.gtf --readFilesCommand zcat --runThreadN 8 --outFilterMismatchNmax=4 --readFilesIn L5-SCRB-Opt-3-E2_tagged_unmapped.fastq.gz --sjdbInsertSave All --outFileNamePrefix logs/L5-SCRB-Opt-3-E2. --outSAMunmapped Within --sjdbOverhang 48
##### Initial USER parameters from Command Line:
outFileNamePrefix                 logs/L5-SCRB-Opt-3-E2.
###### All USER parameters from Command Line:
genomeDir                     /home/patrick/big/references/mouse_noGTF/STAR_INDEX     ~RE-DEFINED
sjdbGTFfile                   /home/patrick/big/references/mouse_88/Mus_musculus.GRCm38.89.gtf     ~RE-DEFINED
readFilesCommand              zcat        ~RE-DEFINED
runThreadN                    8     ~RE-DEFINED
outFilterMismatchNmax         4     ~RE-DEFINED
readFilesIn                   L5-SCRB-Opt-3-E2_tagged_unmapped.fastq.gz        ~RE-DEFINED
sjdbInsertSave                All     ~RE-DEFINED
outFileNamePrefix             logs/L5-SCRB-Opt-3-E2.     ~RE-DEFINED
outSAMunmapped                Within        ~RE-DEFINED
sjdbOverhang                  48     ~RE-DEFINED
##### Finished reading parameters from all sources

##### Final user re-defined parameters-----------------:
runThreadN                        8
genomeDir                         /home/patrick/big/references/mouse_noGTF/STAR_INDEX
readFilesIn                       L5-SCRB-Opt-3-E2_tagged_unmapped.fastq.gz   
readFilesCommand                  zcat   
outFileNamePrefix                 logs/L5-SCRB-Opt-3-E2.
outSAMunmapped                    Within   
outFilterMismatchNmax             4
sjdbGTFfile                       /home/patrick/big/references/mouse_88/Mus_musculus.GRCm38.89.gtf
sjdbOverhang                      48
sjdbInsertSave                    All

-------------------------------
##### Final effective command line:
/home/patrick/programs/STAR/bin/Linux_x86_64/STAR   --runThreadN 8   --genomeDir /home/patrick/big/references/mouse_noGTF/STAR_INDEX   --readFilesIn L5-SCRB-Opt-3-E2_tagged_unmapped.fastq.gz      --readFilesCommand zcat      --outFileNamePrefix logs/L5-SCRB-Opt-3-E2.   --outSAMunmapped Within      --outFilterMismatchNmax 4   --sjdbGTFfile /home/patrick/big/references/mouse_88/Mus_musculus.GRCm38.89.gtf   --sjdbOverhang 48   --sjdbInsertSave All

##### Final parameters after user input--------------------------------:
versionSTAR                       20201
versionGenome                     20101   20200   
parametersFiles                   -   
sysShell                          -
runMode                           alignReads
runThreadN                        8
runDirPerm                        User_RWX
runRNGseed                        777
genomeDir                         /home/patrick/big/references/mouse_noGTF/STAR_INDEX
genomeLoad                        NoSharedMemory
genomeFastaFiles                  -   
genomeChainFiles                  -   
genomeSAindexNbases               14
genomeChrBinNbits                 18
genomeSAsparseD                   1
genomeSuffixLengthMax             18446744073709551615
genomeFileSizes                   0   
readFilesIn                       L5-SCRB-Opt-3-E2_tagged_unmapped.fastq.gz   
readFilesCommand                  zcat   
readMatesLengthsIn                NotEqual
readMapNumber                     18446744073709551615
readNameSeparator                 /   
inputBAMfile                      -
bamRemoveDuplicatesType           -
bamRemoveDuplicatesMate2basesN    0
limitGenomeGenerateRAM            31000000000
limitIObufferSize                 150000000
limitOutSAMoneReadBytes           100000
limitOutSJcollapsed               1000000
limitOutSJoneRead                 1000
limitBAMsortRAM                   0
limitSjdbInsertNsj                1000000
outFileNamePrefix                 logs/L5-SCRB-Opt-3-E2.
outTmpDir                         -
outTmpKeep                        None
outStd                            Log
outReadsUnmapped                  None
outQSconversionAdd                0
outMultimapperOrder               Old_2.4
outSAMtype                        SAM   
outSAMmode                        Full
outSAMstrandField                 None
outSAMattributes                  Standard   
outSAMunmapped                    Within   
outSAMorder                       Paired
outSAMprimaryFlag                 OneBestScore
outSAMreadID                      Standard
outSAMmapqUnique                  255
outSAMflagOR                      0
outSAMflagAND                     65535
outSAMattrRGline                  -   
outSAMheaderHD                    -   
outSAMheaderPG                    -   
outSAMheaderCommentFile           -
outBAMcompression                 1
outBAMsortingThreadN              0
outSAMfilter                      None   
outSAMmultNmax                    18446744073709551615
outSAMattrIHstart                 1
outSJfilterReads                  All
outSJfilterCountUniqueMin         3   1   1   1   
outSJfilterCountTotalMin          3   1   1   1   
outSJfilterOverhangMin            30   12   12   12   
outSJfilterDistToOtherSJmin       10   0   5   10   
outSJfilterIntronMaxVsReadN       50000   100000   200000   
outWigType                        None   
outWigStrand                      Stranded   
outWigReferencesPrefix            -
outWigNorm                        RPM   
outFilterType                     Normal
outFilterMultimapNmax             10
outFilterMultimapScoreRange       1
outFilterScoreMin                 0
outFilterScoreMinOverLread        0.66
outFilterMatchNmin                0
outFilterMatchNminOverLread       0.66
outFilterMismatchNmax             4
outFilterMismatchNoverLmax        0.3
outFilterMismatchNoverReadLmax    1
outFilterIntronMotifs             None
outFilterIntronStrands            RemoveInconsistentStrands
clip5pNbases                      0   
clip3pNbases                      0   
clip3pAfterAdapterNbases          0   
clip3pAdapterSeq                  -   
clip3pAdapterMMp                  0.1   
winBinNbits                       16
winAnchorDistNbins                9
winFlankNbins                     4
winAnchorMultimapNmax             50
winReadCoverageRelativeMin        0.5
winReadCoverageBasesMin           0
scoreGap                          0
scoreGapNoncan                    -8
scoreGapGCAG                      -4
scoreGapATAC                      -8
scoreStitchSJshift                1
scoreGenomicLengthLog2scale       -0.25
scoreDelBase                      -2
scoreDelOpen                      -2
scoreInsOpen                      -2
scoreInsBase                      -2
seedSearchLmax                    0
seedSearchStartLmax               50
seedSearchStartLmaxOverLread      1
seedPerReadNmax                   1000
seedPerWindowNmax                 50
seedNoneLociPerWindow             10
seedMultimapNmax                  10000
seedSplitMin                      12
alignIntronMin                    21
alignIntronMax                    0
alignMatesGapMax                  0
alignTranscriptsPerReadNmax       10000
alignSJoverhangMin                5
alignSJDBoverhangMin              3
alignSJstitchMismatchNmax         0   -1   0   0   
alignSplicedMateMapLmin           0
alignSplicedMateMapLminOverLmate    0.66
alignWindowsPerReadNmax           10000
alignTranscriptsPerWindowNmax     100
alignEndsType                     Local
alignSoftClipAtReferenceEnds      Yes
alignEndsProtrude                 0   ConcordantPair   
chimSegmentMin                    0
chimScoreMin                      0
chimScoreDropMax                  20
chimScoreSeparation               10
chimScoreJunctionNonGTAG          -1
chimMainSegmentMultNmax           10
chimJunctionOverhangMin           20
chimOutType                       SeparateSAMold   
chimFilter                        banGenomicN   
chimSegmentReadGapMax             0
sjdbFileChrStartEnd               -   
sjdbGTFfile                       /home/patrick/big/references/mouse_88/Mus_musculus.GRCm38.89.gtf
sjdbGTFchrPrefix                  -
sjdbGTFfeatureExon                exon
sjdbGTFtagExonParentTranscript    transcript_id
sjdbGTFtagExonParentGene          gene_id
sjdbOverhang                      48
sjdbScore                         2
sjdbInsertSave                    All
quantMode                         -   
quantTranscriptomeBAMcompression    1
quantTranscriptomeBan             IndelSoftclipSingleend
twopass1readsN                    18446744073709551615
twopassMode                       None
----------------------------------------

   Input read files for mate 1, from input string L5-SCRB-Opt-3-E2_tagged_unmapped.fastq.gz
-rw-rw-r-- 1 patrick patrick 1274159799 Jun 15 21:06 L5-SCRB-Opt-3-E2_tagged_unmapped.fastq.gz

   readsCommandsFile:
exec > "logs/L5-SCRB-Opt-3-E2._STARtmp/tmp.fifo.read1"
echo FILE 0
zcat      "L5-SCRB-Opt-3-E2_tagged_unmapped.fastq.gz"

Finished loading and checking parameters
Reading genome generation parameters:
versionGenome                 20201        ~RE-DEFINED
genomeFastaFiles              Mus_musculus.GRCm38.dna.primary_assembly.fa        ~RE-DEFINED
genomeSAindexNbases           14     ~RE-DEFINED
genomeChrBinNbits             18     ~RE-DEFINED
genomeSAsparseD               1     ~RE-DEFINED
sjdbOverhang                  0     ~RE-DEFINED
sjdbFileChrStartEnd           -        ~RE-DEFINED
sjdbGTFfile                   -     ~RE-DEFINED
sjdbGTFchrPrefix              -     ~RE-DEFINED
sjdbGTFfeatureExon            exon     ~RE-DEFINED
sjdbGTFtagExonParentTranscripttranscript_id     ~RE-DEFINED
sjdbGTFtagExonParentGene      gene_id     ~RE-DEFINED
sjdbInsertSave                Basic     ~RE-DEFINED
genomeFileSizes               2741239808   21885463878        ~RE-DEFINED
Genome version is compatible with current STAR version
Number of real (reference) chromosomes= 66
1   1   195471971   0
2   10  130694993   195559424
3   11  122082543   326369280
4   12  120129022   448528384
5   13  120421639   568852480
6   14  124902244   689438720
7   15  104043685   814481408
8   16  98207768    918552576
9   17  94987271    1016856576
10  18  90702639    1112014848
11  19  61431566    1202978816
12  2   182113224   1264582656
13  3   160039680   1446772736
14  4   156508116   1606942720
15  5   151834684   1763704832
16  6   149736546   1915748352
17  7   145441459   2065694720
18  8   129401213   2211184640
19  9   124595110   2340683776
20  MT  16299   2465464320
21  X   171031299   2465726464
22  Y   91744698    2636906496
23  JH584299.1  953012  2728656896
24  GL456233.1  336933  2729705472
25  JH584301.1  259875  2730229760
26  GL456211.1  241735  2730491904
27  GL456350.1  227966  2730754048
28  JH584293.1  207968  2731016192
29  GL456221.1  206961  2731278336
30  JH584297.1  205776  2731540480
31  JH584296.1  199368  2731802624
32  GL456354.1  195993  2732064768
33  JH584294.1  191905  2732326912
34  JH584298.1  184189  2732589056
35  JH584300.1  182347  2732851200
36  GL456219.1  175968  2733113344
37  GL456210.1  169725  2733375488
38  JH584303.1  158099  2733637632
39  JH584302.1  155838  2733899776
40  GL456212.1  153618  2734161920
41  JH584304.1  114452  2734424064
42  GL456379.1  72385   2734686208
43  GL456216.1  66673   2734948352
44  GL456393.1  55711   2735210496
45  GL456366.1  47073   2735472640
46  GL456367.1  42057   2735734784
47  GL456239.1  40056   2735996928
48  GL456213.1  39340   2736259072
49  GL456383.1  38659   2736521216
50  GL456385.1  35240   2736783360
51  GL456360.1  31704   2737045504
52  GL456378.1  31602   2737307648
53  GL456389.1  28772   2737569792
54  GL456372.1  28664   2737831936
55  GL456370.1  26764   2738094080
56  GL456381.1  25871   2738356224
57  GL456387.1  24685   2738618368
58  GL456390.1  24668   2738880512
59  GL456394.1  24323   2739142656
60  GL456392.1  23629   2739404800
61  GL456382.1  23158   2739666944
62  GL456359.1  22974   2739929088
63  GL456396.1  21240   2740191232
64  GL456368.1  20208   2740453376
65  JH584292.1  14945   2740715520
66  JH584295.1  1976    2740977664
Started loading the genome: Tue Jul 25 06:25:38 2017

Genome: size given as a parameter = 2741239808
SA: size given as a parameter = 21885463878
/SAindex: size given as a parameter = 1
Read from SAindex: genomeSAindexNbases=14  nSAi=357913940
nGenome=2741239808;  nSAbyte=21885463878
GstrandBit=32   SA number of indices=5305567000
Shared memory is not used for genomes. Allocated a private copy of the genome.
Genome file size: 2741239808 bytes; state: good=1 eof=0 fail=0 bad=0
Loading Genome ... done! state: good=1 eof=0 fail=0 bad=0; loaded 2741239808 bytes
SA file size: 21885463878 bytes; state: good=1 eof=0 fail=0 bad=0
Loading SA ... done! state: good=1 eof=0 fail=0 bad=0; loaded 21885463878 bytes
Loading SAindex ... done: 1565873619 bytes
Finished loading the genome: Tue Jul 25 06:28:25 2017

alignIntronMax=alignMatesGapMax=0, the max intron size will be approximately determined by (2^winBinNbits)*winAnchorDistNbins=589824
Jul 25 06:28:25 ..... processing annotations GTF
Processing sjdbGTFfile=/home/patrick/big/references/mouse_88/Mus_musculus.GRCm38.89.gtf, found:
        128717 transcripts
        769494 exons (non-collapsed)
        272530 collapsed junctions
Jul 25 06:28:35 ..... finished GTF processing
Jul 25 06:28:35   Loaded database junctions from the GTF file: /home/patrick/big/references/mouse_88/Mus_musculus.GRCm38.89.gtf: 272530 total junctions

WARNING: long repeat for junction # 185141 : 5 112774527 112775017; left shift = 255; right shift = 69
WARNING: long repeat for junction # 187493 : 5 123190235 123190960; left shift = 255; right shift = 19
WARNING: long repeat for junction # 196517 : 6 48754635 48755196; left shift = 81; right shift = 255
WARNING: long repeat for junction # 227873 : 7 141808196 141810304; left shift = 255; right shift = 255
Jul 25 06:28:36   Finished preparing junctions
Jul 25 06:28:36 ..... inserting junctions into the genome indices
Jul 25 06:29:03   Finished SA search: number of new junctions=272459, old junctions=0
Jul 25 06:29:16   Finished sorting SA indicesL nInd=52312078
Jul 25 06:30:18   Finished inserting junction indices
Jul 25 06:30:29   Finished SAi
Jul 25 06:30:30 ..... finished inserting junctions into genome
Writing 2767668331 bytes into logs/L5-SCRB-Opt-3-E2._STARgenome//Genome ; empty space on disk = 584456544256 bytes ... done
Writing 22101251200 bytes into logs/L5-SCRB-Opt-3-E2._STARgenome//SA ; empty space on disk = 581688868864 bytes ... done
Writing 8 bytes into logs/L5-SCRB-Opt-3-E2._STARgenome//SAindex ; empty space on disk = 559587606528 bytes ... done
Writing 120 bytes into logs/L5-SCRB-Opt-3-E2._STARgenome//SAindex ; empty space on disk = 559587606528 bytes ... done
Writing 1565873491 bytes into logs/L5-SCRB-Opt-3-E2._STARgenome//SAindex ; empty space on disk = 559587606528 bytes ... done
Created thread # 1
Created thread # 2
Created thread # 3
Created thread # 4
Created thread # 5
Starting to map file # 0
mate 1:   L5-SCRB-Opt-3-E2_tagged_unmapped.fastq.gz
Created thread # 6
Created thread # 7
Thread #3 end of input stream, nextChar=-1
Completed: thread #7
Completed: thread #4
Completed: thread #0
Completed: thread #5
Completed: thread #3
Completed: thread #2
Completed: thread #1
Joined thread # 1
Joined thread # 2
Joined thread # 3
Joined thread # 4
Joined thread # 5
Completed: thread #6
Joined thread # 6
Joined thread # 7
ALL DONE!

I don't really know what else to try. The sequence maps to Zfp276 in the mouse genome. Do you have any idea which parameters I should tweak to get these reads mapped? Thanks
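With --outSAMunmapped Within, STAR writes unmapped reads into the SAM with a uT attribute whose value gives the reason (the STAR manual lists 0 = no acceptable seed/windows, 1 = best alignment shorter than the minimum allowed mapped length, 2 = too many mismatches, 3 = too many loci, 4 = unmapped mate of a mapped paired-end read). A minimal sketch for tallying these reasons from a plain-text SAM file; the function name `tally_unmapped_reasons` is hypothetical.

```python
from collections import Counter

# uT codes as described in the STAR manual for --outSAMunmapped Within
UT_REASONS = {
    "0": "no acceptable seed/windows",
    "1": "best alignment shorter than min allowed mapped length (too short)",
    "2": "best alignment has too many mismatches",
    "3": "read maps to too many loci",
    "4": "unmapped mate of a mapped paired-end read",
}


def tally_unmapped_reasons(sam_lines):
    """Count unmapped SAM records by the code in their uT:A:<code> tag."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        if not int(fields[1]) & 0x4:  # 0x4 = segment unmapped
            continue
        code = "unknown"  # no uT tag found on this record
        for tag in fields[11:]:  # optional SAM attributes
            if tag.startswith("uT:A:"):
                code = tag[5:]
                break
        counts[code] += 1
    return counts
```

For example, `tally_unmapped_reasons(open("logs/L5-SCRB-Opt-3-E2.Aligned.out.sam"))` would count reasons for this run, assuming STAR's default Aligned.out.sam file name under the prefix from the log.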

alexdobin commented 7 years ago

Hi Patrick,

all the BLAST alignments with 100% identity are very short (22 b or less). I also mapped it with BLAT, and it gives only one short hit (query bases 2-23 of 49, on chr11):

22 2 23 49 100.0% 11 - 4703206 4703227 22

If you want to output such short alignments, you would need to reduce the minimum score and mapped-length requirements, e.g.:

--outFilterScoreMinOverLread 0.3 --outFilterMatchNminOverLread 0

Cheers Alex
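The arithmetic behind this suggestion can be sketched as follows. Assumptions: the ...OverLread thresholds are treated as the fraction times the read length, as the STAR manual describes them (STAR's internal integer rounding may shift the cut-off by a base), and the score of an ungapped, mismatch-free local alignment is approximated by its matched length. The function name `passes_length_filters` is hypothetical.

```python
def passes_length_filters(n_matched, score, read_len,
                          score_min_over_lread=0.66,
                          match_nmin_over_lread=0.66,
                          score_min=0, match_nmin=0):
    """Approximate STAR's 'too short' check for one alignment.

    Defaults mirror the Log.out above (outFilterScoreMinOverLread 0.66,
    outFilterMatchNminOverLread 0.66, outFilterScoreMin 0,
    outFilterMatchNmin 0). Both the absolute and the read-length-
    normalized minima must be met.
    """
    min_score = max(score_min, score_min_over_lread * read_len)
    min_match = max(match_nmin, match_nmin_over_lread * read_len)
    return score >= min_score and n_matched >= min_match


READ_LEN = 49  # read length from the question
BEST_HIT = 22  # longest 100%-identity hit reported by BLAST/BLAT

# Defaults: roughly 0.66 * 49 = 32.3 matched bases are needed, so 22 fails.
default_ok = passes_length_filters(BEST_HIT, BEST_HIT, READ_LEN)

# Relaxed: score >= 0.3 * 49 = 14.7 and no normalized match minimum.
relaxed_ok = passes_length_filters(BEST_HIT, BEST_HIT, READ_LEN,
                                   score_min_over_lread=0.3,
                                   match_nmin_over_lread=0)
print(default_ok, relaxed_ok)  # False True
```

The BLAST table also contains many other 18-22 b exact or near-exact hits, which is likely why relaxing these thresholds turns some "too short" reads into multimappers rather than unique alignments.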

Hoohm commented 7 years ago

Hi,

thanks for the help. I tried the changes and actually got ~10-20% more uniquely mapped reads. There are no more "too short" reads, and the rest moved to multimapping (which seems reasonable).

Thanks a lot!

aiqc commented 1 year ago

Applying --outFilterScoreMinOverLread 0.3 --outFilterMatchNminOverLread 0 made the "too short" reads map to multiple loci instead.
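To quantify a shift like this between a default and a relaxed run, one option is to classify reads by the NH attribute (number of loci), which STAR includes in its Standard SAM attributes. A minimal sketch for single-end SAM text; it skips secondary records (flag 0x100) so each read counts once, and treats a record without an NH tag as unique. The function names `classify_reads` and `fractions` are hypothetical.

```python
from collections import Counter


def classify_reads(sam_lines):
    """Count single-end reads as 'unique', 'multi', or 'unmapped'."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & 0x100:  # secondary alignment of a multimapper
            continue
        if flag & 0x4:  # unmapped read
            counts["unmapped"] += 1
            continue
        nh = 1  # assumption: no NH tag means a single locus
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        counts["unique" if nh == 1 else "multi"] += 1
    return counts


def fractions(counts):
    """Convert category counts into fractions of all counted reads."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}
```

Running `classify_reads` on the SAM output of each run and comparing `fractions(...)` shows whether formerly "too short" reads moved into the unique or the multi category.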