alexdobin / STAR

RNA-seq aligner
MIT License

Soft clipping with long reads #1913

Open yagam-fluent opened 1 year ago

yagam-fluent commented 1 year ago

Hi Alex,

I have a general question about aligning long reads (ONT). I noticed some cases where STAR soft-clips the read at what appears to be a splice junction. Below are two examples comparing STAR with minimap2; in both, minimap2 detects an additional exon. Could you please suggest which parameters can be tweaked to improve sensitivity?

STAR command: STAR-STAR_2.7.10b_alpha_230301/source/STAR --genomeDir /root/d/star_index/mm --readFilesIn /root/d/test/oxford_mouse_brain_full/standard_R2.fastq.gz /root/d/test/oxford_mouse_brain_full/standard_R1.fastq.gz --soloCBwhitelist /root/d/test/oxford_temp/barcodes/barcode_whitelist.txt --outFileNamePrefix /root/d/test/oxford_temp/starsolo/ --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 --runThreadN 8 --alignIntronMin 20 --soloType CB_UMI_Simple --soloUMIdedup 1MM_CR --outFilterMultimapNmax 20 --soloCellFilter None --soloCellReadStats Standard --soloBarcodeMate 0 --soloCBmatchWLtype Exact --limitOutSJcollapsed 3000000 --outFilterMatchNminOverLread 0 --outFilterScoreMinOverLread 0 --soloFeatures GeneFull --outSAMtype BAM Unsorted --outSAMmode NoQS --outSAMreadID Number --soloMultiMappers Unique --readFilesCommand zcat --outSAMattributes HI NH UR CR gx --outFilterIntronMotifs RemoveNoncanonicalUnannotated

STAR: 83 16 chr7 103475741 255 16M1D148M1D94M654N104M266S 0 0 GCAGTGAAATAAATGCTTTTATTTGTCAGAAGACAGATTTTCAAATGTCTTTATCATTTTGCCAACAACTGACAGATGCTCTCTTGGGAACAATTAACCATTGTTCACAGGCAAGAGCAGGAAAGGGGATTTAGTGGTACTTGTGAGCCAGGGCAGCGGCCACTCAGCCACCACCTTCTGGAAGGCAGCCTGTGCAACAGGGGTGAAATCCTTGCCCAGGTGGTGGCCCAGCACAATCACGATCATATTGCCCAGGAGCCTGAAGTTCTCAGGATCCACATGCAGCTTGTCACAGTGGAGCTCACTGAGGCTGGCAAAGGTGCCCTTGAGGCTGTCCAAATGATTCAGGCCATCGTTAAGGGGCAGTTATCGCCTTTCTTGCCATGGGCCTTCACTTTGGCATTACCCATGATAGCAGAGGCAGAGGATAGGTCTCCAAAGCTATCAAAGTACCGCTGGGTCCAAGAGTAGACAACCAGCAGCCTGCCCAGGGCCTCACCACCAACTTCATCAGCGTTCACCTTTCCCCACAGGCCAGAGACAGCAGCCTTCTCAGCATCAGTCAGGTGCACCATGATGTCTGTTTCTGGGAGTTGTGAGTCAACACAACTATGTCAGAAGCAAATGT gx:Z:ENSMUSG00000052305 HI:i:1 CR:Z:ATCCACTGACCCAACC UR:Z:GTTTTCAGAATC

Minimap2: 83 16 chr7 103475741 60 16M1D148M1D94M654N101M1I11M1I111M113N108M1I36M * 0 0 GCAGTGAAATAAATGCTTTTATTTGTCAGAAGACAGATTTTCAAATGTCTTTATCATTTTGCCAACAACTGACAGATGCTCTCTTGGGAACAATTAACCATTGTTCACAGGCAAGAGCAGGAAAGGGGATTTAGTGGTACTTGTGAGCCAGGGCAGCGGCCACTCAGCCACCACCTTCTGGAAGGCAGCCTGTGCAACAGGGGTGAAATCCTTGCCCAGGTGGTGGCCCAGCACAATCACGATCATATTGCCCAGGAGCCTGAAGTTCTCAGGATCCACATGCAGCTTGTCACAGTGGAGCTCACTGAGGCTGGCAAAGGTGCCCTTGAGGCTGTCCAAATGATTCAGGCCATCGTTAAGGGGCAGTTATCGCCTTTCTTGCCATGGGCCTTCACTTTGGCATTACCCATGATAGCAGAGGCAGAGGATAGGTCTCCAAAGCTATCAAAGTACCGCTGGGTCCAAGAGTAGACAACCAGCAGCCTGCCCAGGGCCTCACCACCAACTTCATCAGCGTTCACCTTTCCCCACAGGCCAGAGACAGCAGCCTTCTCAGCATCAGTCAGGTGCACCATGATGTCTGTTTCTGGGAGTTGTGAGTCAACACAACTATGTCAGAAGCAAATGT =:999..?<==9;6544A<=<<<:9;55<88:::;<><>?@C@3333;;<6<?>?@C>89889;DB<<=@?::@A??<98{<9999E==>><<9889DABA<;2223;<03//++,,>781//)'')8:<=<;;:;<=?@<<87798;9333+***4565--666<<=;::9:9::AD98((():7751*&%&(,,/412459<;:;999;:566=;7778<<111366@@=84445;;:::;;<<;:9)(()6::=<;;<32545688889::<....=<>=>>==@=<:987++67----:=<<;=000//799:86345249::003*(((;<==<<;<@9777::(0)(+1--4--)&&&&,97;=>>>==><=;9>???>@A?=<9999:@:5333<<;;<==B74:;=?;;:86777;668<A@?>==340//2155:::;87879;<;:;;<8777>>?:999=:211/.**+=>=<>>>?:874455<>/---67=;;::=>?=;;;=5444>>:5696589=?A=<>??A@=;@?==<;9889:===<<===<74))()'&')+(((-+'''+0''--2{9889;<@B?=>=7225755 NM:i:13 ms:i:586 AS:i:518 nn:i:0 ts:A:+ tp:A:P cm:i:80 s1:i:512 s2:i:371 de:f:0.0206 rl:i:0

STAR: 10188 16 chr11 115914100 255 325S62M1D75M4I96M433S 0 0 AAAGGAAACTGTAAGTTACACTGTGGTTAAGACTTGTATCTTCACCCTTGAAAAAGCCCACATTCTATCACAGTGATGTATGGTCAGACTTAACAGCCCCAATTGTTAAACACTTGGATCAAGTCATAACCAGTTTTATTGCAAAAGGACCCTGTACACATTTATCAACTCTAGTACCTTAATAGCTACCCAACAAGTCATTAACATACAGAAACATACTCCATGAGAAGCAAGAAGTATCACCCATCCCTTCTGCATATTAGCAACTTGTCACTCCTGAGCAACAGTGCTCACATCACTGAGGCTCAGAACAGTCACTTTGTCTATCCTGAGTGAAAGATGGAATGACTTAAGTACAAATGCAACATATTATAAACAATTTCTTACAAAAAAGTCACAAATTAAACCAAAGTATTTTACAGAATTTACTACAAAACGCCATAAAAACTGCCTTCACTTAAGCTCTCTCTCTCTCCCCGTATCCGGCGAGCCAACTGGATGTCTTTGGGCATGATGGTGACTCTCTTAGCGTGGATGGCACACAGGTTGGTATCTTCGAACAGACCCACCAGGTACGCTTCGCTAGCCTCCTGCAGCGCACCGATGGCGCCAGCTTGAAACCTCAGGTCGGTTTTGAAATCCTGCGCGATCTCCCTCACCAACCTCTGAGGGGCAGCTTCCGGATGAGCAGCTCGGTCGACTTCGATAACGACGAATCTCTCAGCGCCACGGTCCCGGGCCTGTAGCGATGAGGCTTCTTCACCCCGCCGGTAGAGGGAGCGCTTTTCCTGGCGGCCGTACAGCCGCCTTGCGGGGGGCTTTCCCACCGGTGGACTTACGAGCAGTCTGCTTGGTTCGGGCCATTTTCTTCACCCAACGCCGAAGTTTTAGGCCACTTCTCCGACCGCCGCGCCGCTTCCGCTGCCCGAGGAAGAGCCGCAGTCGACGAGCGAAAAACCAGCACCGCCCAACGAACGACCAAACCGCTCTGCGGC gx:Z:ENSMUSG00000016559 HI:i:1 CR:Z:ATACATACAATTCCTC UR:Z:AATCCCTTATAG

minimap2: 10188 16 chr11 115913791 60 10M4I29M1D39M22I200M1D7M4D8M1D5M2D1M1D62M1D63M1I4M1D14M3I1M1I116M89N77M1D37M1D12M2D24M103N57M3D2M2D7M1I56M1D10M359N9M1I11M1I3M1D19M1D5M1I28M2I5M3D3M1D3M1D29M2S 0 0 AAAGGAAACTGTAAGTTACACTGTGGTTAAGACTTGTATCTTCACCCTTGAAAAAGCCCACATTCTATCACAGTGATGTATGGTCAGACTTAACAGCCCCAATTGTTAAACACTTGGATCAAGTCATAACCAGTTTTATTGCAAAAGGACCCTGTACACATTTATCAACTCTAGTACCTTAATAGCTACCCAACAAGTCATTAACATACAGAAACATACTCCATGAGAAGCAAGAAGTATCACCCATCCCTTCTGCATATTAGCAACTTGTCACTCCTGAGCAACAGTGCTCACATCACTGAGGCTCAGAACAGTCACTTTGTCTATCCTGAGTGAAAGATGGAATGACTTAAGTACAAATGCAACATATTATAAACAATTTCTTACAAAAAAGTCACAAATTAAACCAAAGTATTTTACAGAATTTACTACAAAACGCCATAAAAACTGCCTTCACTTAAGCTCTCTCTCTCTCCCCGTATCCGGCGAGCCAACTGGATGTCTTTGGGCATGATGGTGACTCTCTTAGCGTGGATGGCACACAGGTTGGTATCTTCGAACAGACCCACCAGGTACGCTTCGCTAGCCTCCTGCAGCGCACCGATGGCGCCAGCTTGAAACCTCAGGTCGGTTTTGAAATCCTGCGCGATCTCCCTCACCAACCTCTGAGGGGCAGCTTCCGGATGAGCAGCTCGGTCGACTTCGATAACGACGAATCTCTCAGCGCCACGGTCCCGGGCCTGTAGCGATGAGGCTTCTTCACCCCGCCGGTAGAGGGAGCGCTTTTCCTGGCGGCCGTACAGCCGCCTTGCGGGGGGCTTTCCCACCGGTGGACTTACGAGCAGTCTGCTTGGTTCGGGCCATTTTCTTCACCCAACGCCGAAGTTTTAGGCCACTTCTCCGACCGCCGCGCCGCTTCCGCTGCCCGAGGAAGAGCCGCAGTCGACGAGCGAAAAACCAGCACCGCCCAACGAACGACCAAACCGCTCTGCGGC 
;768C@?>===>?CB==?>?@>5556A=8=<832234538=:79.+,4@=RG?;::==DC<?<<=>>>>;=:99556;=@;;87719;;;==<==??>???BCCCBA==4013344:=<9989;;>>AECBA@?<@@{E<99:887;===?@CDB==<;<<<=<==<=<=8:@@<;<<<<<=>=;;;;=BAA?>>=<==;:8-.4/%%$$%%%&'//8766?@=::;><;;>?@@@?=;;?>><==>?>>>?97=::54558<:;:;<<=>==<<;98:AA?><<;<:,+++((**)%%%&&((&&'())*8:;775123CB???<=<=?@=<<<<;<@ABD<<;<:::;@BAB==0/.-31252/.../466576:99:CDB:7::8==>6555E>?==><=9999;;::@@EI;54459;=ACA>@?>>?=>>>>=;98447{{{{{{{{4571+++55:888;>>>??@BA=423@@@AED@BA@ABDBAB=;==?@;{{761BA@B;:98;<>?@{D@@****7556<<9<32;===;5//.($%%%.0013875..--/-,,-6899@?;;251458/'%%$%%&%%%%';=:0122;=@?>>AAFC@@:885567BGBCDB@985/--0;:;B;?C4693,,@ADCCE@?=<==@A@;/.001899:B@;223GFC?@??CA?93{50(((2A@BC@B@?>>=<:::<CD@BDG@@A>:>??;6699:---BC=>?>A;;;A@A??EBCB93222>0+('%&'''+,++))-12339AE@>==>D@CCDEDEA?<;ACF>=>5545;;=BCA?@C955335;;;65-.9<<=<8422345<<<=A@?<:<A8877<98<<?877698<:;=1/..21.()''+,9=CB@?<ABCEFDCEEB@=>?DDEIB@>@DEEBBBBC??<=<=CBBAA@BFH:4/.//,++*),;< NM:i:138 ms:i:616 AS:i:520 nn:i:0 ts:A:+ tp:A:P cm:i:56 s1:i:432 s2:i:370 de:f:0.1025 rl:i:0
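To make the difference between the two alignments easier to see, here is a minimal Python sketch that tallies the CIGAR operations for the first read (chr7). The CIGAR strings are copied from the records above; the helper name `summarize_cigar` is just illustrative and not part of STAR or minimap2.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def summarize_cigar(cigar):
    """Return aligned bases, soft-clipped bases and intron (N) lengths."""
    totals = Counter()
    introns = []
    for length, op in CIGAR_OP.findall(cigar):
        totals[op] += int(length)
        if op == "N":  # N = skipped reference region, i.e. a spliced intron
            introns.append(int(length))
    return {
        "aligned": totals["M"] + totals["="] + totals["X"],
        "soft_clipped": totals["S"],
        "introns": introns,
    }

# CIGAR strings of the chr7 read, as reported by each aligner above.
star = summarize_cigar("16M1D148M1D94M654N104M266S")
mm2 = summarize_cigar("16M1D148M1D94M654N101M1I11M1I111M113N108M1I36M")

print(star)  # {'aligned': 362, 'soft_clipped': 266, 'introns': [654]}
print(mm2)   # {'aligned': 625, 'soft_clipped': 0, 'introns': [654, 113]}
```

For this read, both aligners agree on the 654 bp intron, but the 266 bases that STAR soft-clips are placed by minimap2 across a second, 113 bp intron.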

alexdobin commented 1 year ago

Hi @yagam-fluent

STAR does not map reads with a high rate of indels/mismatches well. I would suggest using minimap2 for ONT data.
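As a rough way to gauge the indel/mismatch rate of a read, the sketch below divides the `NM` tag (edit distance) by the number of alignment columns (M, I, D, = and X operations) in the CIGAR. It uses the two minimap2 records quoted above. This is only a simple proxy chosen for illustration, not how either aligner scores reads, and the function name `edit_rate` is hypothetical.

```python
import re

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def edit_rate(cigar, nm):
    """Approximate per-column error rate: NM / (M + I + D + = + X)."""
    columns = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MID=X")
    return nm / columns

# chr7 read: minimap2 CIGAR with NM:i:13
rate_chr7 = edit_rate("16M1D148M1D94M654N101M1I11M1I111M113N108M1I36M", 13)
print(f"{rate_chr7:.4f}")  # 0.0206

# chr11 read: minimap2 CIGAR with NM:i:138
rate_chr11 = edit_rate(
    "10M4I29M1D39M22I200M1D7M4D8M1D5M2D1M1D62M1D63M1I4M1D14M3I1M1I116M89N"
    "77M1D37M1D12M2D24M103N57M3D2M2D7M1I56M1D10M359N9M1I11M1I3M1D19M1D5M1I"
    "28M2I5M3D3M1D3M1D29M2S",
    138,
)
print(f"{rate_chr11:.4f}")
```

The chr11 read comes out several times noisier than the chr7 read, in line with the `de:f:` values minimap2 reports for them (0.1025 versus 0.0206).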