COMBINE-lab / salmon

🐟 🍣 🍱 Highly-accurate & wicked fast transcript-level quantification from RNA-seq reads using selective alignment
https://combine-lab.github.io/salmon
GNU General Public License v3.0

Consistent salmon quant segfault #271

Open scottx611x opened 6 years ago

scottx611x commented 6 years ago

Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)? salmon

Describe the bug Running salmon quant through my SLURM cluster consistently segfaults. I've attempted runs on m4.2xlarge & m4.8xlarge worker nodes.

Aug 16 19:38:23 ip-172-31-30-93 kernel: [ 681.083866] salmon[4167]: segfault at 2641a ip 00007fe2fcdc2dca sp 00007fff27128b90 error 4 in libtbb.so.2[7fe2fcda0000+37000]
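The kernel line already narrows things down: the faulting instruction pointer lands inside libtbb.so.2, which was mapped at 7fe2fcda0000. Subtracting the base from the ip gives the offset into the library; with the exact libtbb.so.2 file in hand, addr2line could resolve that offset to a symbol. A sketch (the addr2line path is illustrative, not a real path from this thread):

```shell
# Values copied from the kernel segfault line above.
ip=0x7fe2fcdc2dca      # faulting instruction pointer
base=0x7fe2fcda0000    # load address of libtbb.so.2
printf 'offset into libtbb.so.2: 0x%x\n' $(( ip - base ))   # 0x22dca
# With the exact library file, the symbol could then be recovered via e.g.:
#   addr2line -f -e /path/to/libtbb.so.2 0x22dca
```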

To Reproduce

Expected behavior: For salmon quant to run to completion.

Desktop (please complete the following information):

Ubuntu Linux
Linux ip-172-31-24-127.ec2.internal 3.13.0-100-generic #147-Ubuntu SMP Tue Oct 18 16:48:51 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:    14.04
Codename:   trusty

Additional context

Terminal Output

Example Output
```
Fatal error: Exit code 139 ()
Version Info:
### A newer version of Salmon is available. ####
### The newest version, available at https://github.com/COMBINE-lab/salmon/releases contains new features, improvements, and bug fixes; please upgrade at your earliest convenience. ###
[2018-08-16 19:42:27.806] [jLog] [info] building index
RapMap Indexer
[Step 1 of 4] : counting k-mers
[2018-08-16 19:42:27.811] [jointLog] [warning] Entry with header [ENST00000434970.2], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
[2018-08-16 19:42:27.811] [jointLog] [warning] Entry with header [ENST00000448914.1], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
[... ~60 further short-transcript warnings, through ENST00000573437.1, omitted for brevity ...]
Elapsed time: 17.1995s
[2018-08-16 19:42:45.008] [jointLog] [warning] Removed 11768 transcripts that were sequence duplicates of indexed transcripts.
[2018-08-16 19:42:45.008] [jointLog] [warning] If you wish to retain duplicate transcripts, please use the `--keepDuplicates` flag
Replaced 5 non-ATCG nucleotides
Clipped poly-A tails from 1453 transcripts
Building rank-select dictionary and saving to disk done
Elapsed time: 0.0193769s
Writing sequence data to file . . . done
Elapsed time: 0.138102s
[info] Building 32-bit suffix array (length of generalized text is 289267207)
Building suffix array . . . success
saving to disk . . . done
Elapsed time: 0.595015s
done
Elapsed time: 34.8393s
processed 0 positions
[... "processed N positions" progress lines through 289000000 omitted ...]
khash had 109134690 keys
saving hash to disk . . . done
Elapsed time: 7.61947s
[2018-08-16 19:47:14.359] [jLog] [info] done building index
Version Info:
### A newer version of Salmon is available. ####
### The newest version, available at https://github.com/COMBINE-lab/salmon/releases contains new features, improvements, and bug fixes; please upgrade at your earliest convenience. ###
### salmon (mapping-based) v0.9.1
### [ program ] => salmon
### [ command ] => quant
### [ index ] => { ./index }
### [ libType ] => { U }
### [ unmatedReads ] => { ./single.fastq }
### [ output ] => { ./output }
### [ allowOrphansFMD ] => { }
### [ threads ] => { 16 }
### [ incompatPrior ] => { 1e-20 }
### [ biasSpeedSamp ] => { 1 }
### [ fldMax ] => { 1000 }
### [ fldMean ] => { 200 }
### [ fldSD ] => { 80 }
### [ forgettingFactor ] => { 0.65 }
### [ maxOcc ] => { 200 }
### [ maxReadOcc ] => { 100 }
### [ numBiasSamples ] => { 2000000 }
### [ numAuxModelSamples ] => { 5000000 }
### [ numPreAuxModelSamples ] => { 1000000 }
### [ numGibbsSamples ] => { 0 }
### [ numBootstraps ] => { 0 }
### [ vbPrior ] => { 0.001 }
Logs will be written to ./output/logs
[2018-08-16 19:47:14.418] [jointLog] [info] parsing read library format
[2018-08-16 19:47:14.418] [jointLog] [info] There is 1 library.
[2018-08-16 19:47:14.460] [stderrLog] [info] Loading Suffix Array
[2018-08-16 19:47:14.459] [jointLog] [info] Loading Quasi index
[2018-08-16 19:47:14.459] [jointLog] [info] Loading 32-bit quasi index
[2018-08-16 19:47:15.044] [stderrLog] [info] Loading Transcript Info
[2018-08-16 19:47:15.207] [stderrLog] [info] Loading Rank-Select Bit Array
[2018-08-16 19:47:15.263] [stderrLog] [info] There were 173531 set bits in the bit array
[2018-08-16 19:47:15.285] [stderrLog] [info] Computing transcript lengths
[2018-08-16 19:47:15.285] [stderrLog] [info] Waiting to finish loading hash
[2018-08-16 19:47:20.808] [jointLog] [info] done
[2018-08-16 19:47:20.808] [jointLog] [info] Index contained 173531 targets
[2018-08-16 19:47:20.808] [stderrLog] [info] Done loading index
processed 500002 fragments hits: 2213374; hits per frag: 5.08859
[... fragment-progress lines omitted; last was 21000000 fragments, hits: 92771131, hits per frag: 4.42429 ...]
[2018-08-16 19:47:49.632] [jointLog] [info] Computed 260771 rich equivalence classes for further processing
[2018-08-16 19:47:49.632] [jointLog] [info] Counted 19352476 total reads in the equivalence classes
[2018-08-16 19:47:49.646] [jointLog] [info] Mapping rate = 91.4764%
[2018-08-16 19:47:49.646] [jointLog] [info] finished quantifyLibrary()
[2018-08-16 19:47:49.649] [jointLog] [info] Starting optimizer
/mnt/galaxy/tmp/job_working_directory/000/900/tool_script.sh: line 50:  5733 Segmentation fault      (core dumped) salmon quant --index ./index --libType U --unmatedReads ./single.fastq --output ./output --allowOrphans --threads "${GALAXY_SLOTS:-4}" --incompatPrior 1e-20 --biasSpeedSamp 1 --fldMax 1000 --fldMean 200 --fldSD 80 --forgettingFactor 0.65 --maxOcc 200 --maxReadOcc 100 --numBiasSamples 2000000 --numAuxModelSamples 5000000 --numPreAuxModelSamples 1000000 --numGibbsSamples 0 --numBootstraps 0 --vbPrior 0.001
```
scottx611x commented 6 years ago

EDIT (8-24-18): I haven't been able to reproduce the segfault outside of SLURM.
rob-p commented 6 years ago

Hi scott,

Thank you for the detailed report. I'm trying to reproduce the issue. So far, I have been unable to reproduce it on an Ubuntu 16.04 or OS X box with either 0.11.1 or 0.9.1. My next test is to try an Ubuntu 14.04 Docker container. I'm afraid there may be a system library issue involved. Could you try upgrading via bioconda as well, to see if that helps? The latest Linux release is available on bioconda.

scottx611x commented 6 years ago

@rob-p Thanks for your quick reply! I'll try this out with a more recent conda installation of salmon and report back

scottx611x commented 6 years ago

I've created a new conda environment based on salmon==0.11.2 and was able to run it successfully outside of Galaxy/SLURM on the same 14.04 instance.

I had to omit the --sasamp and --maxOcc options that had been used with 0.9.1, since they no longer exist in the newer version.

scottx611x commented 6 years ago

@rob-p I've taken the time to update salmon to 0.11.2 in its respective Galaxy Tool wrapper and am still seeing the salmon quant segfault when running through SLURM.

bioconda installs of salmon 0.9.1 & 0.11.2 run to completion outside of SLURM on the same machine.

I've seen that #268 was opened and closed recently, but I don't have the liberty to resolve the salmon dependency outside of conda (at least not easily or in a timely fashion).

Update: Have since filed https://github.com/bioconda/bioconda-recipes/issues/10662

bgruening commented 6 years ago

@scottx611x if you submit the job from your command line to SLURM it crashes, but if you run it locally it succeeds?

scottx611x commented 6 years ago

@bgruening Almost. The same command, copied and pasted from the failed Galaxy job, works outside of SLURM on the same worker node. I haven't tried submitting to SLURM from outside of Galaxy, but I could try that as well.

I had been using the following command, with salmon being an alias to the salmon from the mulled conda env that Galaxy created.

```
mkdir ./index && mkdir ./output && \
salmon index --transcripts /mnt/galaxy/files/001/dataset_1239.dat --kmerLen 31 \
    --threads "${GALAXY_SLOTS:-4}" --index './index' --type 'quasi' && \
ln -s /mnt/galaxy/files/001/dataset_1240.dat ./single.fastq && \
salmon quant --index ./index --libType U --unmatedReads ./single.fastq --output ./output \
    --allowOrphans --ma 2 --mp 4 --go 5 --ge 3 --minScoreFraction 0.65 \
    --threads "${GALAXY_SLOTS:-4}" --incompatPrior 1e-20 --biasSpeedSamp 1 \
    --fldMax 1000 --fldMean 200 --fldSD 80 --forgettingFactor 0.65 --maxReadOcc 100 \
    --numBiasSamples 2000000 --numAuxModelSamples 5000000 --numPreAuxModelSamples 1000000 \
    --numGibbsSamples 0 --numBootstraps 0 --consensusSlack 0 --vbPrior 0.001 --sigDigits 3
```
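One way to separate SLURM from Galaxy would be to wrap the same command in a minimal batch script and submit it with sbatch directly. This is a hypothetical job-script sketch, not something run in this thread: the env name, memory figure, and thread count are assumptions based on values mentioned above.

```shell
#!/usr/bin/env bash
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --mem=25000
#SBATCH --cpus-per-task=16
# Hypothetical: activate the same conda env Galaxy uses (name is an assumption)
# and rerun the failing step directly under SLURM, with Galaxy out of the loop.
source activate salmon-0.11.2
salmon quant --index ./index --libType U --unmatedReads ./single.fastq \
    --output ./output --threads "${SLURM_CPUS_PER_TASK:-4}"
```

If this also segfaults while the identical command succeeds interactively on the same node, that would implicate SLURM (or its environment propagation) rather than Galaxy.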
bgruening commented 6 years ago

Maybe SLURM is killing your job because too little memory was allocated, and the error message is just really weird?

scottx611x commented 6 years ago

@bgruening So I've tried some runs today with higher memory configurations and can still reproduce the segfault. I'm going to continue on and try to write up a reproducer for @dpryan79 here.

salmon 0.11.2 run with: NativeSpecification --ntasks=1 --nodes=1 --mem=25000

salmon 0.11.2 run with: NativeSpecification --ntasks=1 --nodes=1 --mem=100000

salmon 0.11.2 run with: NativeSpecification --ntasks=1 --nodes=1 --mem-per-cpu=100000

dpryan79 commented 6 years ago

I can't reproduce this using 0.11.2 on Galaxy (18.05, not that that should matter) with a slurm (17.02.9) cluster. I've tried using both 20 cores and 1 core (in case something weird is going on with the threading) and both run fine. I used our cluster default of 6GB per core, which is overkill for this job. My guess is that the same tbb version is getting used in each version of salmon you're trying and that it got corrupted at some point. Are you spinning up a new CloudMan instance for these runs or are you restarting a saved instance? If you're not starting a brand new instance then try that, then you can avoid using the same possibly corrupted tbb install.
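If a stale or mismatched TBB is the suspect, it may help to confirm which libtbb.so.2 the salmon binary actually resolves at load time on the worker node. A small sketch; find_lib is a hypothetical helper, and the salmon invocation assumes it is on PATH in the active conda env:

```shell
# find_lib <binary> <pattern>: print the ldd line(s) showing which shared
# library a dynamically linked binary resolves at load time.
find_lib() {
    ldd "$1" | grep -F "$2"
}

# Example (assumes salmon is on PATH):
#   find_lib "$(command -v salmon)" libtbb
# Comparing this output between a node where salmon succeeds and one where it
# segfaults would show whether different TBB copies are being picked up.
```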

scottx611x commented 6 years ago

@dpryan79 Thanks for trying to reproduce, I really appreciate this. We're currently bringing up CloudMan instances derived from shared cluster strings. I'll try to bring up a fresh CloudMan instance and try to see the same behavior that you are.

nsheff commented 4 years ago

Did this ever get tracked down? We are having a situation where salmon seems to segfault whenever it runs under SLURM (this time it's salmon index that segfaults, though). Wondering if you figured out a solution.

scottx611x commented 4 years ago

@nsheff Sorry, I was never able to dig into this further.

izaakm commented 4 years ago

Also getting segmentation fault. Any progress on this? This is salmon v1.3.0, installed with conda or using the binary, running in slurm. I do not get a segmentation fault if I pass only a single file, but I do if I pass two files.

```
$ ./src/salmon-latest_linux_x86_64/bin/salmon quant --threads $(nproc) --libType U -t GRCh38_latest_rna.fa -a data/processed/bwa-mem/SRR10571655.sam data/processed/bwa-mem/SRR10571656.sam -o _tmp/
Version Info Exception: server did not respond before timeout
# salmon (alignment-based) v1.3.0
# [ program ] => salmon
# [ command ] => quant
# [ threads ] => { 32 }
# [ libType ] => { U }
# [ targets ] => { GRCh38_latest_rna.fa }
# [ alignments ] => { data/processed/bwa-mem/SRR10571655.sam data/processed/bwa-mem/SRR10571656.sam }
# [ output ] => { _tmp/ }
Logs will be written to _tmp/logs
[2020-10-12 16:13:21.969] [jointLog] [info] setting maxHashResizeThreads to 32
[2020-10-12 16:13:21.969] [jointLog] [info] Fragment incompatibility prior below threshold.  Incompatible fragments will be ignored.
Library format { type:single end, relative orientation:none, strandedness:unstranded }
[2020-10-12 16:13:21.969] [jointLog] [info] numQuantThreads = 26
parseThreads = 6
Checking that provided alignment files have consistent headers . . . done
Populating targets from aln = "data/processed/bwa-mem/SRR10571655.sam", fasta = "GRCh38_latest_rna.fa" . . .done
[2020-10-12 16:13:26.979] [jointLog] [info] replaced 5 non-ACGT nucleotides with random nucleotides

processed 103000000 reads in current round[1]    1994 segmentation fault (core dumped)  ./src/salmon-latest_linux_x86_64/bin/salmon quant --threads $(nproc) --libTyp
```

Always at 103000000 reads.

rob-p commented 4 years ago

Hi @izaakm,

This segfault is unlikely to be related to the issue here, since that one happened in "mapping mode" (salmon performing the mapping itself), while yours happens in alignment-based mode (you're feeding SAM files to salmon). Does the segfault go away when you provide only one of the SAM files? That is, does salmon run to completion with both data/processed/bwa-mem/SRR10571655.sam and data/processed/bwa-mem/SRR10571656.sam individually? Also, what happens if you combine them via a pipe (i.e. something like):

```
./src/salmon-latest_linux_x86_64/bin/salmon quant --threads $(nproc) --libType U -t GRCh38_latest_rna.fa \
    -a <(cat data/processed/bwa-mem/SRR10571655.sam <(samtools view data/processed/bwa-mem/SRR10571656.sam)) \
    -o _tmp/
```

The double redirect is just to make sure the header isn't included a second time from the second SAM file. Also, is the reference you are passing to the -t option identical to the one against which bwa-mem was run? If the problem persists, we may need the SAM/BAM files to track it down further, since I imagine it may be data-dependent.
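An alternative to the process substitution, if samtools is available, would be to concatenate the inputs into one BAM up front with samtools cat, which keeps the first file's header and (unlike sort/merge) does not reorder records; as I understand it, salmon's alignment mode expects alignments grouped by read rather than coordinate-sorted. This is a sketch, not a command from the thread; filenames are the ones above, and cat_sams is a hypothetical samtools-free stand-in for plain SAM text.

```shell
# Assumption: both SAMs come from the same bwa-mem run and share an identical header.
# Preferred route (requires samtools): concatenate without sorting.
#   samtools view -b -o r1.bam data/processed/bwa-mem/SRR10571655.sam
#   samtools view -b -o r2.bam data/processed/bwa-mem/SRR10571656.sam
#   samtools cat -o merged.bam r1.bam r2.bam
#   salmon quant -t GRCh38_latest_rna.fa -a merged.bam --libType U -o _tmp/

# Hypothetical samtools-free equivalent for plain SAM text: emit the first file
# whole, then strip header lines (which start with '@') from the remaining files.
cat_sams() {
    first="$1"; shift
    cat "$first"
    for f in "$@"; do
        grep -v '^@' "$f"
    done
}
# Usage: cat_sams SRR10571655.sam SRR10571656.sam > merged.sam
```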

--Rob

izaakm commented 4 years ago

It does run with each of the two files separately, but when I try the command with the double redirect, I get a message like the one below for many (perhaps all) of the sequences in the reference, and quant.sf is empty (except for the header).

[2020-10-12 17:05:47.406] [jointLog] [warning] Transcript XM_024446103.1 appears in the reference but did not appear in the BAM
rob-p commented 4 years ago

That is interesting. The attempt in the double redirect was to include all alignment records from the second sam file simply concatenated to the first. Assuming the SAM files contain the same header, this should be OK (simply another way to treat them as a single input). However this warning suggests that there were references in the file passed to -t that did not have a corresponding entry in the SAM file. Yet, with the redirect, the first sam file should contain the full header. I don't have a clear understanding of why this would happen yet.