COMBINE-lab / salmon

🐟 🍣 🍱 Highly-accurate & wicked fast transcript-level quantification from RNA-seq reads using selective alignment
https://combine-lab.github.io/salmon
GNU General Public License v3.0

Orphan recovery option in rare cases causes Salmon to quit abruptly without error #929

Open gringer opened 6 months ago

gringer commented 6 months ago

Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)?

Salmon (bulk mode)

Describe the bug

For one of our 41 samples, Salmon quits without any substantial output when the orphan recovery option is used (with this option, Salmon tries harder to pair up read alignments when one of the reads in a read pair fails to map properly). Because the failure occurs only with orphan recovery, and in only one sample out of 41, I don't expect it to affect our results in any substantial way, but I'm reporting it in case it exposes other, more concerning software issues.

To Reproduce

Steps and data to reproduce the behavior:

Working:

./salmon/bin/salmon quant -p 64 --index reference/salmon_index -l ISR -1 merged/1791-${id}_1P.fastq.gz -2 merged/1791-${id}_2P.fastq.gz --validateMappings --seqBias --gcBias --posBias --softclip --allowDovetail --numBootstraps 10 -o mapped/salmon_${id}

The working command produced the following file structure:

salmon_03
├── aux_info
│   ├── ambig_info.tsv
│   ├── bootstrap
│   │   ├── bootstraps.gz
│   │   └── names.tsv.gz
│   ├── exp3_pos.gz
│   ├── exp3_seq.gz
│   ├── exp5_pos.gz
│   ├── exp5_seq.gz
│   ├── expected_bias.gz
│   ├── exp_gc.gz
│   ├── fld.gz
│   ├── meta_info.json
│   ├── obs3_pos.gz
│   ├── obs3_seq.gz
│   ├── obs5_pos.gz
│   ├── obs5_seq.gz
│   ├── observed_bias_3p.gz
│   ├── observed_bias.gz
│   └── obs_gc.gz
├── cmd_info.json
├── lib_format_counts.json
├── libParams
│   └── flenDist.txt
├── logs
│   └── salmon_quant.log
└── quant.sf

5 directories, 23 files

Not working:

./salmon/bin/salmon quant -p 64 --index reference/salmon_index -l ISR -1 merged/1791-${id}_1P.fastq.gz -2 merged/1791-${id}_2P.fastq.gz --validateMappings --seqBias --gcBias --posBias --softclip --allowDovetail --recoverOrphans --numBootstraps 10 -o mapped/salmon_${id}

The failing command produced the following file structure:

salmon_03_withRecover
├── aux_info
├── libParams
└── logs
    └── salmon_quant.log

4 directories, 1 file

The file mapped/salmon_03_withRecover/logs/salmon_quant.log is empty.
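Because the failed run leaves an output directory behind with no error message, an incomplete sample is easy to miss in a batch of 41. The following is a minimal sketch for flagging such directories. It assumes that a finished run always has a non-empty quant.sf and a non-empty logs/salmon_quant.log, as in the working tree above; `check_salmon_output` is a hypothetical helper name, not part of Salmon.

```shell
#!/usr/bin/env bash
# Report whether a salmon output directory looks complete.
# Assumption: a finished run has a non-empty quant.sf and a non-empty
# logs/salmon_quant.log, as in the working file structure shown above.
check_salmon_output() {
    local dir="$1"
    if [ -s "$dir/quant.sf" ] && [ -s "$dir/logs/salmon_quant.log" ]; then
        echo "complete"
    else
        echo "incomplete"
        return 1
    fi
}

# Scan every output directory under mapped/ (path taken from the commands above).
for d in mapped/salmon_*; do
    [ -d "$d" ] || continue
    printf '%s\t%s\n' "$d" "$(check_salmon_output "$d")"
done
```

With the two directories listed above, salmon_03 would be reported as complete and salmon_03_withRecover as incomplete.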

Expected behavior

Properly-mapped reads, as demonstrated by the following metadata:

{
    "salmon_version": "1.10.0",
    "samp_type": "bootstrap",
    "opt_type": "vb",
    "quant_errors": [],
    "num_libraries": 1,
    "library_types": [
        "ISR"
    ],
    "frag_dist_length": 1001,
    "frag_length_mean": 158.48833607498765,
    "frag_length_sd": 54.34014977759742,
    "seq_bias_correct": true,
    "gc_bias_correct": true,
    "num_bias_bins": 4096,
    "mapping_type": "mapping",
    "keep_duplicates": false,
    "num_valid_targets": 147493,
    "num_decoy_targets": 61,
    "num_eq_classes": 179681,
    "serialized_eq_classes": false,
    "eq_class_properties": [
        "range_factorized",
        "gzipped"
    ],
    "length_classes": [
        496,
        768,
        1403,
        2707,
        100404
    ],
    "index_seq_hash": "c0bf1b46db288bdf947208ef6410a0ced47fa770ab5284a1b231d958b283728b",
    "index_name_hash": "db38822bce0fbc9a64cfb0b230f58119448d1c82706f1c515f210cccaf4fdf7d",
    "index_seq_hash512": "d683c5132cae8695500566a25eb95c0349427afe1664ac571160337850aa269b634ad444936bd6d35205597c4962636c8fadbcf6406ca409a159b65e5f53c59e",
    "index_name_hash512": "e552bd7a70d98c20ff4cf07a83a5f25d2dafe4a78e3dff92348f3d566c9037ccde0de6d4040625ca065a7484dcb8d668c583822bf5138e1540f61685bc991290",
    "index_decoy_seq_hash": "39d3837ea001def952e79d70003dbba0199cc859b32f26350abfa271a6741167",
    "index_decoy_name_hash": "bd5cd185b9e3272a64108e64e2bc47bc0552046dba3ff53683edeafab750c9ab",
    "num_bootstraps": 10,
    "num_processed": 28233938,
    "num_mapped": 13878036,
    "num_decoy_fragments": 1377519,
    "num_dovetail_fragments": 563891,
    "num_fragments_filtered_vm": 1456279,
    "num_alignments_below_threshold_for_mapped_fragments_vm": 2129372,
    "percent_mapped": 49.153738313089728,
    "call": "quant",
    "start_time": "Fri May 03 11:31:29 2024",
    "end_time": "Fri May 03 11:33:32 2024"
}
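To compare runs across samples, the key counts can be pulled out of meta_info.json and the mapping rate recomputed as 100 * num_mapped / num_processed. The sketch below is an illustration only: `meta_field` and `mapping_rate` are hypothetical helper names, and the `sed` pattern assumes one `"key": value` pair per line, as in the listing above. A real JSON parser such as jq would be more robust.

```shell
#!/usr/bin/env bash
# Print the value of a top-level numeric field from a meta_info.json file.
# Assumption: one `"key": value,` pair per line, as in the listing above.
meta_field() {
    local key="$1" file="$2"
    sed -n "s/^[[:space:]]*\"$key\":[[:space:]]*\([0-9.][0-9.]*\),\{0,1\}[[:space:]]*\$/\1/p" "$file"
}

# Recompute the mapping rate (in percent) from the raw fragment counts.
mapping_rate() {
    local file="$1" mapped processed
    mapped=$(meta_field num_mapped "$file")
    processed=$(meta_field num_processed "$file")
    awk -v m="$mapped" -v p="$processed" 'BEGIN { printf "%.2f\n", 100 * m / p }'
}
```

For the metadata above, `mapping_rate mapped/salmon_03/aux_info/meta_info.json` would print 49.15, matching the reported percent_mapped of 49.1537 after rounding.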

Screenshots

Program output from a failed process (with the --recoverOrphans option):

Version Info: This is the most recent version of salmon.
### salmon (selective-alignment-based) v1.10.0
### [ program ] => salmon
### [ command ] => quant
### [ threads ] => { 64 }
### [ index ] => { reference/salmon_index }
### [ libType ] => { ISR }
### [ mates1 ] => { merged/XXXX-03_1P.fastq.gz }
### [ mates2 ] => { merged/XXXX-03_2P.fastq.gz }
### [ validateMappings ] => { }
### [ seqBias ] => { }
### [ gcBias ] => { }
### [ posBias ] => { }
### [ softclip ] => { }
### [ allowDovetail ] => { }
### [ recoverOrphans ] => { }
### [ numBootstraps ] => { 10 }
### [ output ] => { mapped/salmon_03 }
Logs will be written to mapped/salmon_03/logs
[2024-05-03 15:09:51.221] [jointLog] [info] setting maxHashResizeThreads to 64
[2024-05-03 15:09:51.221] [jointLog] [info] Fragment incompatibility prior below threshold.  Incompatible fragments will be ignored.
[2024-05-03 15:09:51.221] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65
[2024-05-03 15:09:51.221] [jointLog] [info] Setting consensusSlack to selective-alignment default of 0.35.
[2024-05-03 15:09:51.221] [jointLog] [info] parsing read library format
[2024-05-03 15:09:51.221] [jointLog] [info] There is 1 library.
[2024-05-03 15:09:51.221] [jointLog] [info] Loading pufferfish index
[2024-05-03 15:09:51.221] [jointLog] [info] Loading dense pufferfish index.
-----------------------------------------
| Loading contig table | Time = 6.1119 s
-----------------------------------------
size = 25107960
-----------------------------------------
| Loading contig offsets | Time = 29.509 ms
-----------------------------------------
-----------------------------------------
| Loading reference lengths | Time = 163.13 us
-----------------------------------------
-----------------------------------------
| Loading mphf table | Time = 358.06 ms
-----------------------------------------
size = 3025374818
Number of ones: 25107959
Number of ones per inventory item: 512
Inventory entries filled: 49039
-----------------------------------------
| Loading contig boundaries | Time = 3.1166 s
-----------------------------------------
size = 3025374818
-----------------------------------------
| Loading sequence | Time = 237.3 ms
-----------------------------------------
size = 2272136048
-----------------------------------------
| Loading positions | Time = 2.8327 s
-----------------------------------------
size = 2977516968
-----------------------------------------
| Loading reference sequence | Time = 228.26 ms
-----------------------------------------
-----------------------------------------
| Loading reference accumulative lengths | Time = 320.51 us
-----------------------------------------
[2024-05-03 15:10:04.136] [jointLog] [info] done
[2024-05-03 15:10:04.170] [jointLog] [info] Index contained 147554 targets

[2024-05-03 15:10:05.131] [jointLog] [info] Number of decoys : 61   
[jointLog] [info] First decoy index : 147456
processed 21000000 fragments
hits: 25885546, hits per frag:  1.2683
(base) [**no further output**]
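Since the process stops without printing an error, its exit status is one of the few remaining clues. By shell convention, a status above 128 means the process was terminated by signal (status - 128); on Linux, 139 corresponds to signal 11 (SIGSEGV) and 137 to signal 9 (SIGKILL, which is what the kernel's out-of-memory killer sends). The sketch below only decodes a status value. `describe_exit_status` is a hypothetical helper, and nothing here establishes which status this particular run returned.

```shell
#!/usr/bin/env bash
# Translate a shell exit status into a human-readable description.
# By shell convention, a status above 128 means the process was
# terminated by signal (status - 128).
describe_exit_status() {
    local status="$1"
    if [ "$status" -eq 0 ]; then
        echo "exited normally"
    elif [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with error code $status"
    fi
}

# Usage (run immediately after the failing command from above):
#   ./salmon/bin/salmon quant ... --recoverOrphans ... -o mapped/salmon_${id}
#   describe_exit_status $?
describe_exit_status 139   # prints "killed by signal 11"
```

Including the decoded status in the report would help distinguish a crash inside Salmon from the process being killed externally.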

Desktop (please complete the following information):

$ uname -a
Linux big-bird 5.15.0-102-generic #112-Ubuntu SMP Tue Mar 5 16:50:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy