czbiohub-sf / orpheum

Orpheum (previously called, and published under the name, sencha) is a Python package for directly translating RNA-seq reads into coding protein sequences.
MIT License

Update summary json with new csv #59

Closed olgabot closed 4 years ago

olgabot commented 4 years ago

With the new CSV output from https://github.com/czbiohub/kh-tools/pull/57 (yay!), the summary JSON should be updated to match.

Take this CSV from the test output as an example: https://github.com/czbiohub/kh-tools/blob/e360b0dc6699ee143a10891dca6202370b67a48e/tests/data/translate/SRR306838_GSM752691_hsa_br_F_1_trimmed_subsampled_n22__alphabet-dayhoff_ksize-12.csv

Below is the final "classification" for it. (Maybe this should be called something else, such as "categorization", since it is not true machine-learning classification with test/train data.)

Per-read calls

If any frame of a read is coding, then that read is called coding. The remaining reads are called non-coding, unless the read is too short in nucleotide or peptide space.

|  | translation_category | n_coding_frames |
|---|---|---|
| SRR306838.10559374 Ibis_Run100924_C3PO:6:51:17601:17119/1 | Coding | 1 |
| SRR306838.6196593 Ibis_Run100924_C3PO:6:29:16733:12435/1 | Non-coding |  |
| SRR306838.20767303 Ibis_Run100924_C3PO:6:104:6864:5062/1 | Non-coding |  |
| SRR306838.12582274 Ibis_Run100924_C3PO:6:62:11779:17975/1 | Non-coding |  |
| SRR306838.13334230 Ibis_Run100924_C3PO:6:66:16579:20350/1 | Non-coding |  |
| SRR306838.2740879 Ibis_Run100924_C3PO:6:13:11155:5248/1 | Coding | 1 |
| SRR306838.6813354 Ibis_Run100924_C3PO:6:32:10591:13073/1 | Non-coding |  |
| SRR306838.23113368 Ibis_Run100924_C3PO:6:114:13840:18459/1 | Non-coding |  |
| SRR306838.10872941 Ibis_Run100924_C3PO:6:53:6164:10522/1 | Non-coding |  |
| SRR306838.6192120 Ibis_Run100924_C3PO:6:29:5833:11991/1 | Non-coding |  |
| SRR306838.21295280 Ibis_Run100924_C3PO:6:106:2590:13965/1 | Non-coding |  |
| SRR306838.21201208 Ibis_Run100924_C3PO:6:106:2763:5109/1 | All translation frames have stop codons |  |
| SRR306838.18327923 Ibis_Run100924_C3PO:6:92:9077:13885/1 | Coding | 1 |
| SRR306838.4880582 Ibis_Run100924_C3PO:6:23:17413:5436/1 | Coding | 1 |
| SRR306838.21417895 Ibis_Run100924_C3PO:6:107:8793:5012/1 | Coding | 1 |
| SRR306838.17165743 Ibis_Run100924_C3PO:6:86:18789:18450/1 | All translation frames have stop codons |  |
| SRR306838.21229494 Ibis_Run100924_C3PO:6:106:6163:7753/1 | All translation frames have stop codons |  |
| SRR306838.21218773 Ibis_Run100924_C3PO:6:106:16921:6743/1 | All translations shorter than peptide k-mer size + 1 |  |
| SRR306838.20124664 Ibis_Run100924_C3PO:6:101:4701:5309/1 | Non-coding |  |
| SRR306838.16841308 Ibis_Run100924_C3PO:6:85:6205:5805/1 | Non-coding |  |
| SRR306838.1531 Ibis_Run100924_C3PO:6:1:15718:1062/1 | Read length was shorter than 3 * peptide k-mer size |  |
| SRR306838.2318 Ibis_Run100924_C3PO:6:1:15779:1141/1 | Read length was shorter than 3 * peptide k-mer size |  |
| adversarial_low_complexity_peptide | Low complexity peptide in dayhoff6 alphabet |  |
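
The per-read rule described above could be sketched in Python roughly as follows. This is a minimal sketch, not the package's actual implementation: `categorize_read` and `ALL_FRAMES_LABEL` are hypothetical names, the per-frame labels are taken from the counts reported later in this thread, and falling back to "Non-coding" when the frames of a read fail for different reasons is an assumption.

```python
# Per-frame labels mapped to their read-level "all frames" counterparts.
# The mapping itself is an assumption made for this sketch.
ALL_FRAMES_LABEL = {
    "Translation frame has stop codon(s)":
        "All translation frames have stop codons",
    "Translation is shorter than peptide k-mer size + 1":
        "All translations shorter than peptide k-mer size + 1",
}


def categorize_read(frame_categories):
    """Collapse one read's per-frame categories into (category, n_coding_frames)."""
    frames = list(frame_categories)
    n_coding = frames.count("Coding")
    if n_coding > 0:
        # Any coding frame makes the whole read coding.
        return "Coding", n_coding
    distinct = set(frames)
    if len(distinct) == 1:
        # Every frame failed for the same reason: report that reason.
        only = distinct.pop()
        return ALL_FRAMES_LABEL.get(only, only), None
    # Frames failed for different reasons (assumed): plain non-coding.
    return "Non-coding", None


one_coding = categorize_read(
    ["Non-coding", "Coding", "Translation frame has stop codon(s)"]
)
all_stops = categorize_read(["Translation frame has stop codon(s)"] * 6)
print(one_coding)  # ('Coding', 1)
print(all_stops)   # ('All translation frames have stop codons', None)
```

Read-level reasons such as "Read length was shorter than 3 * peptide k-mer size" pass through unchanged, because they are not in the mapping.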

Maybe it would be worth adding this translation_categories.csv as an output too, in addition to the verbose one?

The classification_value_counts should sum to the total number of reads, which in this case is 23.

Here's what the JSON for classification_value_counts should look like:

        'classification_value_counts': {
            'All translations shorter than peptide k-mer size + 1': 1,
            'All translation frames have stop codons': 3,
            'Coding': 5,
            'Non-coding': 11,
            'Low complexity nucleotide': 0,
            'Read length was shorter than 3 * peptide k-mer size': 2,
            'Low complexity peptide in dayhoff6 alphabet': 1},
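
A minimal sketch of how these counts could be tallied from the per-read categories, so that zero-count categories such as "Low complexity nucleotide" still appear and the total can be checked against the number of reads. The function name and the `CATEGORIES` list are illustrative assumptions, not the package's API.

```python
from collections import Counter

# Read-level categories, in the order used in the expected summary above.
CATEGORIES = [
    "All translations shorter than peptide k-mer size + 1",
    "All translation frames have stop codons",
    "Coding",
    "Non-coding",
    "Low complexity nucleotide",
    "Read length was shorter than 3 * peptide k-mer size",
    "Low complexity peptide in dayhoff6 alphabet",
]


def classification_value_counts(read_categories):
    """Count reads per category, reporting 0 for categories never seen."""
    counts = Counter(read_categories)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"Unexpected categories: {sorted(unknown)}")
    return {category: counts[category] for category in CATEGORIES}


# One entry per read, matching the per-read calls listed above.
per_read = (
    ["Coding"] * 5
    + ["Non-coding"] * 11
    + ["All translation frames have stop codons"] * 3
    + ["All translations shorter than peptide k-mer size + 1"]
    + ["Read length was shorter than 3 * peptide k-mer size"] * 2
    + ["Low complexity peptide in dayhoff6 alphabet"]
)
summary = classification_value_counts(per_read)

# The counts must add up to the number of reads.
assert sum(summary.values()) == len(per_read) == 23
```
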
pranathivemuri commented 4 years ago

{'Translation is shorter than peptide k-mer size + 1': 1, 'Translation frame has stop codon(s)': 3, 'Coding': 3, 'Non-coding': 13, 'Low complexity nucleotide': 0, 'Read length was shorter than 3 * peptide k-mer size': 2, 'Low complexity peptide in dayhoff6 alphabet': 1}

This is what I got after fixing the error in this PR: https://github.com/czbiohub/kh-tools/pull/58/files
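
To see where the two tallies disagree, the dictionaries can be compared key by key. The sketch below assumes that the two reworded keys in the obtained output correspond one-to-one to the "All ..." keys in the expected output; that correspondence (the hypothetical `RENAMED` mapping) is an assumption, not something the code base is known to define.

```python
expected = {
    "All translations shorter than peptide k-mer size + 1": 1,
    "All translation frames have stop codons": 3,
    "Coding": 5,
    "Non-coding": 11,
    "Low complexity nucleotide": 0,
    "Read length was shorter than 3 * peptide k-mer size": 2,
    "Low complexity peptide in dayhoff6 alphabet": 1,
}
observed = {
    "Translation is shorter than peptide k-mer size + 1": 1,
    "Translation frame has stop codon(s)": 3,
    "Coding": 3,
    "Non-coding": 13,
    "Low complexity nucleotide": 0,
    "Read length was shorter than 3 * peptide k-mer size": 2,
    "Low complexity peptide in dayhoff6 alphabet": 1,
}

# Assumed correspondence between the reworded keys and the expected ones.
RENAMED = {
    "Translation is shorter than peptide k-mer size + 1":
        "All translations shorter than peptide k-mer size + 1",
    "Translation frame has stop codon(s)":
        "All translation frames have stop codons",
}
observed_normalized = {RENAMED.get(key, key): n for key, n in observed.items()}

# Keep only the categories whose counts differ (observed minus expected).
diff = {
    key: observed_normalized.get(key, 0) - n
    for key, n in expected.items()
    if observed_normalized.get(key, 0) != n
}
print(diff)  # {'Coding': -2, 'Non-coding': 2}
```

Both tallies still sum to 23 reads; under the assumed key mapping, the only difference is two reads counted as "Non-coding" instead of "Coding".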

Based on the ground-truth spreadsheet you shared with me, I should have had 4 coding reads in the output, but one of them has been classified as "All translations shorter than peptide k-mer size + 1":

`SRR306838.10559374 Ibis_Run100924_C3PO:6:51:17601:17119/1 TRUE TRUE TRUE
SRR306838.10559374 Ibis_Run100924_C3PO:6:51:17601:17119/1 TRUE FALSE FALSE
SRR306838.2740879 Ibis_Run100924_C3PO:6:13:11155:5248/1 TRUE TRUE TRUE
SRR306838.2740879 Ibis_Run100924_C3PO:6:13:11155:5248/1 TRUE FALSE FALSE
SRR306838.4880582 Ibis_Run100924_C3PO:6:23:17413:5436/1 TRUE TRUE TRUE
SRR306838.4880582 Ibis_Run100924_C3PO:6:23:17413:5436/1 TRUE FALSE FALSE
SRR306838.21218773 Ibis_Run100924_C3PO:6:106:16921:6743/1 TRUE FALSE All translations shorter than peptide k-mer size + 1
SRR306838.21218773 Ibis_Run100924_C3PO:6:106:16921:6743/1 TRUE FALSE FALSE
SRR306838.10559374 Ibis_Run100924_C3PO:6:51:17601:17119/1 TRUE TRUE TRUE
SRR306838.10559374 Ibis_Run100924_C3PO:6:51:17601:17119/1 TRUE FALSE FALSE
SRR306838.2740879 Ibis_Run100924_C3PO:6:13:11155:5248/1 TRUE TRUE TRUE
SRR306838.2740879 Ibis_Run100924_C3PO:6:13:11155:5248/1 TRUE FALSE FALSE
SRR306838.4880582 Ibis_Run100924_C3PO:6:23:17413:5436/1 TRUE TRUE TRUE
SRR306838.4880582 Ibis_Run100924_C3PO:6:23:17413:5436/1 TRUE FALSE FALSE
SRR306838.21218773 Ibis_Run100924_C3PO:6:106:16921:6743/1 TRUE FALSE All translations shorter than peptide k-mer size + 1
SRR306838.21218773 Ibis_Run100924_C3PO:6:106:16921:6743/1 TRUE FALSE FALSE
SRR306838.10559374 Ibis_Run100924_C3PO:6:51:17601:17119/1 TRUE TRUE TRUE
SRR306838.10559374 Ibis_Run100924_C3PO:6:51:17601:17119/1 TRUE FALSE FALSE
SRR306838.2740879 Ibis_Run100924_C3PO:6:13:11155:5248/1 TRUE TRUE TRUE
SRR306838.2740879 Ibis_Run100924_C3PO:6:13:11155:5248/1 TRUE FALSE FALSE
SRR306838.4880582 Ibis_Run100924_C3PO:6:23:17413:5436/1 TRUE TRUE TRUE
SRR306838.4880582 Ibis_Run100924_C3PO:6:23:17413:5436/1 TRUE FALSE FALSE
SRR306838.21218773 Ibis_Run100924_C3PO:6:106:16921:6743/1 TRUE FALSE All translations shorter than peptide k-mer size + 1
SRR306838.21218773 Ibis_Run100924_C3PO:6:106:16921:6743/1 TRUE FALSE FALSE`

The others look right to me.

pranathivemuri commented 4 years ago

Fixed in PR https://github.com/czbiohub/kh-tools/pull/58, closing.