olgabot closed this 4 years ago
{'Translation is shorter than peptide k-mer size + 1': 1, 'Translation frame has stop codon(s)': 3, 'Coding': 3, 'Non-coding': 13, 'Low complexity nucleotide': 0, 'Read length was shorter than 3 * peptide k-mer size': 2, 'Low complexity peptide in dayhoff6 alphabet': 1}
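A quick sanity check, as a minimal sketch: each read should land in exactly one category, so the counts above should add up to the total number of reads in the subsampled test file. The dict is copied verbatim from the output above. The total of 23 is taken from the note about `classification_value_counts` later in this thread.

```python
# Category counts copied from the output above
counts = {
    'Translation is shorter than peptide k-mer size + 1': 1,
    'Translation frame has stop codon(s)': 3,
    'Coding': 3,
    'Non-coding': 13,
    'Low complexity nucleotide': 0,
    'Read length was shorter than 3 * peptide k-mer size': 2,
    'Low complexity peptide in dayhoff6 alphabet': 1,
}

# Every read gets exactly one category, so the counts should
# sum to the number of reads in the input (23 for this test file)
total_reads = 23
assert sum(counts.values()) == total_reads, sum(counts.values())
print(sum(counts.values()))  # 23
```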
This is what I got after fixing the error in this PR: https://github.com/czbiohub/kh-tools/pull/58/files
Based on the ground-truth spreadsheet you shared with me, I should have had 4 coding reads in the output, but one of them has been classified as "All translations shorter than peptide k-mer size + 1":
| Read ID | | | |
|---|---|---|---|
| SRR306838.10559374 Ibis_Run100924_C3PO:6:51:17601:17119/1 | TRUE | TRUE | TRUE |
| SRR306838.10559374 Ibis_Run100924_C3PO:6:51:17601:17119/1 | TRUE | FALSE | FALSE |
| SRR306838.2740879 Ibis_Run100924_C3PO:6:13:11155:5248/1 | TRUE | TRUE | TRUE |
| SRR306838.2740879 Ibis_Run100924_C3PO:6:13:11155:5248/1 | TRUE | FALSE | FALSE |
| SRR306838.4880582 Ibis_Run100924_C3PO:6:23:17413:5436/1 | TRUE | TRUE | TRUE |
| SRR306838.4880582 Ibis_Run100924_C3PO:6:23:17413:5436/1 | TRUE | FALSE | FALSE |
| SRR306838.21218773 Ibis_Run100924_C3PO:6:106:16921:6743/1 | TRUE | FALSE | All translations shorter than peptide k-mer size + 1 |
| SRR306838.21218773 Ibis_Run100924_C3PO:6:106:16921:6743/1 | TRUE | FALSE | FALSE |
The others look right to me.
Fixed in PR https://github.com/czbiohub/kh-tools/pull/58; closing.
With the new CSV output https://github.com/czbiohub/kh-tools/pull/57 (yay!)
For this CSV from the output: https://github.com/czbiohub/kh-tools/blob/e360b0dc6699ee143a10891dca6202370b67a48e/tests/data/translate/SRR306838_GSM752691_hsa_br_F_1_trimmed_subsampled_n22__alphabet-dayhoff_ksize-12.csv
The final "classification" (maybe it should be called something else, since it's not true machine-learning classification with test/train data? Maybe `categorization`)

Per-read calls
If any of the frames is coding, then that read is called coding; the remaining reads are called non-coding, unless they are too short in nucleotide or peptide space.
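The per-read rule above can be sketched as follows. This is a minimal illustration, not the kh-tools implementation: `categorize_reads` and the `(read_id, frame_category)` input shape are hypothetical. The category strings are copied from the output earlier in this thread, and treating the two length-related labels as the "too short" cases is an assumption.

```python
from collections import Counter, defaultdict

# Category strings copied from the output above (assumed to match kh-tools)
CODING = "Coding"
NON_CODING = "Non-coding"
# Assumed "too short in nucleotide or peptide space" labels
TOO_SHORT = (
    "Read length was shorter than 3 * peptide k-mer size",
    "Translation is shorter than peptide k-mer size + 1",
)


def categorize_reads(frame_rows):
    """Hypothetical helper: collapse per-frame categories into one call per read.

    frame_rows: iterable of (read_id, frame_category) pairs, one per frame.
    """
    by_read = defaultdict(list)
    for read_id, category in frame_rows:
        by_read[read_id].append(category)

    calls = {}
    for read_id, categories in by_read.items():
        if CODING in categories:
            # Any coding frame makes the whole read coding
            calls[read_id] = CODING
        else:
            # Otherwise report a length problem if there is one, else non-coding
            too_short = [c for c in categories if c in TOO_SHORT]
            calls[read_id] = too_short[0] if too_short else NON_CODING
    return calls


rows = [
    ("read1", "Coding"), ("read1", "Translation frame has stop codon(s)"),
    ("read2", "Non-coding"), ("read2", "Translation frame has stop codon(s)"),
    ("read3", "Read length was shorter than 3 * peptide k-mer size"),
]
calls = categorize_reads(rows)
# read1 -> Coding, read2 -> Non-coding, read3 -> too short
print(calls)
# One call per read, so these counts sum to the number of reads (3 here)
print(Counter(calls.values()))
```

Because each read gets exactly one call, value counts over `calls` always sum to the number of distinct reads, which is the property expected of `classification_value_counts`.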
Maybe it would be worth adding this `translation_categories.csv` as an output, too, in addition to the verbose one?

For the `classification_value_counts`, the counts should sum to the total number of reads, which in this case is 23. Here's what the JSON for `classification_value_counts` should look like: