Closed by reece 7 months ago
Problem statement: Given a p. variant, return a list of c. variants that translate to that p. variant.
Most p. variants are consistent with a very large number of c. variants of varying complexity. For example, a p. variant at AA position 1 might be consistent with an SNV at c. position 1, 2, or 3, or with a 1-, 2-, or 3-nt multi-nucleotide change within that codon. It is also consistent with a very large set of indels that span that region. In the most diabolical cases, a c. variant might (in principle) be predicted to alter splicing to produce the specified variant.
Another issue is that users may want c. variants that are within a single exon (i.e., do not span exon-intron boundaries). This filtering might be better supported as a post-processing step.
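As a rough illustration of the simplest case above (substitutions confined to one codon), the sketch below groups every codon that encodes a target amino acid by how many nucleotides differ from the reference codon. It assumes the standard genetic code; `codon_changes` is a hypothetical helper, not part of any existing API, and indels and splicing effects are ignored.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_changes(ref_codon, target_aa):
    """Group codons encoding target_aa by the number of nt differing from ref_codon.

    Hypothetical helper: a distance of 1 corresponds to an SNV, 2 or 3 to a
    multi-nucleotide substitution within the same codon.
    """
    by_dist = {1: [], 2: [], 3: []}
    for codon, aa in CODON_TABLE.items():
        if aa != target_aa:
            continue
        dist = sum(a != b for a, b in zip(ref_codon, codon))
        if dist:  # skip the reference codon itself (synonymous, no change)
            by_dist[dist].append(codon)
    return by_dist


# Met (ATG) to Leu: two codons are reachable by an SNV, four need a 2-nt change.
changes = codon_changes("ATG", "L")
print(changes)
```

Even this restricted view already yields several candidate c. variants per p. variant, before indels are considered.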
Compound variants (i.e., distinct in-phase variants) create yet another kind of complexity.
Finally, the combinatorial complexity of reverse translation grows quickly, even for small indels. Imagine an insertion of SAT. Each AA might derive from S <= TC[ACGT], A <= GC[ACGT], T <= AC[ACGT], giving 4 × 4 × 4 = 64 combinations (96 once serine's two additional codons, AGT and AGC, are included).
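To make the growth concrete, here is a minimal sketch that enumerates every nucleotide sequence encoding a short peptide under the standard genetic code. `reverse_translate` is a hypothetical name used only for illustration. Note that serine has six codons (TCN plus AGT and AGC), so the full count for SAT is 6 × 4 × 4 = 96; restricting serine to TCN gives the 64 mentioned above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_translate(peptide):
    """Return every DNA sequence that translates to peptide (hypothetical helper)."""
    # One list of synonymous codons per residue, then the Cartesian product.
    choices = [[c for c, aa in CODON_TABLE.items() if aa == res] for res in peptide]
    return ["".join(codons) for codons in product(*choices)]


seqs = reverse_translate("SAT")
print(len(seqs))                                  # all serine codons
print(sum(s.startswith("TC") for s in seqs))      # serine restricted to TCN
```

The count is the product of the per-residue codon counts, so it grows exponentially with the length of the inserted peptide.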
So, in order to address this issue, we need to clearly define the problem we're solving (and therefore which problem classes we're excluding). A clearer set of requirements may imply parameters to the revtrans process that constrain the solution set (e.g., max_sub_len or max_ins_len), or a desire to use degenerate NTs to reduce combinatorial complexity.
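One way the degenerate-NT idea could look, as a sketch only: group each amino acid's codons by their first two bases and collapse the third base to an IUPAC ambiguity code. The function names are hypothetical, and treating `max_ins_len` as a cap on inserted nucleotides is an assumption about what such a parameter might mean, not a defined behavior.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes, keyed by the alphabetically sorted set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def degenerate_codons(aa):
    """Collapse the codons for aa into degenerate codons, one per 2-nt prefix."""
    thirds = {}
    for codon, res in CODON_TABLE.items():
        if res == aa:
            thirds.setdefault(codon[:2], set()).add(codon[2])
    return [prefix + IUPAC["".join(sorted(b))] for prefix, b in thirds.items()]


def revtrans_degenerate(peptide, max_ins_len=None):
    """Reverse-translate peptide into degenerate sequences.

    max_ins_len is interpreted here (as an assumption) as a nucleotide-length cap.
    """
    if max_ins_len is not None and 3 * len(peptide) > max_ins_len:
        raise ValueError("peptide exceeds max_ins_len nucleotides")
    return ["".join(p) for p in product(*(degenerate_codons(aa) for aa in peptide))]


print(revtrans_degenerate("SAT"))
```

Under these assumptions the SAT example collapses to two degenerate sequences (one per serine codon family) instead of dozens of explicit ones, and each degenerate codon expands to exactly the codons of the intended amino acid.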
Is there any way to get the corresponding genomic positions for a given p. variant? I understand your point above about the complexity of the c. equivalent, but couldn't we at least capture the potential genomic positions affected?
I realize it could be ambiguous, but it could be isolated to a specific range.
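For a simple substitution, the range in question can be sketched with plain arithmetic: amino acid n is encoded by c. positions 3n-2 through 3n, and those positions can then be projected onto the genome. The example below is a toy under stated assumptions: both function names are hypothetical, the transcript is on the plus strand, and the exon table (CDS start, CDS end, genomic start for each coding segment) is invented for illustration. A real implementation would use actual transcript alignments.

```python
def codon_c_range(aa_pos):
    """Return the 1-based c. interval (start, end) of the codon for aa_pos."""
    if aa_pos < 1:
        raise ValueError("amino acid positions are 1-based")
    return 3 * aa_pos - 2, 3 * aa_pos


def c_range_to_g(c_start, c_end, exons):
    """Project a c. interval onto genomic intervals.

    exons: list of (c_first, c_last, g_first) coding segments on the plus
    strand (hypothetical layout). A codon that spans an exon boundary
    yields two genomic intervals.
    """
    intervals = []
    for c_first, c_last, g_first in exons:
        lo, hi = max(c_start, c_first), min(c_end, c_last)
        if lo <= hi:  # the codon overlaps this coding segment
            intervals.append((g_first + lo - c_first, g_first + hi - c_first))
    return intervals


# Invented transcript: c.1-100 starts at g.1000, c.101-250 starts at g.5000.
exons = [(1, 100, 1000), (101, 250, 5000)]
c_start, c_end = codon_c_range(34)           # codon 34 covers c.100_102
print(c_range_to_g(c_start, c_end, exons))   # split across the two exons
```

This only bounds the codon itself; indels or splice-altering variants that produce the same p. change could lie outside it.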
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.
Originally reported by: Brian Craft (Bitbucket: briancraft, GitHub: Unknown)
Mapping protein to other coords would be useful.