Is your feature request related to a problem? Please describe.
We have AutoPM4BP3 after implementation of #74. However, we need to properly implement all the methods in this class and then test it.
Describe the solution you'd like
Implement the _in_repeat_region and _in_conserved_domain methods
Add unit tests
Add integration tests
Rewrite docstrings
Describe alternatives you've considered
N/A
Additional context
Some info for the PM4 and BP3
PM4 (protein length)
Original Definition
Protein length changes due to in-frame deletions/insertions in a non-repeat region or stop-loss variants.
-- Richards et al. (2015); Table 4
Preconditions / Precomputations
If PVS1 was triggered then this criterion is skipped to avoid double counting.
If the variant is not an in-frame indel and not a stop-loss variant then this criterion is skipped.
Implemented Criterion
If the variant is an in-frame indel
If the variant is inside a repeat masked region then it is skipped
If the variant is inside a repeat as annotated by UniProt then it is skipped
Otherwise, this criterion is triggered.
If the variant is a stop-loss variant then this criterion is triggered.
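The PM4 preconditions and decision steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `VariantInfo`, `evaluate_pm4`, and the consequence labels are hypothetical names, not the actual AutoPM4BP3 API, and both repeat lookups are passed in as precomputed booleans rather than queried from RepeatMasker or UniProt.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical consequence labels; the real annotation terms may differ.
INFRAME_INDEL = {"inframe_insertion", "inframe_deletion"}
STOP_LOSS = "stop_lost"


@dataclass
class VariantInfo:
    """Minimal stand-in for the variant data PM4 needs (assumed shape)."""

    consequence: str
    in_repeatmasked_region: bool = False
    in_uniprot_repeat: bool = False


def evaluate_pm4(variant: VariantInfo, pvs1_triggered: bool) -> Tuple[bool, str]:
    """Return (triggered, reason) following the steps listed above."""
    # Precondition: avoid double counting with PVS1.
    if pvs1_triggered:
        return False, "skipped: PVS1 already triggered"
    # Stop-loss variants trigger PM4 directly.
    if variant.consequence == STOP_LOSS:
        return True, "triggered: stop-loss variant"
    # Precondition: only in-frame indels remain eligible.
    if variant.consequence not in INFRAME_INDEL:
        return False, "skipped: not an in-frame indel or stop-loss variant"
    if variant.in_repeatmasked_region:
        return False, "skipped: inside a repeat masked region"
    if variant.in_uniprot_repeat:
        return False, "skipped: inside a UniProt-annotated repeat"
    return True, "triggered: in-frame indel outside repeat regions"
```

Returning the reason alongside the boolean keeps the information needed for the user report (why the criterion was skipped in a repeat region) in one place.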
User Report
Any reasons for skipping in repeat regions.
The transcript identifier.
Literature
N/A
Caveats
Richards et al. (2015) state that the size of the indel and amount of change in amino acids should influence the classification.
We currently do not have this implemented.
BP3
BP3 (in-frame repetitive)
.. note::

   - We do not have proper UniProt data yet (domain / repeat).
   - Similar to RepeatMasker.
   - Probably same for phylop100way?
Original Definition
In-frame deletions/insertions in a repetitive region without a known function.
-- Richards et al. (2015); Table 4
Preconditions / Precomputations
If the criterion BA1 triggered then this criterion is skipped.
If the variant is on chrMT then this criterion is skipped.
Implemented Criterion
If the variant is in a known functional domain according to UniProt then this criterion is skipped.
If the variant is in a repeat region according to the UniProt repeat annotation or the genome RepeatMasker annotation then this criterion is skipped.
If the variant is in a region of low conservation (PhyloP100Way less than 3.58, same as `PMID:30376034 <https://pubmed.ncbi.nlm.nih.gov/30376034/>`__) then this criterion is skipped.
If all conditions above fail then this criterion is triggered.
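For discussion, here is a literal transcription of the BP3 steps exactly as they are listed above; it does not reinterpret them. `evaluate_bp3` and its parameters are hypothetical names, the domain, repeat, and conservation values are assumed to be precomputed, and only the 3.58 threshold comes from the text.

```python
from typing import Tuple

# PhyloP100Way cutoff quoted in the criterion description above.
PHYLOP_THRESHOLD = 3.58


def evaluate_bp3(
    ba1_triggered: bool,
    chrom: str,
    in_functional_domain: bool,
    in_repeat_region: bool,
    phylop100way: float,
) -> Tuple[bool, str]:
    """Return (triggered, reason), mirroring the listed checks in order."""
    # Preconditions.
    if ba1_triggered:
        return False, "skipped: BA1 triggered"
    if chrom in ("MT", "chrMT"):
        return False, "skipped: variant is on chrMT"
    # Implemented criterion, in the order given in the list.
    if in_functional_domain:
        return False, "skipped: in a known functional domain (UniProt)"
    if in_repeat_region:
        return False, "skipped: in an annotated repeat region"
    if phylop100way < PHYLOP_THRESHOLD:
        return False, "skipped: low conservation (PhyloP100Way < 3.58)"
    return True, "triggered: none of the skip conditions applied"
```

Writing the checks out this way also makes it easier to review whether the skip conditions match the intent of the original BP3 definition before the methods are implemented.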
User Report
The variant position and the reason for triggering or skipping.
Literature
McCormick et al. (2020) describe the ACMG criteria for chrMT variants.
Caveats
We currently use the conservation threshold from `PMID:30376034 <https://pubmed.ncbi.nlm.nih.gov/30376034/>`__ and are lacking our own calibration.
Different from `PMID:30376034 <https://pubmed.ncbi.nlm.nih.gov/30376034/>`__, we do not check whether there are known pathogenic variants in the region.
Intervar
PM4 and BP3 by Automated Scoring
Indels and stop losses can change the length of proteins and disrupt protein function. We annotated the repeat region by using the “rmsk” database from the UCSC Genome Browser. This database was created by the RepeatMasker program, which screens DNA sequences for interspersed repeats and low-complexity DNA sequences. When the variants are “non-frameshift insertion,” “non-frameshift deletion” in the non-repeat region, or stop-loss variants, PM4 will be applied. If the variants are “non-frameshift insertion” or “non-frameshift deletion” in the repeat region, BP3 will be applied.
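The InterVar rule quoted above reduces to a small decision function. The sketch below is an illustration, not InterVar's code: `intervar_pm4_bp3` and the `"stoploss"` label are assumed names, and `in_rmsk_repeat` stands for the result of a lookup against the UCSC "rmsk" track that is not shown here.

```python
from typing import Optional

# Consequence labels as quoted in the InterVar description above.
NONFRAMESHIFT = {"non-frameshift insertion", "non-frameshift deletion"}
# Hypothetical label for stop-loss variants.
STOP_LOSS = "stoploss"


def intervar_pm4_bp3(consequence: str, in_rmsk_repeat: bool) -> Optional[str]:
    """Return "PM4", "BP3", or None when neither criterion applies."""
    if consequence == STOP_LOSS:
        return "PM4"
    if consequence in NONFRAMESHIFT:
        # Repeat region -> BP3, non-repeat region -> PM4.
        return "BP3" if in_rmsk_repeat else "PM4"
    return None
```

Under this rule PM4 and BP3 are mutually exclusive for any single variant, which is a useful property to assert in the unit tests.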