The A85C SNP produces the sequence CCCC, in which the substituted C was underlined in the original. The A152G SNP produces the sequence GGGG, in which the substituted G was underlined.
Interestingly, there are no examples of glycine at codon 51 (A152G) or proline at codon 29 (A85C). I have thought for a long time that an inferred amino acid change should be checked against all previously reported codons within the gene family under consideration. (It is surprising how little variation there is at each position, across gene families.) Could this easily be done as part of OGRDB? I would not rule out a previously unseen change, but it would raise a red flag. (Are there other checks that could routinely be run? What about flagging SNPs that involve RGYW/WRCY hotspot motifs? That is, where the difference is seen at the G of RGYW or the C of WRCY.)
Could OGRDB collect changes that we agree are errors, and run automatic checks against them? Whatever gave rise to the A85C and A152G errors, it seems likely to happen again.
Altogether I think there are three potential checks mentioned here:
Warnings for zXXX, XzXX, XXzX, XXXz -> XXXX (where X and z are any two different bases). I am not sure whether we should include all four, but I don’t see why not – particularly as we may not know the read direction.
Warnings for an amino acid that has not previously been observed at that position in the gene family.
Warnings where the identified change falls at the G or C of one of the two SHM hotspot motifs (RGYW/WRCY).
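As a rough illustration of how these three checks might look in code, here is a minimal Python sketch. All function names and the toy data are hypothetical and are not part of OGRDB. It assumes 1-based nucleotide positions, that the homopolymer check is run on the inferred (variant) sequence, and that the hotspot check is run on the reference sequence, on the reasoning that an SHM artefact would have arisen at a hotspot in the original germline gene. The set of residues previously seen at each codon would have to come from a real alignment of the gene family; here it is simply passed in.

```python
# Hypothetical helpers sketching the three proposed warnings.
# Positions are 1-based, as in names like A85C.

def codon_number(nt_pos):
    """Convert a 1-based nucleotide position to a 1-based codon number."""
    return (nt_pos - 1) // 3 + 1


def homopolymer_warning(variant, nt_pos, run=4):
    """True if the changed base lies in a run of at least `run` identical bases.

    Covers zXXX, XzXX, XXzX and XXXz -> XXXX, whichever position changed.
    """
    i = nt_pos - 1
    base = variant[i]
    left = i
    while left > 0 and variant[left - 1] == base:
        left -= 1
    right = i
    while right < len(variant) - 1 and variant[right + 1] == base:
        right += 1
    return right - left + 1 >= run


def hotspot_warning(reference, nt_pos):
    """True if the reference base is the G of RGYW or the C of WRCY.

    IUPAC codes: R = A/G, Y = C/T, W = A/T.
    """
    i = nt_pos - 1

    def matches(j, allowed):
        return 0 <= j < len(reference) and reference[j] in allowed

    if reference[i] == "G":  # R G Y W, G at the second position
        return matches(i - 1, "AG") and matches(i + 1, "CT") and matches(i + 2, "AT")
    if reference[i] == "C":  # W R C Y, C at the third position
        return matches(i - 2, "AT") and matches(i - 1, "AG") and matches(i + 1, "CT")
    return False


def novel_residue_warning(residue, codon, family_residues):
    """True if `residue` has not been seen at `codon` in the gene family.

    `family_residues` maps codon number -> set of one-letter residues
    (assumed to be precomputed from an alignment of the family).
    """
    return residue not in family_residues.get(codon, set())


if __name__ == "__main__":
    # Toy sequences, made up for illustration only.
    reference = "TTACCCTT"
    variant = "TTCCCCTT"  # A3C in this toy example
    print(homopolymer_warning(variant, 3))
    print(hotspot_warning("AGCT", 2))
    print(codon_number(85), codon_number(152))
    # Toy residue sets, not real family data.
    print(novel_residue_warning("G", 51, {51: {"S", "T"}}))
```

A warning from any of these would only raise a red flag for review; it would not rule the inferred change out.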
(The three opening paragraphs are quoted from Andrew Collins.)