connor-lab / aln2type


V-22OCT-01 not called correctly #12

Open ulfschaefer opened 1 year ago

ulfschaefer commented 1 year ago

monologue-underling variants get called as "alt-probable" although they should be confirmed. The reason seems to be that the MNP P13L gets called as a wild type, when it is actually in the sequence:

                    {
                        "amino-acid-change": "P13L",
                        "codon-change": "CCC-CTT",
                        "gene": "N",
                        "one-based-reference-position": 28310,
                        "predicted-effect": "non-synonymous",
                        "protein": "nucleocapsid phosphoprotein",
                        "protein-codon-position": 13,
                        "reference-base": "CCC",
                        "type": "MNP",
                        "variant-base": "CTT",
                        "status": "no-detect"
                    }

In my example both positions 28311 and 28312 are T. I suspect the problem is related to the "one-based-reference-position" pointing to a base that is ref in the sample.
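To make the suspicion above concrete, here is a minimal sketch of which genome positions actually change in the P13L definition. It is not aln2type's code; the helper name `differing_positions` is hypothetical and only the coordinates and bases come from the JSON above.

```python
def differing_positions(start, reference_base, variant_base):
    """Return the one-based positions where the variant differs from the reference.

    `start` is the one-based reference position of the first base in the
    definition. This is an illustrative helper, not part of aln2type.
    """
    if len(reference_base) != len(variant_base):
        raise ValueError("an MNP needs reference and variant strings of equal length")
    return [
        start + offset
        for offset, (ref, alt) in enumerate(zip(reference_base, variant_base))
        if ref != alt
    ]


# P13L is defined as CCC -> CTT starting at 28310.
print(differing_positions(28310, "CCC", "CTT"))  # [28311, 28312]
```

The base at 28310 is C in both the reference and the variant codon, so a caller that inspects only the position given in the definition would see a reference base there.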

I am attaching the example I used.

example_barcode05.muscle.aln.fasta.zip

I spoke to the author of the definitions and we agreed that MNPs are denoted inconsistently across the yaml files. There will be an update so that all MNPs will:

Sorry about the faff. Ulf

abeazer commented 1 year ago

Hello Ulf, apologies for the delay in our response, and thank you for raising this issue. This looks to be related to an issue we've identified with how aln2type handles MNPs. We are working to release an update to aln2type that fixes this issue and makes it compatible with the new style of definition.

ulfschaefer commented 1 year ago

Thanks abeazer, that all sounds good.

FYI, I have had a similar issue where V-23JAN-01 (XBB.1.5) samples are called as probable when they should be confirmed. It was because the variant for F486P is not called even though it's definitely in the sequence.

                    {
                        "amino-acid-change": "F486P",
                        "codon-change": "TTT-CCT",
                        "gene": "S",
                        "one-based-reference-position": 23018,
                        "predicted-effect": "non-synonymous",
                        "protein": "surface glycoprotein",
                        "protein-codon-position": 486,
                        "reference-base": "TTT",
                        "type": "MNP",
                        "variant-base": "CCN",
                        "sample-call": "CCC"
                    },

It is definitely CCT in the sample in the above case.

Thanks Ulf

abeazer commented 1 year ago

Hi Ulf, thanks again!

We've spotted this issue and found that it's due to aln2type currently being incompatible with the newer definition style of using the codon for the variant-base. Updating aln2type to the new style is the other major fix we're working on.

In the meantime, adjusting the definition yaml to reference-base: TT and variant-base: CC will allow aln2type to correctly call the mutation.
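As a rough illustration of that adjustment, the sketch below trims a codon-style MNP definition down to the bases that actually change. It is an assumption-laden helper, not aln2type's implementation: the name `trim_mnp` is hypothetical, and it treats only `N` as a wildcard rather than handling the full IUPAC ambiguity alphabet.

```python
def trim_mnp(start, reference_base, variant_base):
    """Trim leading and trailing bases that carry no information.

    A base is treated as uninformative when the variant base equals the
    reference base or is the wildcard N. Returns a tuple of
    (one-based start position, trimmed reference, trimmed variant).
    Illustrative only; not part of aln2type.
    """
    if len(reference_base) != len(variant_base):
        raise ValueError("an MNP needs reference and variant strings of equal length")

    def uninformative(ref, alt):
        return alt == ref or alt == "N"

    left, right = 0, len(reference_base)
    # Drop uninformative bases from the left, shifting the start position.
    while left < right and uninformative(reference_base[left], variant_base[left]):
        left += 1
    # Drop uninformative bases from the right.
    while right > left and uninformative(reference_base[right - 1], variant_base[right - 1]):
        right -= 1
    return start + left, reference_base[left:right], variant_base[left:right]


# F486P: the codon-style definition TTT -> CCN at 23018 reduces to TT -> CC.
print(trim_mnp(23018, "TTT", "CCN"))  # (23018, 'TT', 'CC')
# P13L: CCC -> CTT at 28310 reduces to CC -> TT starting at 28311.
print(trim_mnp(28310, "CCC", "CTT"))  # (28311, 'CC', 'TT')
```

For F486P the start position stays at 23018 because the first base already differs. For a definition such as P13L the start position shifts by one, so the position field would need adjusting along with the bases.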

Thanks! Andi