GTB-tbsequencing / mutation-catalogue-2023

MIT License

MCNVs can be listed as containing DNA variants that may not be associated with relevant resistance information #11

Closed HillJamie closed 1 month ago

HillJamie commented 2 months ago

I have noticed that in some cases, an MCNV in a protein coding region contains a DNA level change.

| variant | chromosome | position | reference_nucleotides | alternative_nucleotides |
| --- | --- | --- | --- | --- |
| ccsA_c.393T>G | NC_000962.3 | 620281 | CGTG | AAGC |
| ccsA_p.Ala132Pro | NC_000962.3 | 620281 | CGTG | AAGC |
| ccsA_p.Arg131Lys | NC_000962.3 | 620281 | CGTG | AAGC |

My understanding is that the DNA level change is reported, rather than an amino acid change, when the mutation is synonymous.

However, in this case, the consequence is not a synonymous mutation, and so the resistance found in other isolates containing ccsA_c.393T>G is not likely to apply here.

Would it make more sense to remove the line "ccsA_c.393T>G | NC_000962.3 | 620281 | CGTG | AAGC" from the genomic_coordinates spreadsheet?

To say the same thing in a different way, one might describe the variant ccsA_c.393T>G as ccsA_p.Arg131= (in HGVS nomenclature). Then it is clear that ccsA_p.Arg131Lys precludes matching ccsA_p.Arg131=.
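As a rough illustration of that argument (a sketch only, not the maintainers' fix): position c.393 is the third base of codon 131, so a protein-level change at codon 131 in the same MCNV rules out the synonymous reading. The helper names `codon_of` and `drop_shadowed_synonymous` are hypothetical, and the regular expressions assume variant names shaped like the ones in this issue.

```python
import re

# Assumed name shapes, based only on the examples in this issue:
#   <gene>_c.<pos><ref>><alt>   e.g. ccsA_c.393T>G
#   <gene>_p.<Aaa><codon><Bbb>  e.g. ccsA_p.Arg131Lys
C_SNV = re.compile(r"^(?P<gene>.+)_c\.(?P<pos>\d+)[ACGT]>[ACGT]$")
P_CHANGE = re.compile(r"^(?P<gene>.+)_p\.[A-Z][a-z]{2}(?P<codon>\d+)[A-Z][a-z]{2}$")


def codon_of(c_pos):
    """Map a 1-based coding position to (codon number, position within codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def drop_shadowed_synonymous(variants):
    """Drop DNA-level coding SNVs whose codon also carries a protein-level change.

    `variants` is the list of names attached to a single MCNV.
    """
    changed_codons = set()
    for name in variants:
        m = P_CHANGE.match(name)
        if m:
            changed_codons.add((m["gene"], int(m["codon"])))

    kept = []
    for name in variants:
        m = C_SNV.match(name)
        if m and (m["gene"], codon_of(int(m["pos"]))[0]) in changed_codons:
            continue  # the codon is altered, so the synonymous label cannot apply
        kept.append(name)
    return kept


mcnv = ["ccsA_c.393T>G", "ccsA_p.Ala132Pro", "ccsA_p.Arg131Lys"]
print(codon_of(393))
print(drop_shadowed_synonymous(mcnv))
```

With the three names above, only the two protein-level variants remain, because ccsA_p.Arg131Lys touches the same codon as c.393.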

As I interpret the instructions https://github.com/GTB-tbsequencing/mutation-catalogue-2023/blob/main/Final%20Result%20Files/Instruction%20of%20use%20for%20incorporation%20of%20the%20mutations%20catalogue%20version%202%20results%20into%20bioinformatic%20pipeline.pdf there is otherwise a risk of reporting the variant with a low grade (ccsA_p.Ala132Pro and ccsA_p.Arg131Lys are not present in the "Catalogue_master_file" sheet, and the instructions are ambiguous about what to report when there is no match).

Thank you again for the resource, and all your help. Jamie

sachalau commented 2 months ago

Hi Jamie,

Thank you very much for your report. This shouldn't have been happening. In fact, the genomic coordinates files were corrected a few months ago after other users reported a closely related issue (synonymous changes incorrectly associated with MCNVs).

I understand where the issue comes from now and I'll be working towards implementing a fix.

However I believe the risk of misreporting is low.

> there is otherwise a risk of reporting the variant with a low grade (ccsA_p.Ala132Pro and ccsA_p.Arg131Lys are not present in the "Catalogue_master_file" sheet, and the instructions are ambiguous about what to report when there is no match).

This is the reason we decided to list all variants associated with an MCNV, even though they are not graded. If the catalogue is incorporated correctly, three things should currently be reported in association with (620281, CGTG, AAGC):

  1. ccsA_c.393T>G => class 4
  2. ccsA_p.Ala132Pro => ungraded
  3. ccsA_p.Arg131Lys => ungraded

Of course entry 1 is incorrect, but the final reader of the report should still be aware of 2 & 3 being present.
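A minimal sketch of that lookup, assuming the genomic coordinates sheet and the "Catalogue_master_file" sheet have already been loaded into memory. The variable names, the tuple layout and the `report` function are hypothetical stand-ins, not part of the catalogue's tooling; the grade shown is the one quoted above for the files as they stood at the time.

```python
from collections import defaultdict

# Hypothetical in-memory copy of the genomic coordinates rows from this issue:
# (variant, chromosome, position, reference_nucleotides, alternative_nucleotides)
GENOMIC_COORDINATES = [
    ("ccsA_c.393T>G", "NC_000962.3", 620281, "CGTG", "AAGC"),
    ("ccsA_p.Ala132Pro", "NC_000962.3", 620281, "CGTG", "AAGC"),
    ("ccsA_p.Arg131Lys", "NC_000962.3", 620281, "CGTG", "AAGC"),
]

# Hypothetical extract of the master file: only graded variants appear here.
CATALOGUE_GRADES = {"ccsA_c.393T>G": "class 4"}


def build_index(rows):
    """Group variant names by their (chromosome, position, ref, alt) key."""
    index = defaultdict(list)
    for variant, chrom, pos, ref, alt in rows:
        index[(chrom, pos, ref, alt)].append(variant)
    return index


def report(index, grades, chrom, pos, ref, alt):
    """Return every variant attached to the observed change, graded or not."""
    return [
        (variant, grades.get(variant, "ungraded"))
        for variant in index.get((chrom, pos, ref, alt), [])
    ]


index = build_index(GENOMIC_COORDINATES)
for variant, grade in report(index, CATALOGUE_GRADES, "NC_000962.3", 620281, "CGTG", "AAGC"):
    print(f"{variant} => {grade}")
```

Falling back to "ungraded" rather than skipping unmatched names is what keeps entries 2 and 3 visible to the reader of the report.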

sachalau commented 2 months ago

Hi Jamie,

Just letting you know that while working on the fix for the issue you reported, I observed another undesirable artefact in the genomic coordinates output. As with the one you reported, I believe its impact is relatively minor, but I'm trying to address both issues at the same time, which is why the fix is taking longer than expected.

sachalau commented 1 month ago

Hi Jamie, All files have been updated.

HillJamie commented 1 month ago

Thank you so much - and apologies for not replying to your earlier comment.