jeremyButtler closed 3 months ago
Thanks for your reply on my previous comment. It looks like the pdf answered my question. Still it seems odd to me that these genomic coordinates were not included for grade 3, 4, and 5 variants, but also included a reference. Maybe in the next edition just have a tag saying no genome indices?
Dear Jeremy,
Thank you for your feedback. I'm sorry if I'm misunderstanding your various (independent) points, but I'll still try to clarify.
Still it seems odd to me that these genomic coordinates were not included for grade 3, 4, and 5 variants,
You should find that most grade 3-5 variants are actually included in Genomic_coordinates. All variants in Catalogue_master_file are present in Genomic_coordinates, except for all deletions and some unseen LoF variants. I think you are misinterpreting this sentence:
We do not provide genomic-variants for LoF graded-variant that are never classified as group 1 or 2 for any drug or that are not subject to an epistatis rule
We made that choice because there is no actionable reporting associated with those variants (unseen LoFs that are not associated with 1-2 grading or an epistasis rule), so we did not want to add unnecessary entries to an already lengthy catalogue.
As for variants that fall outside gene boundaries, those are correct, for instance in the case of deletions that overlap gene sequences. Our annotation tool (SnpEff) predicted that these still affect the protein, if it is still expressed (a frameshift, the loss of a stop codon, etc.).
Thanks for letting me know this. It helps my understanding a lot.
Thanks for posting your catalog on GitHub so we can let you know about bugs. That way, if you ever decide to do a next edition, you can take bugs from the previous edition into account.
One thing I found is that a few of the variants have IDs that are amino acid IDs (gene_p.), but have sequences and positions outside of the gene's reading frame. So, there might be a big deletion for a frameshift, or an extra base added after a stop codon, that is not part of the gene named in the variant ID. There is a change there, but there is also some noise. I found these a bit hard to process.
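For anyone else hitting this, here is a minimal sketch of how such entries could be flagged before processing. It is not how the catalogue or SnpEff does anything. The `Variant` fields, the `gene_bounds` dictionary, and the function names are all hypothetical, and the gene coordinates would have to come from your own annotation of the reference genome. It assumes the variant ID has the form `gene_p.…` and that the position is the 1-based coordinate of the first reference base.

```python
from typing import Dict, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """One catalogue row (hypothetical layout, not the real column names)."""
    variant_id: str  # e.g. "geneA_p.Gly5Asp"
    position: int    # 1-based genomic position of the first reference base
    ref: str         # reference allele


def outside_gene(variant: Variant, gene_bounds: Dict[str, Tuple[int, int]]) -> bool:
    """Return True if any reference base of the variant lies outside its gene."""
    gene = variant.variant_id.split("_p.")[0]
    gene_start, gene_end = gene_bounds[gene]
    var_start = variant.position
    var_end = variant.position + len(variant.ref) - 1
    return var_start < gene_start or var_end > gene_end


def flag_protein_variants_outside_gene(
    variants: List[Variant], gene_bounds: Dict[str, Tuple[int, int]]
) -> List[Variant]:
    """Keep only amino acid (gene_p.) variants that extend past the gene."""
    return [
        v for v in variants
        if "_p." in v.variant_id and outside_gene(v, gene_bounds)
    ]


# Toy data: made-up gene, coordinates, and IDs, for illustration only.
toy_bounds = {"geneA": (100, 200)}
toy_variants = [
    Variant("geneA_p.Gly5Asp", 113, "G"),     # fully inside the gene
    Variant("geneA_p.Ter34fs", 198, "ACGTA"), # deletion runs past the gene end
    Variant("geneA_c.-5A>G", 95, "A"),        # not an amino acid ID, ignored
    Variant("geneA_p.Met1fs", 95, "CAGTGA"),  # deletion starts before the gene
]
flagged = flag_protein_variants_outside_gene(toy_variants, toy_bounds)
```

Flagged entries are not necessarily errors (a deletion overlapping the gene is a valid case), so this only marks them for separate handling.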
Here is a table of the variants I know of that have this issue.
Again, sorry for being the noisy person. I am using the catalog in my projects and started to notice things that break my programs. I have this issue dealt with, but figured I would make sure you knew it was happening. That way you can be aware of it when you build the next catalog, if you are planning one.
Thanks again for this resource.