ave-dcd / dcd_mapping

AVE DCD Mapping
MIT License

Find reference transcript targeted by scoreset submission #19

Closed by samriddhi99 5 months ago

samriddhi99 commented 1 year ago
ahwagner commented 1 year ago

@samriddhi99 is an "hgnc accession value" an HGNC id, e.g. hgnc:1234? Our goal for this project should be to validate and map a submitted score set; does a method to retrieve HGNC ids help us with that goal? Use this GitHub issue to provide rationale for why we want to retrieve gene accessions.

ahwagner commented 1 year ago

After discussion, this issue focuses on how we arrive at our preferred, compatible transcript for protein-coding scoresets. This involves three primary steps:

  1. Query UTA for all transcripts that have alignments overlapping our alignment
  2. From those transcripts, select for those that are compatible (there are no exons missing from the target sequence alignment to the genome)
  3. Select the favored representative transcript using our previously established process (i.e., MANE Select / MANE Plus Clinical / longest).
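The three steps above can be sketched roughly as follows. This is a hypothetical, minimal sketch and not the project's implementation: UTA is not queried (an in-memory list of transcripts stands in for the step 1 query), the `Transcript` model, the function names, and the accessions are all illustrative, the compatibility check is one possible reading of step 2 (every aligned block falls inside a transcript exon, and no transcript exon inside the aligned region is skipped), and "longest" is assumed to mean total exon length.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model: half-open genomic intervals [start, end) on one chromosome.
Interval = tuple[int, int]


@dataclass(frozen=True)
class Transcript:
    accession: str                     # illustrative accession, not a real record
    chrom: str
    exons: tuple[Interval, ...]        # genomic exon coordinates
    mane_status: Optional[str] = None  # "MANE Select", "MANE Plus Clinical", or None

    @property
    def span(self) -> Interval:
        return (min(s for s, _ in self.exons), max(e for _, e in self.exons))

    @property
    def length(self) -> int:
        # Assumption: "longest" = total exon length.
        return sum(e - s for s, e in self.exons)


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def overlapping_transcripts(chrom: str, blocks: list[Interval],
                            transcripts: list[Transcript]) -> list[Transcript]:
    """Step 1 stand-in for the UTA query: keep transcripts overlapping the alignment."""
    aln_span = (min(s for s, _ in blocks), max(e for _, e in blocks))
    return [t for t in transcripts if t.chrom == chrom and overlaps(t.span, aln_span)]


def is_compatible(t: Transcript, blocks: list[Interval]) -> bool:
    """Step 2 (assumed interpretation of 'no exons missing')."""
    # Every aligned block of the target sequence must sit inside some transcript exon.
    for bs, be in blocks:
        if not any(es <= bs and be <= ee for es, ee in t.exons):
            return False
    # No transcript exon lying fully inside the aligned region may be skipped.
    aln_start = min(s for s, _ in blocks)
    aln_end = max(e for _, e in blocks)
    for exon in t.exons:
        inside = aln_start <= exon[0] and exon[1] <= aln_end
        if inside and not any(overlaps(exon, b) for b in blocks):
            return False
    return True


# Lower rank wins; transcripts without a MANE status fall back to length.
MANE_RANK = {"MANE Select": 0, "MANE Plus Clinical": 1}


def select_transcript(chrom: str, blocks: list[Interval],
                      transcripts: list[Transcript]) -> Optional[Transcript]:
    """Step 3: MANE Select, then MANE Plus Clinical, then longest (ties by accession)."""
    candidates = [t for t in overlapping_transcripts(chrom, blocks, transcripts)
                  if is_compatible(t, blocks)]
    if not candidates:
        return None
    return min(candidates,
               key=lambda t: (MANE_RANK.get(t.mane_status, 2), -t.length, t.accession))


# Toy data: a target sequence aligned to chr1 in two blocks.
blocks = [(1000, 1100), (2000, 2150)]
tx_a = Transcript("NM_A.1", "chr1", ((900, 1100), (2000, 2300)))
tx_b = Transcript("NM_B.1", "chr1", ((950, 1100), (1500, 1600), (2000, 2200)), "MANE Select")
tx_c = Transcript("NM_C.1", "chr1", ((1000, 1100), (2000, 2150)), "MANE Plus Clinical")
tx_d = Transcript("NM_D.1", "chr2", ((1000, 1100), (2000, 2150)), "MANE Select")

print(select_transcript("chr1", blocks, [tx_a, tx_b, tx_c, tx_d]).accession)  # NM_C.1
```

In this toy run, `NM_B.1` is MANE Select but is rejected because its exon at 1500-1600 lies inside the aligned region yet is absent from the alignment, and `NM_D.1` is on another chromosome, so the MANE Plus Clinical transcript wins over the longer non-MANE one. Filtering for compatibility before ranking keeps the MANE preference from ever choosing a transcript the score set cannot be mapped onto.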

@jarbesfeld should be the primary assist on this; @korikuzma can also assist where needed.