genome-nexus / genome-nexus-annotation-pipeline

Library and tool for annotating MAF files using Genome Nexus Webserver API
MIT License

options to add functional impact columns into MAF #147

Closed jjgao closed 3 years ago

jjgao commented 3 years ago

ao508 commented 3 years ago

Do we know what fields are desired in the annotated MAF?

Mutation Assessor (MA), for example, returns a lot of information, so I was wondering how many of the fields returned in the JSON should be added to the annotated MAF that the annotation pipeline generates.

To start, we can just make sure that we are getting these fields in the JSON response, and figure out which columns to add to the annotated MAF later.

ao508 commented 3 years ago

Mutation Assessor is already supported ✔️

ao508 commented 3 years ago

PR to server repo: https://github.com/genome-nexus/genome-nexus/pull/449
PR to Java API client repo: https://github.com/genome-nexus/genome-nexus-java-api-client/pull/8

PR to annotation pipeline [REQUIRES ABOVE PRs TO BE MERGED FIRST]: https://github.com/genome-nexus/genome-nexus-annotation-pipeline/pull/148
* will also require an additional update to get the commit hash from the master branch once the above PR is merged

ao508 commented 3 years ago

@jjgao Following up on this: do we know which Mutation Assessor field(s) we need to include in the output annotated MAF?

This is what we get in the JSON response from Genome Nexus for mutation_assessor:

  "mutation_assessor": {
    "annotation": {
      "codonStartPosition": "string",
      "cosmicCount": 0,
      "functionalImpact": "string",
      "functionalImpactScore": 0,
      "hgvs": "string",
      "hugoSymbol": "string",
      "input": "string",
      "mappingIssue": "string",
      "msaGaps": 0,
      "msaHeight": 0,
      "msaLink": "string",
      "pdbLink": "string",
      "referenceGenomeVariant": "string",
      "referenceGenomeVariantType": "string",
      "refseqId": "string",
      "refseqPosition": 0,
      "refseqResidue": "string",
      "snpCount": 0,
      "uniprotId": "string",
      "uniprotPosition": 0,
      "uniprotResidue": "string",
      "variant": "string",
      "variantConservationScore": 0,
      "variantSpecificityScore": 0
    },
    "license": "string"
  }
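
For reference while the column question is open, here is a minimal sketch of pulling a subset of these fields out of a response shaped like the one above. The function name `extract_ma_fields`, the choice of fields in `WANTED_FIELDS`, and the sample values are all assumptions for illustration; nothing here is taken from the pipeline code.

```python
import json

# Illustrative subset only; which fields belong in the MAF is still undecided.
WANTED_FIELDS = ("functionalImpact", "functionalImpactScore", "msaLink", "pdbLink")


def extract_ma_fields(variant_annotation, wanted=WANTED_FIELDS):
    """Return {field: value} for the wanted Mutation Assessor fields.

    A missing "mutation_assessor" section, a missing "annotation" object,
    or a missing field all yield None, so the caller decides how to render
    an empty MAF cell.
    """
    ma = variant_annotation.get("mutation_assessor") or {}
    annotation = ma.get("annotation") or {}
    return {name: annotation.get(name) for name in wanted}


# Made-up sample values in the shape shown above (not real Genome Nexus output).
response = json.loads("""
{
  "mutation_assessor": {
    "annotation": {
      "functionalImpact": "medium",
      "functionalImpactScore": 2.1,
      "msaLink": "string",
      "pdbLink": "string"
    },
    "license": "string"
  }
}
""")
fields = extract_ma_fields(response)
```

Returning None for anything absent keeps variants without a Mutation Assessor annotation from raising, which matters if only some rows in a MAF get annotated.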

ritikakundra commented 3 years ago

@ao508 : For the MAF I need, we would like to see what is shown in the portal:

[Three portal screenshots attached, dated 2020-09-24; images not available]

ao508 commented 3 years ago

@ritikakundra Here is how I plan on naming the columns:

The reason for the different naming convention for the MA fields is to match the field names that the importer expects: https://github.com/cBioPortal/cbioportal/blob/master/core/src/main/java/org/mskcc/cbio/portal/scripts/ImportExtendedMutationData.java#L220

sheridancbio commented 3 years ago

There is follow-up work needed on this card before it is usable: a new Genome Nexus server needs to be built and deployed for the three Genome Nexus deployments (public, genie, annotation).

jjgao commented 3 years ago

@ao508 maybe change "MA:FImpact" to "MA_Prediction" and "MA:FIS" to "MA_Score" to be consistent?

ao508 commented 3 years ago

I used "MA:*" to be consistent with the importer code (see the MafRecord parser class in the cBioPortal core).
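
To make the naming concrete, here is a rough sketch of appending the two columns to a tab-delimited MAF. The column names "MA:FImpact" and "MA:FIS" come from this thread; pairing them with the JSON fields `functionalImpact` and `functionalImpactScore` is my assumption, and `add_ma_columns` is a hypothetical helper, not the pipeline's actual code. It also ignores `#` comment lines that a real MAF may start with.

```python
import csv
import io

# MAF column -> mutation_assessor JSON field. The pairing is an assumption.
MA_COLUMNS = {
    "MA:FImpact": "functionalImpact",
    "MA:FIS": "functionalImpactScore",
}


def add_ma_columns(maf_text, annotations):
    """Append the MA columns to a tab-delimited MAF given as text.

    `annotations` holds one mutation_assessor annotation dict (or None)
    per data row, in the same order as the rows.
    """
    reader = csv.DictReader(io.StringIO(maf_text), delimiter="\t")
    fieldnames = list(reader.fieldnames) + list(MA_COLUMNS)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row, annotation in zip(reader, annotations):
        for column, json_field in MA_COLUMNS.items():
            value = (annotation or {}).get(json_field)
            # Leave the cell empty when there is no annotation or no value.
            row[column] = "" if value is None else str(value)
        writer.writerow(row)
    return out.getvalue()


# Made-up two-row MAF with only two of the usual columns, for illustration.
maf = "Hugo_Symbol\tChromosome\nTP53\t17\nKRAS\t12\n"
result = add_ma_columns(
    maf,
    [{"functionalImpact": "medium", "functionalImpactScore": 2.1}, None],
)
```

Switching to "MA_Prediction" and "MA_Score" would only mean changing the keys of `MA_COLUMNS`, but the importer would then need to recognize the new names.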