Illumina / Nirvana

The nimble & robust variant annotator
https://illumina.github.io/NirvanaDocumentation/
GNU General Public License v3.0

Update SAUtils for Cosmic #63

Open heseber opened 2 years ago

heseber commented 2 years ago

Current COSMIC releases use a slightly modified format for the TSV files. Some column headers have changed (e.g., Mutation ID -> GENOMIC_MUTATION_ID), and a new Tier column was added.
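One way to tolerate both header styles is to rename old column names to new ones while reading the file. The following is only a minimal Python sketch of that idea, not the code in this pull request (SAUtils itself is not written in Python). `HEADER_ALIASES` and `read_rows` are hypothetical names, and the only alias listed is the one mentioned above; any further renamed columns would have to be taken from the actual COSMIC release.

```python
import csv
import io

# Old header name -> new header name. Only this pair is taken from the
# description above; other renamed columns would need to be added by hand.
HEADER_ALIASES = {"Mutation ID": "GENOMIC_MUTATION_ID"}


def read_rows(handle):
    """Yield each TSV row as a dict keyed by new-style header names."""
    reader = csv.reader(handle, delimiter="\t")
    header = [HEADER_ALIASES.get(name, name) for name in next(reader)]
    for fields in reader:
        row = dict(zip(header, fields))
        # Assumption: older files have no Tier column, so default to empty.
        row.setdefault("Tier", "")
        yield row


# Tiny in-memory example with an old-style header (made-up content).
old_style = io.StringIO("Mutation ID\tGene\nCOSV1\tPIK3CA\n")
first_row = next(read_rows(old_style))
```

With this approach the rest of the parser only needs to know the new names, at the cost of maintaining the alias table by hand.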

Counts for cancerType and cancerSite were previously based on study identifiers rather than tumor identifiers. That is not the quantity of interest: we want the number of tumors per cancerType and cancerSite, not the number of studies (a single study can include many tumors of the same type and site). Furthermore, the study ID column is often empty when a PubMed ID is specified instead.
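The difference between the two counting schemes can be illustrated with a short Python sketch. This is not the code from this pull request; the field names `tumor_id`, `study_id`, and `cancer_site` are hypothetical stand-ins for the corresponding COSMIC columns, and the rows are made up for illustration.

```python
from collections import defaultdict


def count_tumors(rows, key):
    """Count distinct tumor ids per value of `key` (e.g. cancer type or site)."""
    tumors = defaultdict(set)
    for row in rows:
        # A set ignores repeated rows that refer to the same tumor.
        tumors[row[key]].add(row["tumor_id"])
    return {value: len(ids) for value, ids in tumors.items()}


# Made-up rows for one variant: one study reports two ovarian tumors (one of
# them listed twice), and a third tumor has no study id at all.
rows = [
    {"tumor_id": "T1", "study_id": "S1", "cancer_site": "ovary"},
    {"tumor_id": "T2", "study_id": "S1", "cancer_site": "ovary"},
    {"tumor_id": "T2", "study_id": "S1", "cancer_site": "ovary"},
    {"tumor_id": "T3", "study_id": "", "cancer_site": "large intestine"},
]

site_counts = count_tumors(rows, "cancer_site")
```

Counting by `study_id` on the same rows would report a single study for ovary and would have only an empty identifier to count for large intestine, which is the problem described above.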

This pull request updates the SAUtils for Cosmic so that it works with current COSMIC releases, and it changes the counts to refer to tumors instead of studies. The change applies only to short variants, not to structural variants or fusions.

Here is an example output (just the "cosmic" section, and after pretty-printing with jq, of course):

"cosmic": [
  {
    "id": "COSV55892885",
    "refAllele": "A",
    "altAllele": "T",
    "gene": "PIK3CA",
    "sampleCount": 5,
    "cancerTypesAndCounts": [
      {
        "cancerType": "carcinoma",
        "count": 5
      }
    ],
    "cancerSitesAndCounts": [
      {
        "cancerSite": "ovary",
        "count": 3
      },
      {
        "cancerSite": "large intestine",
        "count": 2
      }
    ],
    "tiersAndCounts": [
      {
        "tier": "1",
        "count": 5
      }
    ]
  }
]