Illumina / Nirvana

The nimble & robust variant annotator
https://illumina.github.io/NirvanaDocumentation/
GNU General Public License v3.0

SAUtils conflicting entries in custom annotation #61

Closed heseber closed 2 years ago

heseber commented 2 years ago

I get error messages when I try to convert custom annotation files with SAUtils. There are two types of error, but both come down to the same limitation: custom annotations cannot hold multiple entries for the same change.

First type:

Here, there are two different deletions: CTA > C at position 48037502 and TAT > T at position 48037503 of chr2. If we rebase both to 48037502, this is CTAT > CT and CTAT > CT. So these are in fact the same change.

ERROR: Conflicting entries for items at chr2:48037503 for alleles TA >

CHROM POS REF ALT cosmicId sampleCount
2 48037502 CTA C COSV52276126 1
2 48037503 TAT T COSV52284111 1

You obviously recognize that these two changes are identical, so you could assemble them into an array of annotations for this change.
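As a sanity check on the equivalence claimed above, here is a minimal Python sketch that applies each deletion to the same reference window and compares the resulting sequences. The window `CTAT` at chr2:48037502-48037505 is inferred from the two REF alleles in the table, not looked up in the genome, and `apply_variant` is a hypothetical helper, not part of SAUtils.

```python
def apply_variant(window, window_start, pos, ref, alt):
    """Return the window sequence after replacing ref with alt at pos (1-based)."""
    offset = pos - window_start
    if window[offset:offset + len(ref)] != ref:
        raise ValueError(f"REF {ref!r} does not match the reference at {pos}")
    return window[:offset] + alt + window[offset + len(ref):]


# Assumption: reference bases inferred from the two records above
# (CTA at 48037502 and TAT at 48037503 overlap to give CTAT).
WINDOW_START = 48037502
WINDOW = "CTAT"

first = apply_variant(WINDOW, WINDOW_START, 48037502, "CTA", "C")
second = apply_variant(WINDOW, WINDOW_START, 48037503, "TAT", "T")

# Two variants are equivalent if they produce the same haplotype.
print(first, second, first == second)
```

Both records yield the same two-base sequence, which is why they collide on one position and allele pair during conversion.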

Second type:

Here I have multiple entries for the same variant.

ERROR: Conflicting entries for items at chr1:2160390 for alleles C > G

CHROM POS REF ALT cosmicId gene cancerType sampleCount
1 2160390 C G COSV66013939 SKI alveolar 1
1 2160390 C G COSV66013939 SKI embryonal 1
1 2160390 C G COSV66013939 SKI neoplasm 1

I would like to get something like this:

"variants": [
        {
          "vid": "1-2160390-C-G",
          "chromosome": "1",
          "begin": 2160390,
          "end": 2160390,
          "refAllele": "C",
          "altAllele": "G",
          "myDataSource":[
                {
                    "cosmicId":"COSV66013939",
                    "gene":"SKI",
                    "cancerType":"alveolar",
                    "sampleCount":1
                },
                {
                    "cosmicId":"COSV66013939",
                    "gene":"SKI",
                    "cancerType":"embryonal",
                    "sampleCount":1
                },
                {
                    "cosmicId":"COSV66013939",
                    "gene":"SKI",
                    "cancerType":"neoplasm",
                    "sampleCount":1
                }
            ]
        }
]

or, ideally (but this would require an option to specify multi-level hierarchical annotations in the custom annotation files, which seems not to be possible):

"variants": [
        {
          "vid": "1-2160390-C-G",
          "chromosome": "1",
          "begin": 2160390,
          "end": 2160390,
          "refAllele": "C",
          "altAllele": "G",
          "myDataSource":[
                {
                    "cosmicId":"COSV66013939",
                    "gene":"SKI",
                    "cancerTypesAndCounts":[
                        {
                            "cancerType":"alveolar",
                            "sampleCount":1
                        },
                        {
                            "cancerType":"embryonal",
                            "sampleCount":1
                        },
                        {
                            "cancerType":"neoplasm",
                            "sampleCount":1
                        }
                    ]
                 }
            ]
        }
]
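To make the difference between the two desired shapes concrete, the following Python sketch turns the flat rows from the table above into the nested form. It only illustrates the target structure; `nest_by_cosmic_id` is a hypothetical helper and nothing here reflects how SAUtils works internally.

```python
import json

# The three flat rows from the example above.
rows = [
    {"cosmicId": "COSV66013939", "gene": "SKI", "cancerType": "alveolar", "sampleCount": 1},
    {"cosmicId": "COSV66013939", "gene": "SKI", "cancerType": "embryonal", "sampleCount": 1},
    {"cosmicId": "COSV66013939", "gene": "SKI", "cancerType": "neoplasm", "sampleCount": 1},
]


def nest_by_cosmic_id(flat_rows):
    """Group rows by (cosmicId, gene) and collect cancer types per group."""
    grouped = {}
    for row in flat_rows:
        key = (row["cosmicId"], row["gene"])
        entry = grouped.setdefault(
            key,
            {"cosmicId": row["cosmicId"], "gene": row["gene"], "cancerTypesAndCounts": []},
        )
        entry["cancerTypesAndCounts"].append(
            {"cancerType": row["cancerType"], "sampleCount": row["sampleCount"]}
        )
    # Dicts keep insertion order, so the input order of entries is preserved.
    return list(grouped.values())


nested = nest_by_cosmic_id(rows)
print(json.dumps(nested, indent=2))
```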

What I am looking for

I am looking for a way to define a custom annotation file that collects multiple entries for the same change into an array, similar to what you already do for the COSMIC annotation in your LocalApp for TSO500 assays:

            "cosmic": [
               {
                    "id": "COSM4143682",
                    "refAllele": "A",
                    "altAllele": "G",
                    "gene": "TNFRSF14_ENST00000355716",
                    "sampleCount": 2,
                    "cancerTypesAndCounts": [
                       {
                            "cancerType": "carcinoma",
                            "count": 1
                        },
                       {
                            "cancerType": "other",
                            "count": 1
                        }
                    ],
                    "cancerSitesAndCounts": [
                       {
                            "cancerSite": "pancreas",
                            "count": 1
                        },
                       {
                            "cancerSite": "thyroid",
                            "count": 1
                        }
                    ],
                    "isAlleleSpecific": true
                },
               {
                    "id": "COSM4999217",
                    "refAllele": "A",
                    "altAllele": "G",
                    "gene": "TNFRSF14",
                    "sampleCount": 1,
                    "cancerTypesAndCounts": [
                       {
                            "cancerType": "carcinoma",
                            "count": 1
                        }
                    ],
                    "cancerSitesAndCounts": [
                       {
                            "cancerSite": "pancreas",
                            "count": 1
                        }
                    ],
                    "isAlleleSpecific": true
                }
            ],

As far as I understand, it is not possible to have multi-level hierarchies in custom annotation files (e.g., multiple COSMIC identifier entries per variant as the first level, and multiple cancer type entries per COSMIC ID as the second level). You do this in the data-source-specific options of SAUtils (as the example above demonstrates), but it does not seem to be possible with custom annotations.

I am doing this with COSMIC here because we have a license for COSMIC and I wanted to create a custom annotation for it. However, I cannot create something similar to the output of LocalApp because of the issues described above. For other annotations, too, we may have data sources with multiple entries for the same position and with a hierarchy. Is there any way to create custom annotations that would cover this? Even if adding an option for hierarchical annotations is a larger effort, it would be very helpful if multiple single-level annotations for the same position could at least be assembled into an array.

rajatshuvro commented 2 years ago

Hello @heseber, you have correctly identified the limitations of Custom Annotation (CA). At the time of its inception, the requirements for CA were rather simple. However, there is a workaround for complex annotations in CA: use a string field that contains the (escaped) JSON string you want. For your multiple-entry COSMIC case above, you can do something like the following.

CHROM POS REF ALT jsonField
1 2160390 C G [{\"id\":\"COSV66013939\", \"gene\":\"SKI\", \"cancerType\":\"alveolar\"}, {\"id\":\"COSV66013939\", \"gene\":\"SKI\", \"cancerType\":\"embryonal\"}]

I know the output will not be exactly as you want, but it can get close. Let me know if that works for you.
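For readers who want to generate such a file from flat rows, the collapsing step could be sketched in Python as follows. This is an illustration under assumptions: `escape_json` and `collapse` are hypothetical helpers, the backslash-escaped quoting simply mirrors the example row above, and any header lines a real custom annotation file needs are not shown.

```python
import json
from collections import defaultdict

# Flat input: one tuple per (variant, annotation) pair.
flat_rows = [
    ("1", 2160390, "C", "G", {"id": "COSV66013939", "gene": "SKI", "cancerType": "alveolar"}),
    ("1", 2160390, "C", "G", {"id": "COSV66013939", "gene": "SKI", "cancerType": "embryonal"}),
]


def escape_json(value):
    """Serialize to JSON, then backslash-escape backslashes and double quotes."""
    text = json.dumps(value)
    return text.replace("\\", "\\\\").replace('"', '\\"')


def collapse(rows):
    """Merge all annotations of a variant into one tab-separated data row."""
    grouped = defaultdict(list)
    for chrom, pos, ref, alt, annotation in rows:
        grouped[(chrom, pos, ref, alt)].append(annotation)
    return [
        "\t".join([chrom, str(pos), ref, alt, escape_json(annotations)])
        for (chrom, pos, ref, alt), annotations in grouped.items()
    ]


lines = collapse(flat_rows)
print("\t".join(["CHROM", "POS", "REF", "ALT", "jsonField"]))
for line in lines:
    print(line)
```

Because every variant now appears on exactly one row, the conflicting-entries error no longer applies; the trade-off is that a consumer has to parse the string field as JSON again.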

Best, Rajat

heseber commented 2 years ago

Thanks, Rajat, for this workaround. For Cosmic, I am using my patched version (see my pull request), but it's still good to know for other annotation sources.