Illumina / Nirvana

The nimble & robust variant annotator
https://illumina.github.io/NirvanaDocumentation/
GNU General Public License v3.0

MergedCosmicReader needs an update for new Cosmic releases #62

Closed: heseber closed this issue 2 years ago

heseber commented 2 years ago

New COSMIC releases use a different column name for the COSMIC ID (GENOMIC_MUTATION_ID instead of Mutation ID).
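To illustrate the column rename, here is a minimal Python sketch of a header lookup that accepts either name. It is not Nirvana's MergedCosmicReader code. The two column names come from this issue; the function names, the lookup order, and the tab-separated layout are assumptions made for the example.

```python
import csv
import io

# Candidate header names for the COSMIC ID column.
# Both names are taken from the issue text; trying the newer name first is an assumption.
COSMIC_ID_COLUMNS = ("GENOMIC_MUTATION_ID", "Mutation ID")


def find_cosmic_id_column(header):
    """Return the first known COSMIC ID column name that appears in the header."""
    for name in COSMIC_ID_COLUMNS:
        if name in header:
            return name
    raise ValueError(f"no COSMIC ID column found; expected one of {COSMIC_ID_COLUMNS}")


def read_cosmic_ids(tsv_text):
    """Return the COSMIC ID of every row in a tab-separated export (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_column = find_cosmic_id_column(reader.fieldnames or [])
    return [row[id_column] for row in reader]
```

Resolving the column by name once per file, rather than hard-coding one header, lets the same reader handle both old and new exports and fail loudly if neither name is present.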

The cancer type count and cancer site count are based on study ID. IMHO, this does not make sense: we want the number of tumors in which a mutation is found, not the number of studies, because a single study can include many tumors of the same tumor type and tumor site.

A quick fix is provided here. As I wrote in the commit message, a refactoring would still be needed to replace "study" with "tumour" in variable names, comments, tests, etc.

heseber commented 2 years ago

See my pull request which provides a solution.

rajatshuvro commented 2 years ago

Thanks for your comments and the PR. We are currently redoing COSMIC. That design is based on some customer feedback. If the new formatting doesn't satisfy you, we can have a discussion.

Thank you so much for using Nirvana. I am glad you find it useful.

Best,
Rajat

heseber commented 2 years ago

Thanks, Rajat. Is the redesigned COSMIC already available in one of the branches, or should I just wait?

rajatshuvro commented 2 years ago

Unfortunately, it is not available as a branch in the public repo. But it is coming soon. Thank you for your patience.

heseber commented 2 years ago

Okay, thank you. When you provide counts (such as CancerTypes or CancerSites), I would suggest basing these counts on tumor IDs. Sample IDs might be an alternative, but if there are multiple samples from the same tumor, that tumor shouldn't be counted multiple times. Thanks a lot for making Nirvana publicly available!
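To make the difference between the counting bases concrete, here is a small Python sketch that counts distinct IDs per mutation and cancer site. It is an illustration only, not Nirvana's implementation. The record keys (mutation_id, site, study_id, tumour_id, sample_id) and the example rows are hypothetical and do not reflect actual COSMIC column names or data.

```python
from collections import defaultdict


def count_distinct_per_site(rows, id_field):
    """Count distinct values of id_field for each (mutation_id, site) pair."""
    seen = defaultdict(set)  # (mutation_id, site) -> set of distinct IDs
    for row in rows:
        seen[(row["mutation_id"], row["site"])].add(row[id_field])
    return {key: len(ids) for key, ids in seen.items()}


# Hypothetical records: tumour T1 was sampled twice (samples A and B),
# and study S1 contributed two different lung tumours (T1 and T2).
rows = [
    {"mutation_id": "COSV1", "site": "lung", "study_id": "S1", "tumour_id": "T1", "sample_id": "A"},
    {"mutation_id": "COSV1", "site": "lung", "study_id": "S1", "tumour_id": "T1", "sample_id": "B"},
    {"mutation_id": "COSV1", "site": "lung", "study_id": "S1", "tumour_id": "T2", "sample_id": "C"},
    {"mutation_id": "COSV1", "site": "skin", "study_id": "S2", "tumour_id": "T3", "sample_id": "D"},
]

by_study = count_distinct_per_site(rows, "study_id")
by_sample = count_distinct_per_site(rows, "sample_id")
by_tumour = count_distinct_per_site(rows, "tumour_id")
```

For the lung site in this made-up data, counting by study gives 1, counting by sample gives 3, and counting by tumour gives 2: the study count understates the number of affected tumours, and the sample count counts T1 twice.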