etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

Mitochondrion genome analysis #631

Open katajar opened 3 years ago

katajar commented 3 years ago

Is it possible to use CNVkit to analyze the mitochondrial genome? Is it necessary to drop chrM from reference.cnn?

tskir commented 3 years ago

The problem with mitochondria is relative abundance.

For autosomes, normal samples always contain two copies, and for allosomes (sex chromosomes), zero, one or two copies, depending on the sex and on which allosome it is.

In contrast, the amount of mitochondrial particles present will vary semi-randomly from sample to sample. It also depends on the tissue type, the assay used and other factors. So there will be a huge baseline variance of total chrM coverage.

Because of this, CNVkit always excludes mitochondria from the analysis entirely, to avoid them messing with the normalisation algorithms: https://github.com/etal/cnvkit/blob/9dd1e7c83705d1e1de6e6e4ab9fdc6973bf4002f/cnvlib/antitarget.py#L115-L122

I suppose it should be technically possible to adjust for this variance by normalising and centering chrM coverage separately from all other chromosomes. However, it would take substantial work to add this functionality. If you or someone else would like to submit a pull request, I'll be happy to review it, but unfortunately I don't have the bandwidth to do this myself.
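To illustrate what "centering chrM separately" could mean, here is a minimal sketch in plain Python. It is not CNVkit code: the function name `center_separately`, the `(chromosome, log2)` tuple layout and the toy values are all assumptions made for this example. Each group of bins is shifted by its own median, so the large sample-wide chrM offset no longer affects the other chromosomes.

```python
from statistics import median


def center_separately(bins, mito_names=("chrM", "MT")):
    """Center log2 coverage of mitochondrial and other bins independently.

    `bins` is a list of (chromosome, log2) tuples. This layout is a
    hypothetical stand-in for illustration, not CNVkit's data model.
    """
    mito = [value for chrom, value in bins if chrom in mito_names]
    other = [value for chrom, value in bins if chrom not in mito_names]
    # Each group gets its own center; an empty group is left unshifted.
    mito_center = median(mito) if mito else 0.0
    other_center = median(other) if other else 0.0
    return [
        (chrom, value - (mito_center if chrom in mito_names else other_center))
        for chrom, value in bins
    ]


# Toy data: chrM sits about 6 log2 units above the nuclear bins,
# with one chrM bin that is clearly lower than the rest.
bins = [
    ("chr1", 0.1), ("chr1", -0.1), ("chr2", 0.0),
    ("chrM", 6.0), ("chrM", 6.2), ("chrM", 4.0),
]
centered = center_separately(bins)
```

After centering, the low chrM bin still stands out relative to the other chrM bins, while the overall mitochondrial abundance is discarded. That is the trade-off of this approach: it can only show differences within chrM, not the absolute mtDNA copy number of the sample.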

Out of curiosity, is there a research use case to study chrM copy number variance? Perhaps in cancers?

katajar commented 3 years ago

Thank you so much for the fast reply. Now I understand the issue better. I just wanted to see differences between mitochondrial sequences from various yeast strains, such as big deletions, amplifications and copy numbers. For me it was only a loose question, but I did indeed find a few publications referring to mtDNA copy number variations in cancers.

tskir commented 3 years ago

@katajar I see, thank you for the context.

I've been thinking about this, and actually there is a way for you to analyse mitochondria while avoiding huge modifications to CNVkit. The approach could work like this:

  1. After aligning, filter the BAM files so that they only contain the mitochondrial sequences.
  2. Rename the mitochondrial chromosome name to ensure that it does not equal chrM or MT (otherwise CNVkit would filter it out).
  3. Proceed with the analysis as usual. CNVkit will treat the mitochondria as regular chromosomes, and the algorithms usually used to correct whole-sample coverage variance should quite adequately work to correct whole-sample plus mitochondrial variance.

This should be reasonably straightforward to implement. And in fact, if you decide to proceed with this analysis, I would very much appreciate your feedback. This can perhaps be used in the future to improve the way CNVkit handles mitochondrial and other irregular sequences in human samples.
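Steps 1 and 2 above could be sketched roughly as follows. This is only an illustration working on SAM text lines, not a tested pipeline: the function name `keep_and_rename`, the source name `chrM` and the replacement name `mito` are assumptions, so adjust them to whatever your reference actually uses. Reads whose mate maps to another contig are dropped for simplicity, and optional tags that may mention the old name (for example `SA`) are left untouched.

```python
def keep_and_rename(lines, old="chrM", new="mito"):
    """Keep only SAM records on contig `old` and rename it to `new`.

    `old` and `new` are hypothetical defaults chosen for this example.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header: keep only the @SQ entry of the wanted contig,
            # renamed; pass every other header line through unchanged.
            if fields[0] == "@SQ":
                if f"SN:{old}" not in fields:
                    continue
                fields = [f"SN:{new}" if f == f"SN:{old}" else f
                          for f in fields]
            yield "\t".join(fields)
            continue
        # Alignment record: column 3 is RNAME, column 7 is RNEXT.
        if len(fields) < 11 or fields[2] != old:
            continue
        if fields[6] not in ("=", "*", old):
            # Mate is on a contig that is no longer in the header.
            continue
        fields[2] = new
        if fields[6] == old:
            fields[6] = new
        yield "\t".join(fields)


# Small in-memory demonstration with one nuclear and one chrM read.
sam = [
    "@HD\tVN:1.6",
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chrM\tLN:16569",
    "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchrM\t5\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
filtered = list(keep_and_rename(sam))
```

On real data, one option is to feed the function from `sys.stdin`, print each yielded line, and place the script between `samtools view -h input.bam` and `samtools view -b -o output.bam`. The reference FASTA and any target BED files would need the same renamed contig so that step 3 sees consistent names.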