PoisonAlien / maftools

Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
http://bioconductor.org/packages/release/bioc/html/maftools.html
MIT License

feature request: rainfall plot #3

Closed by crazyhottommy 7 years ago

crazyhottommy commented 8 years ago

Hi,

I want to draw a kataegis (rainfall) plot. I know I can use the function from http://genomicsclass.github.io/book/pages/bioc2_rainfall.html but it would be very useful to have one in maftools itself.

Thanks, Ming

PoisonAlien commented 8 years ago

Hi Ming,

Thanks for this :+1:. I actually thought about it and wrote a function for the plot, but I didn't include it in the package: most of the data comes from WXS, and I don't think plotting kataegis from WXS is very useful or meaningful.

But I guess it won't do any harm. I will push a commit soon :)

crazyhottommy commented 8 years ago

Great, I'm looking forward to the next version. Thanks, Ming

PoisonAlien commented 8 years ago

Hi again,

I have pushed a commit that adds the rainfallPlot function. There is also a new function, coOncoplot. Check out the news file.

Thanks for the suggestion!

crazyhottommy commented 8 years ago

Very nice, and kudos for updating the documentation! It would be even nicer if you could implement the algorithm for detecting kataegis regions described in "Signatures of mutational processes in human cancer".

I am not sure how difficult it would be to implement; I am not an algorithms person. I'm mentioning it just in case you can add a function for it. Thanks very much!

From the Methods section of the paper:

Definition of kataegis. Kataegis has been identified via a PCF-based method as 6 or more consecutive mutations with an average intermutation distance of less than or equal to 1,000 bp. Other salient features include a preponderance for C>T and C>G mutations, a predilection for a TpC mutation context, processivity, evidence of having arisen on the same parental allele (being in cis) on sequencing reads and additionally (but not necessarily) co-localization with large-scale genomic structural variation.

A piecewise-constant-fitting-based algorithm for the detection of kataegis. Foci of localized hypermutation, termed kataegis, were sought in 507 whole-genome sequenced cancers. High-quality variant calls that had been previously subjected to filtering for mutational signature analysis were investigated using an algorithm developed to identify foci of kataegis. For each sample, all mutations were ordered by chromosomal position and the intermutation distance, defined as the number of base pairs from each mutation to the next one, was calculated. Intermutation distances were then segmented using the piecewise constant fitting (PCF) method to find regions of constant intermutation distance. Parameters used for PCF were gamma = 25 and kmin = 2 and were trained on the set of kataegis foci that had been manually identified, curated and validated using orthogonal sequencing platforms. Putative regions of kataegis were identified as those segments containing six or more consecutive mutations with an average intermutation distance of less than or equal to 1,000 bp.
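The final filtering step quoted above (six or more consecutive mutations, closely spaced) can be illustrated with a minimal Python sketch. This is not the paper's PCF segmentation and not maftools code: it swaps in a simpler greedy scan that requires every gap in a run to be at most 1,000 bp, which is stricter than the paper's average-distance criterion. The function name, parameter names, and the input format (a list of mutation positions on a single chromosome of a single sample) are assumptions made for illustration.

```python
def find_kataegis_candidates(positions, min_mutations=6, max_distance=1000):
    """Return candidate clusters as (start, end, n_mutations, mean_distance).

    Hypothetical helper, not part of maftools. Simplification: a run is broken
    whenever a single gap exceeds max_distance, so the mean intermutation
    distance inside each reported run is automatically <= max_distance.
    """
    if min_mutations < 2:
        raise ValueError("min_mutations must be at least 2")
    pos = sorted(positions)  # order mutations by chromosomal position
    regions = []
    run_start = 0
    for i in range(1, len(pos) + 1):
        # Close the current run at the end of the input or at a large gap.
        if i == len(pos) or pos[i] - pos[i - 1] > max_distance:
            n = i - run_start
            if n >= min_mutations:
                span = pos[i - 1] - pos[run_start]
                regions.append((pos[run_start], pos[i - 1], n, span / (n - 1)))
            run_start = i
    return regions


# Toy positions: six tightly spaced mutations followed by three distant ones.
toy = [100, 600, 900, 1500, 2100, 2500, 50000, 120000, 120500]
print(find_kataegis_candidates(toy))
```

On real data the scan would need to be run separately per sample and per chromosome, and a segmentation method such as PCF may group mutations differently from this gap-based rule.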

ttriche commented 8 years ago

you might consult https://support.bioconductor.org/p/76827/#76830

--t

On Mon, May 2, 2016 at 10:09 AM, Ming Tang notifications@github.com wrote:

Very nice! and kudos to the update of the documentation. It will be nicer if you can implement the algorithm to detect the kataegis regions in Signatures of mutational processes in human cancer http://www.nature.com/nature/journal/v500/n7463/full/nature12477.html.


PoisonAlien commented 8 years ago

Thank you both for the suggestions. maftools has a hidden function that segments inter-event distances to detect hypermutated genomic regions. I haven't tested it fully, so I didn't export it. Both of the methods above sound good, and I will see if I can implement them.

crazyhottommy commented 8 years ago

I am coming back to check whether you have implemented it. Thanks!

Tommy

PoisonAlien commented 8 years ago

Hi,

Sorry, this has been on my to-do list for a long time, but I haven't fully implemented it. I have a working script that uses CBS (DNAcopy), but I'm trying some other methods (http://bioinformatics.oxfordjournals.org/content/early/2010/11/17/bioinformatics.btq647.full.pdf+html). I will try to finish this soon.

PoisonAlien commented 8 years ago

Hi,

I have pushed a new commit for the rainfall plot. Set the argument detectChangePoints to TRUE and it should detect change points if there are any. I would appreciate it if you could try this and let me know.

Thanks.

raman91 commented 7 years ago

Hi, I used maftools to detect kataegis loci with the argument detectChangePoints = TRUE. Is there any way to get the genomic window of each kataegis event using maftools? Thank you.

PoisonAlien commented 7 years ago

Hi, for now there is no method implemented to detect regions. You may have to look at adjacent change points and decide on the region yourself. Please be aware that this is a naive implementation: it is sensitive to any sort of change in inter-event distances, so check carefully whether the detected change points are true events. See also #13
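One way to "look at adjacent points and decide the region" is to pair up neighbouring change points and summarise the mutations that fall between them. The Python sketch below does only that. The input format (plain lists of mutation positions and change-point positions on one chromosome) and the function name are assumptions for illustration; they do not reflect what maftools returns.

```python
from bisect import bisect_left, bisect_right


def windows_from_change_points(positions, change_points):
    """Summarise mutations between each pair of adjacent change points.

    Hypothetical helper, not part of maftools. Both inputs are assumed to be
    base-pair coordinates on the same chromosome of the same sample.
    """
    pos = sorted(positions)
    cps = sorted(change_points)
    windows = []
    for start, end in zip(cps, cps[1:]):
        lo = bisect_left(pos, start)   # first mutation at or after start
        hi = bisect_right(pos, end)    # one past the last mutation at or before end
        n = hi - lo
        if n < 2:
            continue  # no intermutation distance can be computed
        mean_dist = (pos[hi - 1] - pos[lo]) / (n - 1)
        windows.append(
            {"start": start, "end": end, "n_mutations": n, "mean_distance": mean_dist}
        )
    return windows


# Toy data: a dense cluster between 100 and 2500, then sparse mutations.
toy_positions = [100, 600, 900, 1500, 2100, 2500, 50000, 120000, 120500]
toy_change_points = [100, 2500, 120000]
for window in windows_from_change_points(toy_positions, toy_change_points):
    print(window)
```

A window with many mutations and a small mean distance is a candidate worth inspecting; a window with a large mean distance is more likely an ordinary change in mutation density. Mutations that sit exactly on a change point are counted in both neighbouring windows.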

Daniel-Geller commented 7 years ago

Hi PoisonAlien, thanks for such a wonderful tool suite. I am a little confused by the rainfall plot result. If change points have been detected in a case, can we say that the case shows kataegis? Daniel

PoisonAlien commented 7 years ago

Hi Daniel,

Glad you find this useful. Regarding change points, I would be careful about calling them kataegis, since the method detects any genomic locus where the inter-event distance changes. In my opinion this is not a robust way to detect kataegis, but it should be close enough to narrow down candidates. See #13 for more.

dodoflyy commented 4 years ago


Hello Tommy, do you know of any other papers that describe kataegis detection algorithms?