broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Implement Circular Binary Segmentation algorithm #431

Closed: akiezun closed this issue 9 years ago

akiezun commented 9 years ago

We need a Circular Binary Segmentation algorithm implementation in Java. The algorithm is described here: Olshen, A. B., Venkatraman, E. S., Lucito, R., and Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics, 5:557–72.

Speed improvements are described here: Venkatraman, E. S. and Olshen, A. B. (2007). A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics, 23:657–63.

The R implementation is in the DNAcopy package: http://bioconductor.wustl.edu/bioc/vignettes/DNAcopy/inst/doc/DNAcopy.pdf

The requirement is to take data from the R package and implement the algorithm so that it produces results equivalent to those from R. Small differences due to random number generation are acceptable.
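For orientation, here is a minimal sketch of the core idea of circular binary segmentation, written under several assumptions. The class and method names (`CbsSketch`, `maxArcStatistic`, `segment`) are hypothetical and not part of GATK or DNAcopy. The sketch uses an unnormalized mean-difference statistic with a plain permutation test. It leaves out the variance-normalized statistic, the speedups from the 2007 paper, and DNAcopy's smoothing and undo steps, so it should not be expected to reproduce the R output exactly.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.TreeSet;

/** Simplified circular binary segmentation sketch; hypothetical, not the DNAcopy code. */
final class CbsSketch {

    /**
     * Scans all arcs (i, j] of x[lo, hi) and returns {statistic, i, j} for the arc whose
     * mean differs most from the mean of the remaining points. Indices are relative to lo.
     */
    static double[] maxArcStatistic(double[] x, int lo, int hi) {
        int n = hi - lo;
        double[] prefix = new double[n + 1];
        for (int k = 0; k < n; k++) {
            prefix[k + 1] = prefix[k] + x[lo + k];
        }
        double total = prefix[n];
        double best = 0.0;
        int bestI = 0;
        int bestJ = n;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j <= n; j++) {
                int len = j - i;
                if (len == n) {
                    continue; // the arc must leave a non-empty complement
                }
                double arcSum = prefix[j] - prefix[i];
                double diff = arcSum / len - (total - arcSum) / (n - len);
                double stat = Math.abs(diff) / Math.sqrt(1.0 / len + 1.0 / (n - len));
                if (stat > best) {
                    best = stat;
                    bestI = i;
                    bestJ = j;
                }
            }
        }
        return new double[] {best, bestI, bestJ};
    }

    /** Returns the sorted change-point indices found by recursive splitting. */
    static TreeSet<Integer> segment(double[] x, double alpha, int nPerm, long seed) {
        TreeSet<Integer> breaks = new TreeSet<>();
        split(x, 0, x.length, alpha, nPerm, new Random(seed), breaks);
        return breaks;
    }

    private static void split(double[] x, int lo, int hi, double alpha, int nPerm,
                              Random rng, TreeSet<Integer> breaks) {
        if (hi - lo < 2) {
            return;
        }
        double[] observed = maxArcStatistic(x, lo, hi);
        // Permutation reference distribution for the maximal statistic.
        double[] perm = Arrays.copyOfRange(x, lo, hi);
        int exceed = 0;
        for (int p = 0; p < nPerm; p++) {
            shuffle(perm, rng);
            if (maxArcStatistic(perm, 0, perm.length)[0] >= observed[0]) {
                exceed++;
            }
        }
        double pValue = (exceed + 1.0) / (nPerm + 1.0);
        if (pValue > alpha) {
            return; // no significant change in this segment
        }
        int i = lo + (int) observed[1];
        int j = lo + (int) observed[2];
        if (i > lo) {
            breaks.add(i);
        }
        if (j < hi) {
            breaks.add(j);
        }
        // Recurse on the (up to) three pieces delimited by the arc ends.
        split(x, lo, i, alpha, nPerm, rng, breaks);
        split(x, i, j, alpha, nPerm, rng, breaks);
        split(x, j, hi, alpha, nPerm, rng, breaks);
    }

    /** In-place Fisher-Yates shuffle. */
    private static void shuffle(double[] a, Random rng) {
        for (int k = a.length - 1; k > 0; k--) {
            int r = rng.nextInt(k + 1);
            double t = a[k];
            a[k] = a[r];
            a[r] = t;
        }
    }
}
```

For a noise-free signal of 90 points that is 0 everywhere except for the value 5 at indices 30 to 59, `CbsSketch.segment(x, 0.01, 200, 42L)` returns the change points `[30, 60]`. Because the p-value comes from random permutations, results on noisy data can vary with the seed, which is the same source of small differences against R mentioned above.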

akiezun commented 9 years ago

@cwhelan and @LeeTL1220 please chip in regarding requirements.

akiezun commented 9 years ago

The speed requirement is for the running time to be within 5x of the R code's.

akiezun commented 9 years ago

It's acceptable to call out to R for the first version of the algorithm. Use RScriptExecutor.
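A rough sketch of what calling out to R could look like. This deliberately uses the plain JDK `ProcessBuilder` instead of RScriptExecutor, whose exact API is not shown in this thread. The class name `RSegmenterSketch`, the input column names (`chrom`, `pos`, `log2ratio`), and the argument order are all assumptions made for illustration. It also assumes `Rscript` is on the PATH and the DNAcopy package is installed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper that runs DNAcopy through Rscript; not the GATK RScriptExecutor. */
final class RSegmenterSketch {

    /** R code: reads a TSV (assumed columns chrom, pos, log2ratio) and writes DNAcopy segments. */
    static String rScript() {
        return String.join("\n",
            "args <- commandArgs(trailingOnly = TRUE)",
            "suppressMessages(library(DNAcopy))",
            "d <- read.table(args[1], header = TRUE, sep = \"\\t\")",
            "cna <- CNA(d$log2ratio, d$chrom, d$pos, data.type = \"logratio\", sampleid = args[3])",
            "seg <- segment(smooth.CNA(cna), verbose = 0)",
            "write.table(seg$output, args[2], sep = \"\\t\", quote = FALSE, row.names = FALSE)",
            "");
    }

    /** Builds the command line: Rscript --vanilla <script> <input> <output> <sampleId>. */
    static List<String> command(Path script, Path input, Path output, String sampleId) {
        return Arrays.asList("Rscript", "--vanilla", script.toString(),
                             input.toString(), output.toString(), sampleId);
    }

    /** Writes the script to a temporary file, runs it, and fails on a non-zero exit code. */
    static void run(Path input, Path output, String sampleId)
            throws IOException, InterruptedException {
        Path script = Files.createTempFile("cbs", ".R");
        try {
            Files.write(script, rScript().getBytes(StandardCharsets.UTF_8));
            Process process = new ProcessBuilder(command(script, input, output, sampleId))
                .inheritIO()
                .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("Rscript exited with code " + exitCode);
            }
        } finally {
            Files.deleteIfExists(script);
        }
    }
}
```

The output file would then hold DNAcopy's segment table, which a later pure-Java implementation could be compared against.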

vruano commented 9 years ago

Ported to hellbender-protected: https://github.com/broadinstitute/hellbender-protected/issues/10