BoevaLab / FREEC

Control-FREEC: Copy number and genotype annotation in whole genome and whole exome sequencing data

Workflow 12.0 #112

Open kirilllzaitsev opened 2 years ago

Several people contributed to this modification of Control-FREEC: Siyuan Luo, Fiona Muntwyler, Nathan Neike, Garance Jaques, Valentina Boeva, and myself. In the following wrap-up, I describe the changes you need to know about when using the new version of the software.

Cloning the repository

Please use the following command:

git clone --branch feature/new_pipeline --recurse-submodules https://github.com/kirilllzaitsev/FREEC.git

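This clones the repository with the feature/new_pipeline branch, which contains the v12.0 code, checked out, and also clones the required submodules. Note that --branch on its own still fetches the other branches as remote-tracking branches; add --single-branch if you want to fetch only this one.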

Installation

Compared to v11.6, this version requires some additional libraries: Eigen3, Boost (the serialization, filesystem, test, program_options, and thread components), TBB (Threading Building Blocks), and NLopt.

NOTE: The project was tested on Linux with GCC 10.3.0 and 11.2.

On Debian/Ubuntu, you can install all of these libraries at once with the following command:

sudo apt-get install libeigen3-dev libboost-serialization-dev libboost-filesystem-dev libboost-test-dev libboost-program-options-dev libboost-thread-dev libtbb-dev libnlopt-dev

If you run into problems related to package versions, use the following command to check which version of a package is installed and which versions are available, and make sure the installed version matches one the project is known to work with:

apt-cache policy <problematic_package_name>

The newly added CMakeLists.txt file simplifies compilation. The command below compiles the new version of Control-FREEC; the resulting executable is then available at build/PPC:

mkdir build && cd build && cmake .. && cmake --build .

Description of new fields in the configuration file:

[bayesopt]

optimObjectiveCalls=int, number of objective-function calls in the Bayesian optimization, i.e. calls to the function that fits a GMM
kernelNoise=float, noise of the kernel in the Bayesian optimization (purity search)
dataSubsamplingRateInPuritySearch=int, keep 1 data point out of every dataSubsamplingRateInPuritySearch in the purity search
dataSubsamplingRateInPloidyEvaluation=int, keep 1 data point out of every dataSubsamplingRateInPloidyEvaluation in the ploidy search
doRescaleRatio=bool, whether to rescale CNR values; "false" is recommended for exome data.

[gmm]

maxIter=int, number of GMM iterations, used by both the purity search and the ploidy search

Example of a configuration containing the fields above:

[bayesopt]

optimObjectiveCalls=6
kernelNoise=1e-10
dataSubsamplingRateInPuritySearch=1000
dataSubsamplingRateInPloidyEvaluation=3
doRescaleRatio=true

[gmm]

maxIter=7

All of these fields are optional; any field that is omitted takes its default value.