gavinha / TitanCNA

Analysis of subclonal copy number alterations (CNA) and loss of heterozygosity (LOH) in cancer
GNU General Public License v3.0

Parameters in TitanCNA snakemake pipeline #48

Closed ysbioinfo closed 5 years ago

ysbioinfo commented 5 years ago

Hi Gavin, thanks for your previous suggestions. I can now run TitanCNA successfully using the snakemake pipeline. I have some questions about the pipeline's parameters:

  1. First is about the parameter: numMaxClonalClusters. Does it mean the max number of clonal clusters to be considered by TitanCNA? In my understanding, there should be only one clonal cluster and all CNAs in that cluster have a cellular prevalence = 1. Or does it just mean the number of clusters to be considered?

  2. I'm not familiar with ichorCNA, but I noticed that there are many parameters to set for it, some of which seem similar to TitanCNA's (e.g. ichorCNA_normal/ichorCNA_ploidy and TitanCNA_normalInit/TitanCNA_maxPloidy). I don't know how much influence the ichorCNA results have on the downstream analysis by TitanCNA. Is the default setting appropriate for my data? Information for my data: WXS depth: 300X purity: 0.2-0.6 ploidy: 2-4, most samples are 2, about 15% are 3, less than 5% are 4. (The purity and ploidy were estimated by two other tools, Sequenza and Facets, and their consistency is good, so I think the estimates should be credible.)

  3. I'm running TitanCNA on a cluster and want to use 20 cores, but our job submission system is different from yours. Our rule is something like: Run -node_type -num_of_node -num_of_cpus_per_node [command].
    So will it work if I set TitanCNA_numCores in config.yaml to 20 and use the command below?
    Run -fat4way -1 -20 snakemake -s TitanCNA.snakefile --cores 20

  4. If I have an SNV with a CP calculated by one of the frequently used tools (PyClone, PhyloWGS...) and a CNA with a CP calculated by TitanCNA, could I simply compare the CPs to judge their relative order during tumor evolution? For example, does CP(SNV) > CP(CNA) mean the SNV occurred before the CNA? I know it may be unreasonable because CPs estimated by different tools might not be comparable, but I don't know another way to achieve my goal. Do you have any suggestions?

Thank you very much!

Yang

gavinha commented 5 years ago

Hi @snoopy-448

  1. First is about the parameter: numMaxClonalClusters. Does it mean the max number of clonal clusters to be considered by TitanCNA? In my understanding, there should be only one clonal cluster and all CNAs in that cluster have a cellular prevalence = 1. Or does it just mean the number of clusters to be considered?

The numMaxClonalClusters parameter tells the pipeline how many separate solutions to run. For example, if you set numMaxClonalClusters: 3, then three separate runs will be performed:

  1) Run with only 1 clonal cluster
  2) Run with up to 2 clonal clusters
  3) Run with up to 3 clonal clusters

Then selectSolutions.R will pick 1 out of the 3 solutions (after selecting the optimal ploidy).
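The run layout described above can be sketched in a few lines of Python. This is only an illustration of the enumeration, not code from the pipeline; `planned_cluster_runs` is a hypothetical helper name, and the real snakefile may organise its runs differently.

```python
def planned_cluster_runs(num_max_clonal_clusters):
    """List the cluster limit of each separate run for a given setting.

    Hypothetical helper for illustration only; it is not part of TitanCNA.
    """
    if num_max_clonal_clusters < 1:
        raise ValueError("numMaxClonalClusters must be at least 1")
    # One run per cluster limit: 1, 2, ..., num_max_clonal_clusters
    return list(range(1, num_max_clonal_clusters + 1))


for limit in planned_cluster_runs(3):
    print(f"run with up to {limit} clonal cluster(s)")
```

Under this reading, raising numMaxClonalClusters by one adds one more run to the set that selectSolutions.R chooses from.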

  2. I'm not familiar with ichorCNA, but I noticed that there are many parameters to set for it, some of which seem similar to TitanCNA's (e.g. ichorCNA_normal/ichorCNA_ploidy and TitanCNA_normalInit/TitanCNA_maxPloidy). I don't know how much influence the ichorCNA results have on the downstream analysis by TitanCNA. Is the default setting appropriate for my data? Information for my data: WXS depth: 300X purity: 0.2-0.6 ploidy: 2-4, most samples are 2, about 15% are 3, less than 5% are 4. (The purity and ploidy were estimated by two other tools, Sequenza and Facets, and their consistency is good, so I think the estimates should be credible.)

The settings are independent between ichorCNA and TitanCNA. In fact, only the intermediate normalization step of ichorCNA is used as input into TitanCNA.

In my other snakemake pipelines (e.g. https://github.com/gavinha/TitanCNA_10X_snakemake), I use ichorCNA's chrX results for male samples. I also perform a lot more post-processing of results. This 10X pipeline is something I'm actively working on now, so as I find new analysis steps that could be beneficial for the standard TitanCNA analysis, I will eventually implement them here.

  3. I'm running TitanCNA on a cluster and want to use 20 cores, but our job submission system is different from yours. Our rule is something like: Run -node_type -num_of_node -num_of_cpus_per_node [command]. So will it work if I set TitanCNA_numCores in config.yaml to 20 and use the command below? Run -fat4way -1 -20 snakemake -s TitanCNA.snakefile --cores 20

For WXS data, I would recommend just using 1 core per sample since it is usually very fast. For running jobs on your cluster, you may need to consult the snakemake documentation to work out how to do this with your submission system.
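As a rough sketch of the arithmetic behind this advice: snakemake's --cores option caps the total number of cores in use at once, so with 1 core per sample several samples can be processed side by side. The helper below is hypothetical and assumes every job requests the same number of cores; how the 20 cores are actually reserved still depends on your cluster's submission system.

```python
def max_parallel_jobs(total_cores, cores_per_job=1):
    """Estimate how many equal-sized jobs can run at once under --cores.

    Hypothetical helper for illustration; it does not call snakemake.
    """
    if total_cores < 1 or cores_per_job < 1:
        raise ValueError("core counts must be at least 1")
    # Snakemake scales a job down when it asks for more cores than
    # --cores provides, so a job never needs more than total_cores.
    effective = min(cores_per_job, total_cores)
    return total_cores // effective


# --cores 20 with 1 core per sample versus 20 cores per sample
print(max_parallel_jobs(20, cores_per_job=1))
print(max_parallel_jobs(20, cores_per_job=20))
```

With the same --cores 20 budget, the 1-core setting lets many samples run concurrently, while a 20-core setting processes them one at a time.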

ysbioinfo commented 5 years ago

Thanks for your reply! Great help to me!