Choosing a segmentation algorithm

Hello, I was hoping to get some advice on how to proceed with my analysis.

I have run CNVkit on WGS data from 5 tumors, following the recommendations for WGS, against a reference built from a healthy tissue sample. Down the line, I intend to compare tumors that belong to two different conditions and identify phenotype-specific gene-level CNAs.

When it comes to segmentation, I don't know which algorithm I should choose for the rest of the analysis. I have tried cbs, hmm and hmm_tumor; judging from the description in the docs, the last one should suit my case best.

In the following graph I have plotted the distribution of segment log2 values for the three algorithms, together with that of the .cnr files I fed to the segment command.
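One way such a comparison could be put into numbers is sketched below. It is not necessarily how the plot was produced. It assumes the .cnr and .cns files are tab-separated tables with a header row that contains a `log2` column. The helper names `read_log2` and `summarize` and the toy table are made up for illustration and are not part of CNVkit.

```python
import csv
import io
import statistics


def read_log2(handle):
    """Return the values of the 'log2' column from a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [float(row["log2"]) for row in reader]


def summarize(values):
    """Median and median absolute deviation (MAD) of a list of log2 values."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    return {"n": len(values), "median": median, "mad": mad}


# Toy table with invented values, only to show the expected layout.
# Real files have more columns; this sketch only needs 'log2'.
toy_table = io.StringIO(
    "chromosome\tstart\tend\tlog2\n"
    "chr1\t0\t1000\t-1.0\n"
    "chr1\t1000\t2000\t0.0\n"
    "chr2\t0\t1000\t0.0\n"
    "chr2\t1000\t2000\t1.0\n"
)
print(summarize(read_log2(toy_table)))
```

With real data, the same two functions could be applied to an opened .cnr file and to each .cns file, so that the spread of the bins and of each method's segments can be compared side by side.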

These are very aneuploid tumors, which is expected. From that comparison, I think CBS is the method that best reflects the .cnr distribution, but I am not sure whether that is desirable or whether it simply means CBS retains more noise than the others.

Here's the comparison of the number of segments that are detected by each method in each sample:
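For anyone wanting to reproduce this kind of count, a minimal sketch follows. It assumes each .cns file is a tab-separated table with a header row and one segment per data row, including a `chromosome` column. The function names, method labels and toy tables are hypothetical, for illustration only.

```python
import csv
import io
from collections import Counter


def count_segments(handle):
    """Count segments (data rows) per chromosome in a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["chromosome"] for row in reader)


def total_by_method(tables):
    """Map each method label to its total number of segments."""
    return {
        method: sum(count_segments(handle).values())
        for method, handle in tables.items()
    }


# Toy tables with invented values; real input would be opened .cns files.
header = "chromosome\tstart\tend\tlog2\n"
toy_tables = {
    "cbs": io.StringIO(
        header + "chr1\t0\t500\t0.4\nchr1\t500\t900\t-0.6\nchr2\t0\t800\t0.1\n"
    ),
    "hmm": io.StringIO(header + "chr1\t0\t900\t0.0\nchr2\t0\t800\t0.1\n"),
}
print(total_by_method(toy_tables))
```

The per-chromosome counts from `count_segments` can also help show whether extra segments from one method are concentrated on a few chromosomes or spread across the genome.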

My questions are:

Is there a particular metric that I could look at in order to determine which segmentation algorithm is optimal?
Is a segment log2 distribution similar to the bin-level log2 distribution a good indication of quality, or does it instead point to noisier samples?
Is there anything in particular that I should take into account in the analysis to reflect the high aneuploidy of my samples?

Thanks a lot for your help!!
