cbg-ethz / SCIPhI


How to set more permissive parameters? #24

Open jmfa opened 3 years ago

jmfa commented 3 years ago

Hi, I’m currently running SCIPhI on some simulated datasets, but I’m getting some peculiar patterns and I’m not sure whether this is a bug in the tool, a misspecification of the parameter settings, or something else. The simulated data consist of mpileups with 40 cells, ~10k sites (all variable), and a sequencing depth of ~5X.

My idea is to lower the SCIPhI settings to their minimum, making it as permissive as possible so that most sites are picked up for phylogenetic reconstruction. I tried the following command line (in which I tried to set the parameters controlling depth to 0):

sciphi -o test --in sampleNames -u 0 --ncf 0 --mff $minfreq --md 0 --mmw 4 --mnp 1 --ms 0 --mc 0 --unc true --mf 0 -l 200000 --seed $RANDOM ${sim}.mpileup

However this is what I get:

Reading the config file: ... done!
Reading the mpileup file:
num Samples: 41
total # mut: 0 currently used: 0
normal - freq: 0.135070376076846 tmp: 0.135070376076846 SD: 0.01 count: 0 trails: 0
normal - overDis: 100 tmp: 100 SD: 5 count: 0 trails: 0
normla - alpha: 13.5070376076846 beta: 86.4929623923154
mutation - overDis: 2 tmp: 2 SD: 0.1 count: 0 trails: 0
mutation - alpha: 0.819906165230872 beta: 1.27014075215369
drop: 0.9 SD: 0.01 count: 0 trails: 0
lambda: 0 SD: 0.01 count: 0 trails: 0
1 done!
numUniqMuts: 0
dataUsage<0>: 0.1 0.1 369 0
newDataSize: 37 36.9
The new best score is: -367258.331190865
num Samples: 41
total # mut: 369 currently used: 37
[…]

As you can see, the total number of mutations identified (369) is much lower than 10,000. So my question is: is this just a matter of low detection power (and therefore expected), or am I setting the parameters incorrectly?

Thank you very much in advance, J

winni2k commented 3 years ago

Hi Joao, it looks like sciphi has excluded almost all mutations before running tree reconstruction: only 369 appear to be left after filtering.

I have tried running sciphi on low coverage data as well, and observed similar oddities. I hacked around in the source code a bit, and I have come to the conclusion that the site filters tend to filter out many or all sites when average coverage gets near 3x. I tried disabling the filters, but I could not get sciphi to run quickly after that. I can share my version of sciphi with disabled site filters if you are interested in trying it. Perhaps you will have more luck.
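If it helps, here is a rough way to check how much coverage each site actually has before sciphi sees it. This is only a sketch and not part of sciphi: the function names and the `min_depth` parameter are made up for illustration, and it assumes the plain samtools mpileup layout (chrom, pos, ref, then depth / bases / qualities for each sample, tab-separated, with no extra per-sample columns).

```python
from statistics import mean


def site_depths(line):
    """Return the per-sample read depths for one mpileup line."""
    fields = line.rstrip("\n").split("\t")
    # Columns 0-2 are chrom, pos, ref. Each sample then contributes three
    # columns (depth, read bases, base qualities), so depths sit at 3, 6, 9, ...
    return [int(depth) for depth in fields[3::3]]


def coverage_summary(handle, min_depth=1):
    """Summarise per-site coverage over an mpileup stream.

    min_depth is a hypothetical threshold: a sample counts as "covered"
    at a site if it has at least this many reads there.
    """
    site_means = []
    covered_fractions = []
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        depths = site_depths(line)
        site_means.append(mean(depths))
        covered = sum(d >= min_depth for d in depths)
        covered_fractions.append(covered / len(depths))
    if not site_means:
        return {"sites": 0, "mean_depth": None, "mean_covered_fraction": None}
    return {
        "sites": len(site_means),
        "mean_depth": mean(site_means),
        "mean_covered_fraction": mean(covered_fractions),
    }


# Example usage (the file name is a placeholder):
# with open("sim.mpileup") as handle:
#     print(coverage_summary(handle, min_depth=1))
```

Comparing `mean_depth` and `mean_covered_fraction` across simulated sets should show whether the runs that lose most of their sites are also the ones with the lowest coverage.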

winni2k commented 3 years ago

Also, you might try https://github.com/raphael-group/SBMClone instead. It performs surprisingly well in my hands on simulated data.

jmfa commented 3 years ago

Thanks @winni2k! Will definitely take a look at SBMClone. :)