Open jmfa opened 3 years ago
Hi Joao, it looks like sciphi excluded almost all mutations before running tree reconstruction. Only 369 mutations appear to be left after filtering.
I have tried running sciphi on low coverage data as well, and observed similar oddities. I hacked around in the source code a bit, and I have come to the conclusion that the site filters tend to filter out many or all sites when average coverage gets near 3x. I tried disabling the filters, but I could not get sciphi to run quickly after that. I can share my version of sciphi with disabled site filters if you are interested in trying it. Perhaps you will have more luck.
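To check whether a dataset falls into that low-coverage regime before running sciphi, the mean depth per cell can be read directly from the mpileup. Here is a minimal sketch, assuming the standard samtools mpileup layout (chromosome, position, reference base, then a depth/bases/qualities triplet for each sample). The function name `mean_depth_per_sample` is hypothetical and not part of sciphi.

```python
def mean_depth_per_sample(lines):
    """Return the mean read depth per sample from mpileup lines.

    Assumes tab-separated columns: chrom, pos, ref, followed by
    (depth, bases, quals) for each sample.
    """
    totals = []
    n_sites = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        # Depth columns sit at indices 3, 6, 9, ...
        depths = [int(d) for d in fields[3::3]]
        if not totals:
            totals = [0] * len(depths)
        elif len(depths) != len(totals):
            continue  # skip lines with an unexpected number of samples
        for i, d in enumerate(depths):
            totals[i] += d
        n_sites += 1
    if n_sites == 0:
        return []
    return [t / n_sites for t in totals]
```

An open file handle can be passed directly, e.g. `mean_depth_per_sample(open("sim.mpileup"))`, where the file name is only a placeholder.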
Also, you might try https://github.com/raphael-group/SBMClone instead. It performs surprisingly well in my hands on simulated data.
Thanks @winni2k! Will definitely take a look at SBMClone. :)
Hi, I’m currently running SCIPhI on some simulated datasets, but I’m getting some peculiar patterns, and I’m not sure whether this is a bug in the tool, a misspecification of the parameter settings, or something else. The simulated data consist of mpileups with 40 cells, ~10k sites (all variable), and a sequencing depth of ~5X.
My idea is to set the SCIPhI filters to their minimum values, making it as permissive as possible so that most sites are picked up for phylogenetic reconstruction. I tried the following command line (in which I tried to set the parameters controlling depth to 0):
sciphi -o test --in sampleNames -u 0 --ncf 0 --mff $minfreq --md 0 --mmw 4 --mnp 1 --ms 0 --mc 0 --unc true --mf 0 -l 200000 --seed $RANDOM ${sim}.mpileup
However, this is what I get:
Reading the config file: ... done!
Reading the mpileup file: num Samples: 41
total # mut: 0 currently used: 0
normal - freq: 0.135070376076846 tmp: 0.135070376076846 SD: 0.01 count: 0 trails: 0
normal - overDis: 100 tmp: 100 SD: 5 count: 0 trails: 0
normla - alpha: 13.5070376076846 beta: 86.4929623923154
mutation - overDis: 2 tmp: 2 SD: 0.1 count: 0 trails: 0
mutation - alpha: 0.819906165230872 beta: 1.27014075215369
drop: 0.9 SD: 0.01 count: 0 trails: 0
lambda: 0 SD: 0.01 count: 0 trails: 0
1
done!
numUniqMuts: 0
dataUsage<0>: 0.1 0.1
369 0
newDataSize: 37 36.9
The new best score is: -367258.331190865
num Samples: 41
total # mut: 369 currently used: 37
[…]
As you can see, the total number of mutations identified (369) is much lower than the ~10,000 simulated sites. So my question is: is this just a matter of low detection power (and therefore expected), or am I setting the parameters wrong?
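For comparing runs with different filter settings, the mutation counts can be pulled out of the log with a small script. This is only a sketch, assuming the log keeps the `total # mut: ... currently used: ...` line format shown in the output above. `mutation_counts` is a hypothetical helper, not part of SCIPhI.

```python
import re

# Matches lines such as "total # mut: 369 currently used: 37"
MUT_LINE = re.compile(r"total # mut:\s*(\d+)\s+currently used:\s*(\d+)")

def mutation_counts(log_text):
    """Return (total, currently_used) pairs for every 'total # mut' line."""
    return [(int(total), int(used)) for total, used in MUT_LINE.findall(log_text)]
```

On the log above, the last pair returned would be `(369, 37)`, which can then be compared against the number of simulated sites.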
Thank you very much in advance, J