aleighbrown / pwgs_snakemake

Snakemake pipeline for running PhyloWGS on NIH Biowulf Cluster

Facets vs TitanCNA #1

Closed underbais closed 5 years ago

underbais commented 5 years ago

Hi Anna,

Any particular reason you used Facets rather than Titan/Battenberg for PhyloWGS?

Thanks, Chingiz

aleighbrown commented 5 years ago

No particular reason. I think you could easily use any of them.

I found Battenberg difficult to run, and TitanCNA was giving copy number calls that were inconsistent for a particular gene of interest in our case.

underbais commented 5 years ago

Same issues here with those two. Have you made any attempts to 'fix' the create_phylowgs_inputs.py script so that it doesn't throw important CNVs away?

aleighbrown commented 5 years ago

Essentially, PWGS is going to exclude these CNVs because of the way it models SSMs. I think in scenarios where you have a lot of CNVs that overlap, or you're doing multiple time points and a CNV may go from 3 to 5 (which is a lot of what's happening in our case), PWGS's internal assumptions don't work very well.

Our solution was, more or less, to run PWGS on the same set of samples multiple times, with the copy numbers artificially set to normal for one sample in each run, and then combine the trees manually.

So, if we had three samples in time (first, second, third), the copy numbers used in each run would be:

run       first  second  third
real      1,1    1,2     1,6
fake_one  1,1    1,1     1,6
fake_two  1,1    1,2     1,1
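
A minimal sketch of how the per-run copy numbers in the table could be generated, assuming each sample's copy number is held as a pair like `(1, 2)` and that `(1, 1)` is the "normal" state, as in the table. The function name `masked_runs` and the run labels (`mask_second`, `mask_third`) are hypothetical and not part of PhyloWGS or this pipeline; the sketch only builds the table and does not write PhyloWGS input files.

```python
from typing import Dict, Tuple

CopyNumber = Tuple[int, int]
NORMAL: CopyNumber = (1, 1)  # assumption: "normal" means 1,1 as in the table


def masked_runs(real: Dict[str, CopyNumber]) -> Dict[str, Dict[str, CopyNumber]]:
    """Return the real configuration plus one extra run per altered sample.

    In each extra run, exactly one sample's copy number is reset to NORMAL.
    Samples that are already normal do not get a run of their own.
    """
    runs = {"real": dict(real)}
    for sample, copy_number in real.items():
        if copy_number == NORMAL:
            continue  # nothing to mask for this sample
        masked = dict(real)
        masked[sample] = NORMAL
        runs[f"mask_{sample}"] = masked  # hypothetical run label
    return runs


real = {"first": (1, 1), "second": (1, 2), "third": (1, 6)}
runs = masked_runs(real)
for name, copy_numbers in runs.items():
    print(name, copy_numbers)
```

Here `mask_second` corresponds to `fake_one` in the table and `mask_third` to `fake_two`.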

And then take the top trees from those runs and manually put them together by comparing where each mutation ended up, which groups of mutations were consistently together, and which SSMs our copy numbers of interest fell with.
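
The "which groups of mutations were consistently together" comparison could be roughed out in code before doing the manual merge. This is only a sketch under assumptions: it takes each run's top tree as a plain dict from mutation ID to a cluster label, which is a made-up intermediate format, not PhyloWGS's actual output, and `consistent_pairs` is a hypothetical helper. Cluster labels need not match between runs; only co-membership within a run is compared.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Set, Tuple


def consistent_pairs(
    assignments: List[Dict[str, Hashable]],
) -> Set[Tuple[str, str]]:
    """Return the mutation pairs that share a cluster in every run.

    `assignments` holds one dict per run (mutation ID -> cluster label)
    and must contain at least one run. Only mutations present in all
    runs are considered.
    """
    shared = set.intersection(*(set(a) for a in assignments))
    pairs = set()
    for m1, m2 in combinations(sorted(shared), 2):
        if all(a[m1] == a[m2] for a in assignments):
            pairs.add((m1, m2))
    return pairs


# Toy data: three runs, three SSMs (IDs and labels are illustrative only).
example_runs = [
    {"ssm1": 1, "ssm2": 1, "ssm3": 2},
    {"ssm1": 3, "ssm2": 3, "ssm3": 3},
    {"ssm1": 1, "ssm2": 1, "ssm3": 2},
]
together = consistent_pairs(example_runs)
print(together)
```

Pairs that come out of this are candidates for staying together in the combined tree; mutations that move between clusters across runs are the ones that need a closer manual look.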

My advice is to try Canopy or Spruce. Both are less user-friendly (which is why we started with PWGS), but I think they're more appropriate for situations where there are a lot of copy number changes in the same places. I would start with Canopy; I found it a little more... actually usable than Spruce.