HenrikBengtsson / PSCBS

🔬 R package: Analysis of Parent-Specific DNA Copy Numbers
https://cran.r-project.org/package=PSCBS

Add support for weights to segmentByPairedPSCBS() #26

Open HenrikBengtsson opened 9 years ago

HenrikBengtsson commented 9 years ago

Add support for weights to segmentByPairedPSCBS() and segmentByNonPairedPSCBS(). This can be done in at least three different ways:

  1. One vector of locus-specific weights that is used in both the TCN and the DH segmentation steps (and in the calculation of the corresponding segment mean levels).
  2. Two vectors of locus-specific weights, where one is used for the TCN segmentation and the other for the DH segmentation (and in the calculation of the corresponding segment mean levels).
  3. One vector of locus-specific weights that is used in only the TCN segmentation step (and in the calculation of the corresponding segment mean levels).

One rationale for separate weight vectors is that one might want to use weights for the DH segmentation that are a function of, say, the confidence scores of the genotype calls.

UPDATE (2016-04-24): Added third option of only TCN weights.
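To make the difference between the three options concrete, here is a minimal Python sketch of how the weights would enter the segment mean levels only. It is not PSCBS code and says nothing about how change points are detected; `weighted_mean()`, `segment_levels()`, `w_tcn` and `w_dh` are hypothetical names chosen for illustration, and the numbers are made up.

```python
from typing import Optional, Sequence, Tuple


def weighted_mean(values: Sequence[float],
                  weights: Optional[Sequence[float]] = None) -> float:
    """Mean of `values`; falls back to the plain mean when `weights` is None."""
    if weights is None:
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * v for w, v in zip(weights, values)) / total


def segment_levels(tcn: Sequence[float],
                   dh: Sequence[float],
                   w_tcn: Optional[Sequence[float]] = None,
                   w_dh: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """(TCN mean, DH mean) for one segment (hypothetical helper, not PSCBS API)."""
    return weighted_mean(tcn, w_tcn), weighted_mean(dh, w_dh)


# Toy data for a single segment with three loci (illustrative values only).
tcn = [2.0, 2.0, 4.0]
dh = [0.5, 0.25, 0.75]
w = [1.0, 1.0, 2.0]

option1 = segment_levels(tcn, dh, w_tcn=w, w_dh=w)                # same weights for both
option2 = segment_levels(tcn, dh, w_tcn=w, w_dh=[2.0, 1.0, 1.0])  # separate weights
option3 = segment_levels(tcn, dh, w_tcn=w)                        # TCN weights only
```

With these toy inputs, `option1` is `(3.0, 0.5625)`, while `option2` and `option3` both give `(3.0, 0.5)`: the TCN level is the same in all three cases, and only the DH level depends on which option is chosen.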

HenrikBengtsson commented 9 years ago

As a start, I plan to use a single weight vector w (Approach 1 above). Only after that will I consider adding support for an optional second weight vector (Approach 2).

lima1 commented 8 years ago

Hi Henrik,

I use CBS and PSCBS to segment whole-exome sequencing data. For CBS, I use a pool of normals to find good weights, i.e. setting them proportional to the inverse of the log-ratio standard deviations in the pool. B-allele frequencies in whole-exome data have much less bias than coverage, and the two biases are not necessarily correlated. I think it would be useful if the weights could be restricted to one of the steps. Two different weights would be a bonus (not sure I need it).
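The pool-of-normals weighting described above could look roughly like the following Python sketch. It assumes the pool is given as one list of log-ratios per normal sample, all over the same loci; the function name `inverse_sd_weights()`, the `floor` guard against zero variance, and the rescaling so that the largest weight is 1 are assumptions made for illustration, not part of CBS or PSCBS.

```python
from statistics import stdev
from typing import List, Sequence


def inverse_sd_weights(pool: Sequence[Sequence[float]],
                       floor: float = 1e-6) -> List[float]:
    """Per-locus weights proportional to 1 / SD of the log-ratios across normals.

    `pool[i][j]` is the log-ratio of normal sample i at locus j.
    `floor` (an assumed safeguard) avoids division by zero at constant loci.
    """
    if len(pool) < 2:
        raise ValueError("need at least two normal samples to estimate an SD")
    n_loci = len(pool[0])
    if any(len(sample) != n_loci for sample in pool):
        raise ValueError("all samples must cover the same loci")

    raw = []
    for j in range(n_loci):
        sd = stdev([sample[j] for sample in pool])  # sample SD at locus j
        raw.append(1.0 / max(sd, floor))

    # Rescale so that the least noisy locus gets weight 1 (arbitrary choice;
    # only the relative sizes matter for "proportional to").
    top = max(raw)
    return [r / top for r in raw]


# Toy pool: two normals, two loci; the second locus is twice as noisy.
pool = [[0.0, 0.0],
        [0.2, 0.4]]
weights = inverse_sd_weights(pool)
```

Here the second locus has twice the standard deviation of the first, so it receives half the weight. A vector like this is what would be passed as the TCN weights under Option 3.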

Thanks for your great packages, Markus

HenrikBengtsson commented 8 years ago

Thanks @lima1 for the feedback. To support optional weights in either step, I think we basically need to implement Option 2 (weights specific to each of TCN and DH).

A half-way approach between Option 1 (same weights for TCN and DH) and Option 2 would be a third option: TCN-only weights, with DH weights left unsupported until Option 2 is implemented. That might be the most straightforward step toward supporting weighted PSCBS segmentation, especially since segmentByCBS() already supports weights. I've updated my top comment with Option 3.

lima1 commented 8 years ago

Option 3 would be perfect for me. Thanks again!

lima1 commented 5 years ago

Hi Henrik, I'm quite happy with the weighting implemented in my PR and have tested it on many samples. It cleans up some of the noisy regions quite a bit.

If you have any concerns with the current patch, happy to work on it more to get it in the next PSCBS release.

Thanks again! Markus