Nanostring-Biostats / GeoDiff

GeoDiff is an R package of count-generating models for analyzing GeoMx RNA data. Note that this version of the package is still under development: it is undergoing submission to the Bioconductor 3.14 release and still needs to complete NanoString's internal verification process.
MIT License

Question re methods behind the function calls? #27

Closed swbioinf closed 2 years ago

swbioinf commented 2 years ago

Firstly thanks for releasing this package. I'm looking forward to trying out the mixed effect model methods to handle repeated sampling from individuals (which has been an ongoing struggle to get my head around with limma approaches). And something comparable with other packages!

From working through the vignette - I have a few questions about the methods behind the functions that I can't figure out from the docs. Please let me know if there's a resource somewhere I should check out.

1) In the aggreprobe function - what are the “cor” and “score” tests? How are these used to include/exclude probes, and how are probes for a target combined?

2) In the BGScoreTest function - how is the ‘score’ being calculated? Why is the suggested p-value threshold 1e-3?

3) When filtering ROIs ("... keep those which have a high enough signal in comparison to the background."), how is the thresholding actually happening in this line?

ROIs_high <- sampleNames(kidney)[which((quantile(fData(kidney)[["para"]][, 1],
                                                  probs = 0.90, na.rm = TRUE) -
                                          notes(kidney)[["threshold"]])*kidney$sizefact_fitNBth>2)]

4) Are these the same filtering functions and DE methods used in the backend of the geomx gui?

Thanks!

NicoleEO commented 2 years ago

@karagorman can help answer this question.

karagorman commented 2 years ago
  1. This method is used to aggregate multiple probes per target. Some screening can be conducted to ensure the probes correlate well with each other and are in a similar dynamic range. For this, the parameter "use" is available.

    • When use = "cor" (i.e. correlation), the method filters out outlier probes whose correlation falls below the "corcutoff" parameter, which defaults to 0.85.
    • When use = "score", it uses the score from the score test to detect outlier probes and filters them out.

    After that, the sum of the remaining probes for that target is used for analysis.
  2. The BG Score test fits a Poisson model to the count matrix, with background size factors estimated from the Poisson background model on the negative probes. The score is the test statistic derived from the gradient of the likelihood function. A paper detailing how these formulas were derived is on its way. The formula differs depending on the "useprior" parameter. The suggested p-value threshold of 1e-3 is based on some datasets used for testing the method. You can change this as you see fit for your data.

  3. This line filters the ROIs, keeping only those where the difference between the Q90 of the signal parameter and the background threshold, multiplied by the ROI's size factor, is greater than 2 counts. This is to filter out the ROIs where the signal is very close to the background. The "para" parameter is set by fitNBth.

  4. These methods are not yet implemented in GeoMx DA
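
To give a feel for what a score test looks like in point 2, here is a minimal sketch of a generic one-sided Poisson score test. It is not GeoDiff's internal formula (that depends on "useprior" and is not reproduced here). It assumes a simple model y_j ~ Poisson(s_j * theta) with null hypothesis theta = theta0, where the function name `poisson_score_test`, the size factors, and theta0 are all made up for illustration.

```r
# Generic Poisson score test (illustrative only, NOT the GeoDiff implementation).
# Assumed model: y_j ~ Poisson(size_factor_j * theta); H0: theta = theta0.
# Score      U = (sum(y) - theta0 * sum(s)) / theta0
# Fisher info I = sum(s) / theta0
# Statistic  z = U / sqrt(I) = (sum(y) - theta0 * sum(s)) / sqrt(theta0 * sum(s))
poisson_score_test <- function(y, size_factor, theta0) {
  expected <- theta0 * sum(size_factor)       # total expected count under H0
  z <- (sum(y) - expected) / sqrt(expected)   # standardized score statistic
  p <- pnorm(z, lower.tail = FALSE)           # one-sided: counts above background?
  list(statistic = z, p.value = p)
}

set.seed(1)
size_factor <- runif(20, 0.5, 1.5)              # hypothetical per-ROI size factors
at_background <- rpois(20, size_factor * 5)     # feature at the background level
above_background <- rpois(20, size_factor * 15) # feature well above background

poisson_score_test(at_background, size_factor, theta0 = 5)$p.value
poisson_score_test(above_background, size_factor, theta0 = 5)$p.value
```

A feature would be called "above background" when its p-value falls below the chosen threshold (1e-3 in the vignette).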
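
The filter in point 3 can also be read term by term. The sketch below reproduces the arithmetic of the quoted line on plain vectors. All numeric values and ROI names are invented stand-ins; in the vignette they come from `fData(kidney)[["para"]][, 1]`, `notes(kidney)[["threshold"]]` and `kidney$sizefact_fitNBth`.

```r
# Stand-in values (hypothetical), replacing the fields of the fitted object:
signal_para <- c(0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256)  # per-feature signal parameter
threshold   <- 30                                         # background threshold
sizefact    <- c(roi_A = 0.01, roi_B = 0.05, roi_C = 0.5) # per-ROI size factor

# Q90 of the signal parameter: a single number computed across features
q90 <- unname(quantile(signal_para, probs = 0.90, na.rm = TRUE))

# Signal above background, scaled by each ROI's size factor: one value per ROI
signal_above_bg <- (q90 - threshold) * sizefact

# Keep ROIs where that scaled difference exceeds 2
ROIs_high <- names(sizefact)[which(signal_above_bg > 2)]
ROIs_high
```

Because the quantile is a single value, the ROI-to-ROI differences in this filter come entirely from the size factor: ROIs with a small size factor are the ones dropped.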

swbioinf commented 2 years ago

Thanks @karagorman that is really helpful information. Looking forward to trying this out in anger :)