Use the numerical generative process to calibrate the model

MangiolaLaboratory / sccomp

Testing differences in cell type proportions from single-cell data.

https://stemangiola.github.io/sccomp/

GNU General Public License v3.0

Use the numerical generative process to calibrate the model #14

Open CastielZhao opened 3 years ago

CastielZhao commented 3 years ago

Does the false positive rate we claim (e.g. 0.05) correspond to 5% of false positives given our no-association, no-outlier simulated data?

Calibration:

- Inference of associations (read https://doi.org/10.1093/nargab/lqab005)
- Inference of outliers

stemangiola commented 3 years ago

Calibrate inference of associations

Generate 100 datasets with the same total counts per subject (a vector of size M, where M is the number of subjects). For each dataset:
- Use 30 subjects and 20 categories
- Build a design matrix with an intercept column and a factor of interest taking values between -1 and 1
- Set up the coefficients so that every category has the same intercept (for simplicity) and zero slope
- Generate the data
- Execute sccomp (visit the homepage of this repository)
  - To install: devtools::install_github("stemangiola/sccomp")
  - library(sccomp)
  - Follow the README
- Count how many categories were labelled as significantly changing. By default we use the 95% credible interval, which means that we expect 5% of calls to be false.
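The data-generation steps above can be sketched in base R as follows. This is only a minimal sketch under stated assumptions: counts are drawn from a multinomial with softmax proportions, which is a simplified stand-in and not necessarily the generative model sccomp itself uses. The total count per subject (1000), the intercept value (0.5) and the helper names `softmax` and `simulate_null_dataset` are hypothetical choices made for illustration. The output columns `sample`, `type`, `cell_group` and `count` are named to match the `sccomp_glm` call quoted later in this thread.

```r
# Sketch: simulate count data with no association (zero slope).
# Assumption: multinomial counts with softmax proportions, used here as a
# simplified stand-in for the generative model of sccomp.
set.seed(1)

n_subjects   <- 30                     # from the thread
n_categories <- 20                     # from the thread
n_datasets   <- 100                    # from the thread
total_counts <- rep(1000, n_subjects)  # hypothetical: same total for every subject

# Numerically stable softmax: maps a vector of reals to proportions
softmax <- function(x) {
  e <- exp(x - max(x))
  e / sum(e)
}

simulate_null_dataset <- function(dataset_id) {
  # Design matrix: intercept column plus a factor of interest in [-1, 1]
  type   <- runif(n_subjects, min = -1, max = 1)
  design <- cbind(intercept = 1, type = type)

  # Coefficients (2 x n_categories): same intercept everywhere, zero slope
  coefficients <- rbind(
    intercept = rep(0.5, n_categories),  # hypothetical intercept value
    slope     = rep(0, n_categories)
  )

  # Linear predictor (subjects x categories), mapped to proportions per subject
  linear_predictor <- design %*% coefficients
  proportions <- t(apply(linear_predictor, 1, softmax))

  # One multinomial draw per subject, giving a subjects x categories matrix
  counts <- t(sapply(seq_len(n_subjects), function(i) {
    rmultinom(1, size = total_counts[i], prob = proportions[i, ])[, 1]
  }))

  # Long format: one row per subject-category pair (column-major order)
  data.frame(
    dataset    = dataset_id,
    sample     = rep(paste0("subject_", seq_len(n_subjects)), times = n_categories),
    type       = rep(type, times = n_categories),
    cell_group = rep(paste0("category_", seq_len(n_categories)), each = n_subjects),
    count      = as.vector(counts)
  )
}

datasets <- lapply(seq_len(n_datasets), simulate_null_dataset)
```

Each element of `datasets` is one independent simulated study with 30 x 20 = 600 rows. Because the slope is zero and all intercepts are equal, the expected proportions are identical across categories and do not depend on `type`.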

CastielZhao commented 3 years ago

"Setup coefficient to have same intercept (for simplicity), and zero slope" Are there any other constraints on the coefficients, e.g. integer values or a range? Also, I assume that "zero slope" means coeff = (beta0, beta0, ..., beta0; beta1, beta1, ..., beta1), that is, the first column repeats 20 times.

stemangiola commented 3 years ago

"Setup coefficient to have same intercept (for simplicity), and zero slope" Are there any other constraints on the coefficients, e.g. integer values or a range?

Execute the code at the homepage of this repository and you will see what coefficients you get for a real dataset. You can get the range from those (except the intercept that should be zero for this test)

stemangiola commented 3 years ago

Whether the coefficients are integers or not makes no difference. The matrix multiplication between the design matrix and the coefficients works exactly the same way.
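A small illustration of this point, as a sketch with made-up numbers: the design matrix and both coefficient matrices below are hypothetical, with 3 subjects and 4 categories chosen only to keep the output readable.

```r
# Design matrix for 3 subjects: intercept column plus the factor of interest
design <- cbind(intercept = 1, type = c(-1, 0, 1))

# Two hypothetical coefficient matrices (rows: intercept, slope; columns: categories)
coef_integer <- rbind(rep(2L, 4), rep(0L, 4))   # integer intercept, zero slope
coef_real    <- rbind(rep(0.37, 4), rep(0, 4))  # non-integer intercept, zero slope

# The same matrix product applies in both cases
lp_integer <- design %*% coef_integer
lp_real    <- design %*% coef_real

# With zero slope, every subject gets the intercept whatever its factor value
print(lp_integer)
print(lp_real)
```

Both products are 3 x 4 matrices (subjects by categories). With a zero slope the factor of interest drops out, so every row equals the intercept row.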

CastielZhao commented 3 years ago

Hi Stefano,

I have successfully created 100 data frames with my function. To detect the changes, do I need to use the sccomp library, or should I find another way to do that?

stemangiola commented 3 years ago

"Hi Stefano, I have successfully created 100 data frames from my function. To detect the change, do I need to use sccomp library? Or I shall find out a way to do that ?"

Yes, run sccomp on your datasets. See the example dataset in the GitHub README. Start with a few datasets and try to draw descriptive statistics.
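One possible descriptive statistic is the observed fraction of significant calls across datasets, compared with the nominal 5%. The sketch below assumes the significance calls have already been collected into a logical matrix, with one row per simulated dataset and one column per category. How to extract those calls from the sccomp output is not covered in this thread and is not shown here. The function name `summarise_false_positives` and the toy input are hypothetical.

```r
# `calls`: logical matrix, one row per simulated dataset, one column per category;
# TRUE means the category was labelled as significantly changing.
summarise_false_positives <- function(calls, nominal_rate = 0.05) {
  n_calls <- length(calls)
  n_false_positives <- sum(calls)  # under the null, every positive call is false

  # Exact binomial test of the observed rate against the nominal rate
  test <- binom.test(n_false_positives, n_calls, p = nominal_rate)

  list(
    observed_rate = n_false_positives / n_calls,
    conf_int      = as.numeric(test$conf.int),
    per_dataset   = rowSums(calls)
  )
}

# Toy input for illustration only (not a real sccomp result):
# 3 datasets x 20 categories with 3 positive calls in total
toy_calls <- matrix(FALSE, nrow = 3, ncol = 20)
toy_calls[1, 4] <- TRUE
toy_calls[3, c(2, 17)] <- TRUE

summary_toy <- summarise_false_positives(toy_calls)
summary_toy$observed_rate  # 3 / 60 = 0.05
summary_toy$per_dataset    # 1 0 2
```

If the model is well calibrated, the confidence interval of the observed rate should contain the nominal rate once enough datasets have been analysed.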

CastielZhao commented 3 years ago

Which function in sccomp is used for detecting variation?

CastielZhao commented 3 years ago

I noticed the function:

res = counts_obj %>% sccomp_glm( ~ type, sample, cell_group, count, approximate_posterior_inference = FALSE )

When analyzing multiple data frames, do I need to merge the data frames, or do I specify the different data frames through "cell_group" above? Also, type=category, count=count, sample=subject in our dictionary, right?

stemangiola commented 3 years ago

If you analyse different studies, no: you analyse them independently. I don't know what you mean by data frames. A data frame can be anything. Please be more precise.

Also, type=category, count=count, sample=subject in our dictionary, right?

yes

CastielZhao commented 3 years ago

"if you analyse different studies no, you analyse them independently. I don't know what you mean by data frames. Data frame can be anything. Please be more precise."

"Also, type=category, count=count, sample=subject in our dictionary, right?"

"yes"

By data frames, I mean the output simulated data frames from my numeric generation process.

stemangiola commented 3 years ago

One data frame includes M categories and N subjects.

Another data frame includes M categories and N subjects.

One subject on its own constitutes a very small dataset (size = 1) that cannot be used for regression.