Open kaqisekuzi opened 4 months ago
Hi @kaqisekuzi,
All PBMC samples were split into replicates and each replicate was pooled with samples from other patients. patient_id_sample is the patient identifier and the pool identifier (sequencing library name) concatenated. This way we generate replicates to account for technical variation, and when fitting each GLMM, we used these replicates to strengthen our analyses.
Hope this helps. Best wishes, Rik Lindeboom
Hi @RikLindeboom ,
Thank you very much for your answer. I have two more questions
Here is my data information
> table(meta$orig.ident)
SeuratProject
57181
> table(meta$sample_number)
RA23H0014 RA23H0016 RA23H0017 RA23H0018 RA23H0020 RA23H0023 RA23H0024 RA23H0025
7549 2512 3334 4469 7401 8410 2552 5856
RA23H0027 RA23H0028 RA23H0029
4316 2284 8498
> table(meta$G_age)
middle young
32439 17193
> table(meta$G_sex)
female male
29432 27749
1,My 11 samples come from 11 individuals and there are no duplicate samples. My data situation is: 11 samples, each with only one library. After integrating the 11 samples, sample information was placed in the "sample_number" column, with only two sets of biological variables placed in ”G_sex“、 “G_age”, I want to calculate the multiple changes in the proportion of cell types under different age groups (G_age: middle, young). Do you think a Poisson generalized linear mixed model can be used?
2,If two biological variables can be used as Poisson's generalized linear mixed models,I would like to observe the multiple changes in cell type ratios under age and gender. Do you think this is feasible?
Looking forward to your reply.
Hi @RikLindeboom Here is the code I used, There are a few small issues:
args <- commandArgs(T) source('./RL003_function_collection_GitHub.R') library(Seurat) library(ggplot2) library(lme4) library(showtext) showtext_auto() library(egg) library('numDeriv')
cv <- readRDS("/rmRP_setDims_addsubset_celltype_addGroup_addAgeSize.rds") Y = table(cv@meta.data$sample_number,cv@meta.data$celltype_add_subset)
mymetadata <- cv@meta.data[!duplicated(cv@meta.data$sample_number),]
metadata <- mymetadata[,c("G_sex","G_age","sample_number")] Y <- Y[rownames(Y)%in%metadata$sample_number,]
nsamples = nrow(Y) ncells = ncol(Y)
metadataExp=cbind(metadata[rep(match(rownames(Y),as.character(metadata$sample_number)),ncells),],Celltype=rep(colnames(Y),rep(nsamples,ncells)))
res.prop=glmer(I(c(Y))~ (1|Celltype) +(1|sample_number) +(1|G_sex) +(1|G_age)
+(1|sample_number:Celltype) +(1|G_sex:Celltype) +(1|G_age:Celltype) , family=poisson,data=metadataExp,control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))
devfun = update(res.prop, devFunOnly=T) pars = getME(res.prop, c("theta","fixef")) hess = hessian(devfun, unlist(pars)) sdse.prop = data.frame(sd=unlist(pars), se=sqrt(diag(solve(hess))))
res.prop.ranef = ranef(res.prop)
rownames(sdse.prop)[rownames(sdse.prop)=="theta.Celltype.(Intercept)"] <- "Residual" pdf("/test2_Forest_plot.pdf", height = 7, width = 10) par(mar=c(3,6,1,1),mgp=c(1.2,0.5,0)) Forest(sdse.prop[grep("(Celltype|Residual)",rownames(sdse.prop)),],xlim=c(0,2.5)) dev.off()
postmean = cbind( getCondVal(res.prop.ranef,"G_age:Celltype",ncells,celltype=colnames(Y))[[1]], NA, getCondVal(res.prop.ranef,"G_sex:Celltype",ncells,celltype=colnames(Y))[[1]] )
lfsr = cbind( getCondVal(res.prop.ranef,"G_age:Celltype",ncells,celltype=colnames(Y))[[2]], NA, getCondVal(res.prop.ranef,"G_sex:Celltype",ncells,celltype=colnames(Y))[[2]] )
postmean_oldAgeGroupsPlusSeverity <- postmean lfsr_oldAgeGroupsPlusSeverity <- lfsr
myClust <- hclust(dist(postmean_oldAgeGroupsPlusSeverity*(1-lfsr_oldAgeGroupsPlusSeverity)),method = "complete")$order postmean_oldAgeGroupsPlusSeverity <- postmean_oldAgeGroupsPlusSeverity[myClust,] lfsr_oldAgeGroupsPlusSeverity <- lfsr_oldAgeGroupsPlusSeverity[myClust,]
pdf("/test2_dotplot1_plot.pdf", height = 7, width = 10)
par(mar=c(8,15,0,10)) Dotplot(postmean_oldAgeGroupsPlusSeverity, SORT=c(F,F),zlim=c(log(1/2),log(2)),ltsr=1-lfsr_oldAgeGroupsPlusSeverity, cex=0.8,srt=90,cex.axis = .8) dev.off()
@nh3 @brianpenghe @RikLindeboom Hi, I saw "patient_id_sample" in the code below you. May I ask if "patient_id_sample" is the sample information (such as when I sequenced and measured 6 samples, 3 blanks, and 3 treatments)? Or does it refer to "sample ID+cell ID"? Is “cv@meta.data$pool_name” the cell ID of a single cell ? Thank you very much. Looking forward to your reply.
cv@meta.data$patient_id_sample <- paste0(cv@meta.data$patient_id,";",cv@meta.data$pool_name) # Using our multiplexing approach, we have replicates for each library
Y = table(cv@meta.data$patient_id_sample,cv@meta.data$cell_annot_revision_short)
mymetadata <- cv@meta.data[!duplicated(cv@meta.data$patient_id_sample),]