About the Problem of "patient_id_sample" in Calculating LTSR

kaqisekuzi commented 4 months ago

@nh3 @brianpenghe @RikLindeboom Hi, I saw "patient_id_sample" in the code below you. May I ask if "patient_id_sample" is the sample information (such as when I sequenced and measured 6 samples, 3 blanks, and 3 treatments)? Or does it refer to "sample ID+cell ID"? Is “cv@meta.data$pool_name” the cell ID of a single cell ？ Thank you very much. Looking forward to your reply.

cv@meta.data$patient_id_sample <- paste0(cv@meta.data$patient_id,";",cv@meta.data$pool_name) # Using our multiplexing approach, we have replicates for each library

Y = table(cv@meta.data$patient_id_sample,cv@meta.data$cell_annot_revision_short)

mymetadata <- cv@meta.data[!duplicated(cv@meta.data$patient_id_sample),]

RikLindeboom commented 4 months ago

Hi @kaqisekuzi,

All PBMC samples were split into replicates and each replicate was pooled with samples from other patients. patient_id_sample is the patient identifier and the pool identifier (sequencing library name) concatenated. This way we generate replicates to account for technical variation, and when fitting each GLMM, we used these replicates to strengthen our analyses.

Hope this helps. Best wishes, Rik Lindeboom

kaqisekuzi commented 4 months ago

Hi @RikLindeboom ,
Thank you very much for your answer. I have two more questions Here is my data information

> table(meta$orig.ident) SeuratProject 57181

> table(meta$sample_number) RA23H0014 RA23H0016 RA23H0017 RA23H0018 RA23H0020 RA23H0023 RA23H0024 RA23H0025 7549 2512 3334 4469 7401 8410 2552 5856 RA23H0027 RA23H0028 RA23H0029 4316 2284 8498

> table(meta$G_age) middle young 32439 17193

> table(meta$G_sex) female male 29432 27749

1，My 11 samples come from 11 individuals and there are no duplicate samples. My data situation is: 11 samples, each with only one library. After integrating the 11 samples, sample information was placed in the "sample_number" column, with only two sets of biological variables placed in ”G_sex“、 “G_age”， I want to calculate the multiple changes in the proportion of cell types under different age groups (G_age: middle, young). Do you think a Poisson generalized linear mixed model can be used?

2，If two biological variables can be used as Poisson's generalized linear mixed models,I would like to observe the multiple changes in cell type ratios under age and gender. Do you think this is feasible?

Looking forward to your reply.

kaqisekuzi commented 4 months ago

Hi @RikLindeboom Here is the code I used, There are a few small issues:

Dotplot color is not obvious, using the Dotplot function you wrote, already using zlim=c (log (1/2), Log (2) has adjusted the color depth, but it is still not very noticeable.
Can the Dotplot legend be drawn by modifying parameters? What parameters need to be modified

args <- commandArgs(T) source('./RL003_function_collection_GitHub.R') library(Seurat) library(ggplot2) library(lme4) library(showtext) showtext_auto() library(egg) library('numDeriv')

load data

cv <- readRDS("/rmRP_setDims_addsubset_celltype_addGroup_addAgeSize.rds") Y = table(cv@meta.data$sample_number,cv@meta.data$celltype_add_subset)

mymetadata <- cv@meta.data[!duplicated(cv@meta.data$sample_number),]

metadata <- mymetadata[,c("G_sex","G_age","sample_number")] Y <- Y[rownames(Y)%in%metadata$sample_number,]

number of samples / number of cell types

nsamples = nrow(Y) ncells = ncol(Y)

repeating the meta data table by the number of cell types

metadataExp=cbind(metadata[rep(match(rownames(Y),as.character(metadata$sample_number)),ncells),],Celltype=rep(colnames(Y),rep(nsamples,ncells)))

res.prop=glmer(I(c(Y))~ (1|Celltype) +(1|sample_number) +(1|G_sex) +(1|G_age)

+(1|sample_number:Celltype) +(1|G_sex:Celltype) +(1|G_age:Celltype) , family=poisson,data=metadataExp,control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))

standard errors of standard deviations (squre root of the variance parameters)

devfun = update(res.prop, devFunOnly=T) pars = getME(res.prop, c("theta","fixef")) hess = hessian(devfun, unlist(pars)) sdse.prop = data.frame(sd=unlist(pars), se=sqrt(diag(solve(hess))))

posterior means and their standard deviations

res.prop.ranef = ranef(res.prop)

Forest plot

rownames(sdse.prop)[rownames(sdse.prop)=="theta.Celltype.(Intercept)"] <- "Residual" pdf("/test2_Forest_plot.pdf", height = 7, width = 10) par(mar=c(3,6,1,1),mgp=c(1.2,0.5,0)) Forest(sdse.prop[grep("(Celltype|Residual)",rownames(sdse.prop)),],xlim=c(0,2.5)) dev.off()

postmean = cbind( getCondVal(res.prop.ranef,"G_age:Celltype",ncells,celltype=colnames(Y))[[1]], NA, getCondVal(res.prop.ranef,"G_sex:Celltype",ncells,celltype=colnames(Y))[[1]] )

lfsr = cbind( getCondVal(res.prop.ranef,"G_age:Celltype",ncells,celltype=colnames(Y))[[2]], NA, getCondVal(res.prop.ranef,"G_sex:Celltype",ncells,celltype=colnames(Y))[[2]] )

Dotplot

postmean_oldAgeGroupsPlusSeverity <- postmean lfsr_oldAgeGroupsPlusSeverity <- lfsr

myClust <- hclust(dist(postmean_oldAgeGroupsPlusSeverity*(1-lfsr_oldAgeGroupsPlusSeverity)),method = "complete")$order postmean_oldAgeGroupsPlusSeverity <- postmean_oldAgeGroupsPlusSeverity[myClust,] lfsr_oldAgeGroupsPlusSeverity <- lfsr_oldAgeGroupsPlusSeverity[myClust,]

pdf("/test2_dotplot1_plot.pdf", height = 7, width = 10)

par(mar=c(13,12,8,10))

par(mar=c(8,15,0,10)) Dotplot(postmean_oldAgeGroupsPlusSeverity, SORT=c(F,F),zlim=c(log(1/2),log(2)),ltsr=1-lfsr_oldAgeGroupsPlusSeverity, cex=0.8,srt=90,cex.axis = .8) dev.off()

Teichlab / COVID-19paed