erhard-lab / grandR

R package for nucleotide conversion sequencing data analysis
Other
8 stars 2 forks source link

Lm and nlls are biased in non-steady state #13

Closed teresa-rummel closed 2 years ago

teresa-rummel commented 2 years ago

Lm and nlls are biased in non-steady state. Plot showing the relative deviation of inferred HLs from ground truth as cumulative distribution

HL_noss
florianerhard commented 2 years ago

After careful checking: Using the default normalization already does the trick: grafik

kinetics=function(t,f0,s,d) s/d*(1-exp(-t*d))+f0*exp(-t*d)
m=subset(m,Condition=='noss')
m=FitKinetics(m,name="nonorm",type="nlls",steady.state=FALSE)
sf=sapply(Coldata(m)$duration.4sU,function(t) sum(kinetics(t,s=GeneInfo(m)$true_s,d=GeneInfo(m)$true_d,f0=GeneInfo(m)$true_f0.noss)))
sf=sf/median(sf)
m=Normalize(m,sizeFactors = 1/sf)
m=FitKinetics(m,name="norm",type="nlls",steady.state=FALSE)
m=Normalize(m)
m=FitKinetics(m,name="deseq2norm",type="nlls",steady.state=FALSE)

df=GetAnalysisTable(m)

plot(ecdf(log2(log(2)/df$true_d)-log2(df$`nonorm.noss.Half-life`)))
lines(ecdf(log2(log(2)/df$true_d)-log2(df$`norm.noss.Half-life`)),col='blue')
lines(ecdf(log2(log(2)/df$true_d)-log2(df$`deseq2norm.noss.Half-life`)),col='red')
abline(v=0)
abline(h=0.5)

Please change!

teresa-rummel commented 2 years ago

Redid it with default Normalization. Now the curves are shifted more towards the right?

Noss_all_methods

suppressPackageStartupMessages({library(grandR,lib.loc='/home/rummel/R/x86_64-pc-linux-gnu-library/3.6')})
m = readRDS('grandR.simulations/Mock.default.all.methods.rds')
m = Normalize(m)
extra=function(s) c(`Synthesis.lower`=unname(s$conf.lower["Synthesis"]),`Synthesis.upper`=unname(s$conf.upper["Synthesis"]),`Half-life.lower`=unname(s$conf.lower["Half-life"]),`Half-life.upper`=unname(s$conf.upper["Half-life"]))
m = FitKinetics(m,name = "nlls",type = "nlls", steady.state = c(ss=TRUE,noss=FALSE), return.extra = extra)
print('done_nlls')
m = FitKinetics(m,name = "ntr",type = "ntr", return.extra = extra, exact.ci=FALSE)
print('done_ntr')
m = FitKinetics(m,name = "entr",type = "ntr", return.extra = extra, exact.ci=TRUE)
print('done_entr')
m = FitKinetics(m,name = "lm",type = "lm", return.extra = extra)
print('done_lm')
m = FitKineticsPulseR(m, name = "pulseR")
print('done_pulseR')
saveRDS(m, "Mock.default.all.methods.rds")
florianerhard commented 2 years ago

If you use my code, what do you get?

teresa-rummel commented 2 years ago

Then I get this. So should I use the second (blue) normalization method? 26 07 2022_13 51 18_REC

florianerhard commented 2 years ago

blue uses the truth, so this is not something anyone could do. I really wonder why yours is different from mine...

teresa-rummel commented 2 years ago

I used it on our simulated data instead of the SARS data you probably used, right? So maybe something there is off. The code for the simulation is in the grandR manuscript repository.

florianerhard commented 2 years ago

No, the blue is what you get when you know the truth. I propose to simulate spike-ins by normalizing like this:

spike=(abs(log2(GeneInfo(m)$true_f0/rowMeans(GetTable(sars,type="norm"))))<0.02)
m=Normalize(m,genes=spike)

You have to adapt $true_f0 to the correct name in your table (it should be the truth from the noss simulation, of course). You might also have to play around with the cutoff (0.02 here).