gamerino / NBSpliceSuppInformation

Code for generation of simulated RNA-seq datasets and comparison of software packages for differential splicing detection (Merino et al)
0 stars 0 forks source link

issues about the simulation code #1

Open mhjiang97 opened 3 years ago

mhjiang97 commented 3 years ago
## Generating replicates values
# NB parameters
meansC1<-iso_info$meanC1Sim
meansC2<-iso_info$meanC2Sim
f.loess.C1<- loess(varC1 ~ mediasC1, data.frame(mediasC1=meansC1, varC1=iso_info$varC1Sim), degree=2)
f.loess.C2<- loess(varC2 ~ mediasC2, data.frame(mediasC2=meansC2, varC2=iso_info$varC2Sim), degree=2)
# variance prediction
var.predict1 <- predict(f.loess.C1, data.frame(mediasC1=meansC1))
var.predict2 <- predict(f.loess.C2, data.frame(mediasC2=meansC2))
size<-meansC1^2/(var.predict1-meansC1)
prob<-size/(size+meansC1)

numberExperimentRealizations<-10
for(nsim in 1:numberExperimentRealizations){
setwd(paste("/path_to_sim", nsim))
prob[size<0]<-size[size<0]<-1
C1repeticiones10<-t(sapply(1:nrow(iso_info), function(iso){
    iso_cts_C1<-t(sapply(1:1, function(x){
      cond1<-rnbinom(n=4, size=size[iso], prob=prob[iso])
      return(c(cond1))

    }))

    cero_index_cts<-which(is.na(iso_cts_C1[,1]) | is.na(iso_cts_C1[,2]) |   
        is.na(iso_cts_C1[,3]) | is.na(iso_cts_C1[,4]))
    iso_cts_C1[cero_index_cts,]<-0
    return(colMeans(iso_cts_C1))
  }))
iso_cts_C1<-as.data.frame(C1repeticiones10)
rownames(iso_cts_C1)<-iso_info$transcript_id
names(iso_cts_C1)<-c("C1R1", "C1R2", "C1R3", "C1R4")

# Condition2
size<-meansC2^2/(var.predict2-meansC2)
prob<-size/(size+meansC2)
prob[size<0]<-size[size<0]<-1
C2repeticiones10<-t(sapply(1:nrow(iso_info), function(iso){
    iso_cts_C2<-t(sapply(1:1, function(x){
      cond2<-rnbinom(n=4, size=size[iso], prob=prob[iso])
      return(c(cond2))

    }))

    cero_index_cts<-which(is.na(iso_cts_C2[,1]) | is.na(iso_cts_C2[,2]) | 
        is.na(iso_cts_C2[,3]) | is.na(iso_cts_C2[,4]))
    iso_cts_C2[cero_index_cts,]<-0
    return(colMeans(iso_cts_C2))
  }))
iso_cts_C2<-as.data.frame(C2repeticiones10)
rownames(iso_cts_C2)<-iso_info$transcript_id
names(iso_cts_C2)<-c("C2R1", "C2R2", "C2R3", "C2R4")

...
}

when you started the second round of the simulation, size and probe used for two conditions are the exactly same. There is no way to simulate samples which belongs to two different conditions if "numberExperimentRealizations" value is greater than 1.

mhjiang97 commented 3 years ago

In your former benchmark paper, the code of this chunk was same. Is it your actual strategy? Is there any point I didn't get? Thank you very much!