Inconsistent Results with RolDE Approach without Setting Seed

bellayqian commented 2 months ago

Dear Elo Lab, I am very interested in your RolDE approach and would like to apply it to my project. However, when I tried it on your sample data, I found that without setting a seed, the results differ from run to run. If I run the code below multiple times, I get a different result for head(RolDE.data1, 5) each time. Could you please tell me why? I also ran it 10 times on the same input dataset and compared the significant proteins, and unfortunately there was little overlap between those 10 results. Any suggestions? Thank you very much for your time and kind support!

Best, Bella

library(RolDE)
data(data1)
data("des_matrix1")
data1.res <- RolDE(data=data1, des_matrix=des_matrix1, n_cores=3)
RolDE.data1 <- data1.res$RolDE_Results
RolDE.data1 <- RolDE.data1[order(as.numeric(RolDE.data1[,2])),]
head(RolDE.data1, 5)

tsvali commented 2 months ago

Hi!

Thank you for your interest in RolDE. Indeed, there is some randomness associated with the bootstrapping procedures applied by RolDE. Thus, without setting a random seed, the results will differ slightly between runs. The data1 dataset you tried is a “null” dataset of randomly generated protein expression values; it has no true differential expression signal between the conditions. This is why the top proteins are rather arbitrary and, without a random seed, will differ from run to run due to RolDE's bootstrapping. If you do the same with data3 instead, which is a semi-simulated proteomics dataset with spike-in (“ups”) proteins, the results should be more consistent from run to run even with different seeds, as in the following example:

library(RolDE)
data("data3")
data("des_matrix3")

# Run RolDE with four different seeds and keep the 50 top-ranked proteins from each run
res_list <- list()
for(i in 1:4){
  set.seed(i)
  data3.res <- RolDE(data=data3, des_matrix=des_matrix3, n_cores=3)
  RolDE.data3 <- data3.res$RolDE_Results
  # Sort by the second results column, as in the snippet from the question
  RolDE.data3 <- RolDE.data3[order(as.numeric(RolDE.data3[,2])),]
  res_list[[i]] <- as.character(RolDE.data3[1:50,1])
}
# Number of proteins shared by the top 10 of all four runs
length(Reduce(intersect, lapply(res_list, function(x) x[1:10]))) #8
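To compare runs more systematically than a single nested intersect, a small helper along these lines might be useful. This is only a sketch in base R: top_k_overlap, pairwise_top_k_overlap and toy_runs are hypothetical names that are not part of RolDE, and the toy identifiers stand in for a list of ranked protein names such as res_list above.

```r
# Hypothetical helper (not part of RolDE): number of identifiers shared by
# the top-k entries of every ranked list in `ranked_lists`.
top_k_overlap <- function(ranked_lists, k = 10) {
  top <- lapply(ranked_lists, function(ids) head(ids, k))
  length(Reduce(intersect, top))
}

# Hypothetical helper: matrix of pairwise top-k overlaps between runs,
# which shows whether one run disagrees with all the others.
pairwise_top_k_overlap <- function(ranked_lists, k = 10) {
  n <- length(ranked_lists)
  m <- matrix(0L, nrow = n, ncol = n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      m[i, j] <- length(intersect(head(ranked_lists[[i]], k),
                                  head(ranked_lists[[j]], k)))
    }
  }
  m
}

# Toy ranked lists (made-up identifiers) standing in for res_list
toy_runs <- list(
  c("P1", "P2", "P3", "P4", "P5"),
  c("P2", "P1", "P3", "P6", "P4"),
  c("P1", "P3", "P2", "P4", "P7")
)

top_k_overlap(toy_runs, k = 3)           # 3: P1, P2, P3 are in every top 3
pairwise_top_k_overlap(toy_runs, k = 4)  # diagonal is k; off-diagonal is 3 or 4
```

With the real results, top_k_overlap(res_list, k = 10) would compute the same quantity as the last line of the example above, and the pairwise matrix can be extended to as many seeds as needed.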

I hope this helps, Best, Tommi

bellayqian commented 2 weeks ago

Hi Tommi,

Thank you for your detailed explanation regarding RolDE's functionality. I appreciate your clarification on the randomness associated with bootstrapping and the nature of the datasets. Your insights on the differences between data1 and data3 are particularly helpful. I'll proceed with testing data3 as suggested.

Best, Bella

elolab / RolDE, issue #3