Oshlack / splatter

Simple simulation of single-cell RNA sequencing data
http://oshlacklab.com/splatter/
GNU General Public License v3.0

Splatter: which parameters can cause individual variability (expression levels of genes should change across samples)? #127

Closed Fatima-Zare closed 2 years ago

Fatima-Zare commented 2 years ago

I’m using Splatter to generate simulated single-cell data. I need variability between samples, meaning that the expression levels of genes should change across samples. I have 100 samples, 20 genes, and 5 cell types, and my code to generate the single-cell data is:

library(splatter)

vcf <- mockVCF(n.samples = 100)
gff <- mockGFF(n.genes = 20)
params.group <- newSplatPopParams(
    batchCells = 100,          # Number of cells in each batch.
    similarity.scale = 8,
    eqtl.group.specific = 0.6,
    de.prob = rep(0.8, 5),     # Probability that a gene is differentially expressed in a group. Can be a vector.
    de.facLoc = 0.5,           # Location (meanlog) parameter for the differential expression factor log-normal distribution. Can be a vector.
    de.facScale = 0.5,         # Scale (sdlog) parameter for the differential expression factor log-normal distribution. Can be a vector.
    group.prob = c(0.4, 0.3, 0.1, 0.1, 0.1)  # Probability that a cell comes from a group.
)

sim.sc.gr <- splatPopSimulate(vcf = vcf, gff = gff, params = params.group, sparsify = FALSE)

I have two questions: 1. Is there any other way to generate 100 samples? 2. I also want the expression level of each gene to differ between individual samples. For example, the expression level of Gene1 in cell type A for sample 1 should differ from the expression level of Gene1 in cell type A for sample 2, and so on. Is there any way to get this property?

azodichr commented 2 years ago

Hi @Fatima-Zare

Q1: The splatPopSimulate function will simulate scRNA-seq data for every individual in the provided vcf. So by specifying mockVCF(n.samples = 100) and providing that output to splatPopSimulate, you will generate data for 100 samples. Note that the mockVCF function is quite basic. To generate variant data with more realistic LD/population structure, consider using something like HAPGEN2 or sim1000G. Alternatively, you can provide splatPop with genotype data from real donors using data from public repositories (e.g., GTEx).

Q2: The code you provided above should be doing exactly this. You can confirm by inspecting the gene means that are simulated for each individual for each gene for each cell-group. For example:

metadata(sim.sc.gr)$Simulated_Means$Group1[1:5, 1:10] 
metadata(sim.sc.gr)$Simulated_Means$Group2[1:5, 1:10]

You can also see exactly which cell-type-specific DE effects are being added by inspecting the rowData:

rowData(sim.sc.gr)[1:5,grep(".GroupEffect", names(rowData(sim.sc.gr)))]

Thanks for using splatter and let us know if you have more questions.

Fatima-Zare commented 2 years ago

Thank you for your reply.

For example, if I want to increase the variance of the vectors

metadata(sim.sc.gr)$Simulated_Means$Group1[1, 1:100], metadata(sim.sc.gr)$Simulated_Means$Group1[2, 1:100], metadata(sim.sc.gr)$Simulated_Means$Group1[3, 1:100], and so on, how should I change the parameters?

azodichr commented 2 years ago

Currently the variance between individuals in your simulated population is quite low because you have set similarity.scale = 8. That parameter impacts the shape of the gamma distribution that the coefficient of variation (CV) for each gene is sampled from, where a larger similarity.scale value results in smaller CVs and thus less variation between individuals. To increase the variance, try decreasing similarity.scale.
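The effect described above can be illustrated in base R without running splatPop. This is a rough sketch rather than splatPop's internal code: it assumes the scale factor multiplies the rate of the gamma distribution (the thread does not spell out the mechanism), and the shape and rate values, as well as the helper name sample_cv, are made up for illustration.

```r
# Illustration only: base R, not splatPop internals.
set.seed(42)

n.genes  <- 1000
cv.shape <- 5    # hypothetical gamma shape
cv.rate  <- 10   # hypothetical gamma rate before scaling

# Draw one CV per gene. ASSUMPTION: similarity.scale multiplies the rate,
# so a larger value shrinks the sampled CVs.
sample_cv <- function(similarity.scale) {
  rgamma(n.genes, shape = cv.shape, rate = cv.rate * similarity.scale)
}

cv.scale8 <- sample_cv(8)  # setting used in the original code
cv.scale1 <- sample_cv(1)  # a smaller scale for comparison

# The mean CV is roughly shape / (rate * similarity.scale).
print(c(scale8 = mean(cv.scale8), scale1 = mean(cv.scale1)))
```

Under these assumptions the CVs drawn with a scale of 8 are about eight times smaller on average than those drawn with a scale of 1, which matches the advice to lower similarity.scale when more between-individual variation is wanted.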

Fatima-Zare commented 2 years ago

Thank you for your reply. After changing similarity.scale, I now have a larger variance and more variation between individuals, for example for the following vector:

metadata(sim.sc.gr)$Simulated_Means$Group1[1, 1:100]

and likewise for all other rows of the matrix X, where

X = metadata(sim.sc.gr)$Simulated_Means$Group1
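One way to quantify this kind of change is to compute a per-gene coefficient of variation across individuals for the genes-by-individuals matrix. The sketch below uses stand-in matrices generated with rnorm so that it runs without splatter; the helper per_gene_cv and the matrices X.tight and X.loose are hypothetical names, and in the thread's setting the input would be the X matrix from the simulation output.

```r
# Per-gene coefficient of variation across individuals (columns).
per_gene_cv <- function(means) {
  apply(means, 1, sd) / rowMeans(means)
}

# Stand-in matrices (genes x individuals). In the thread this would be
# X <- metadata(sim.sc.gr)$Simulated_Means$Group1
set.seed(1)
n.genes       <- 20
n.individuals <- 100
gene.means    <- runif(n.genes, min = 5, max = 50)

# Column-major fill with nrow = n.genes means row r uses gene.means[r].
X.tight <- matrix(rnorm(n.genes * n.individuals, mean = gene.means,
                        sd = 0.05 * gene.means), nrow = n.genes)
X.loose <- matrix(rnorm(n.genes * n.individuals, mean = gene.means,
                        sd = 0.30 * gene.means), nrow = n.genes)

# Compare the distribution of per-gene CVs between the two matrices.
summary(per_gene_cv(X.tight))
summary(per_gene_cv(X.loose))
```

Applying the same function to the Simulated_Means matrices from two runs with different similarity.scale values would give a gene-by-gene view of how much the between-individual variation changed.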

lazappi commented 2 years ago

@Fatima-Zare was @azodichr able to answer your question? Just wanted to check before I close this issue.

Fatima-Zare commented 2 years ago

Dear Luke, Christina helped me get through my problem and you can close the issue. Thank you again for your help.

Best, Fatima
