Closed XinyeZhao closed 2 years ago
Hi @XinyeZhao
Thanks for giving {splatter} a go. I'm not quite clear what your question is. Can you please provide the code you are using? The splat model doesn't have an explicit idea of a "highly variable gene" so I'm not quite sure what you are looking at.
Sure, I used Python with scanpy to process the data, and here is my code. It's just the standard data preprocessing pipeline. If I use the same code on the real PBMC3k data, filtering to the top 2000 HVGs is enough to see clear clusters with Leiden, but data generated by Splatter requires about 10k HVGs to show the clusters.
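import anndata as ad
import scanpy as sc

# simul_data: the Splatter-simulated counts, expected as cells x genes (e.g. a pandas DataFrame)
adata = ad.AnnData(simul_data.values)
sc.pp.filter_genes(adata, min_cells=3)
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
# Note: when n_top_genes is set, the mean/dispersion cutoffs below are ignored
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5, n_top_genes=2000)
# Copy so that later steps work on a real AnnData object rather than a view
adata = adata[:, adata.var.highly_variable].copy()
sc.tl.pca(adata, svd_solver='arpack', n_comps=30)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata, resolution=1)
print(adata.X.shape)
sc.pl.umap(adata, color=['leiden'])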
How clearly separated the clusters are depends on the parameters passed to splatSimulate(). The estimation process does not set the parameters that create clusters, so they need to be supplied manually.
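As a rough illustration of why those parameters matter, here is a toy sketch in plain Python. It is not Splatter's code. It assumes that each gene is differentially expressed with probability de.prob and that its DE factor comes from a log-normal distribution with location de.facLoc and scale de.facScale. The values 0.1 and 0.4 are what I believe the defaults to be, so please check newSplatParams(). The function names are made up for this example, and the direction handling is simplified.

```python
import math
import random


def sample_de_factors(n_genes, de_prob, fac_loc, fac_scale, down_prob=0.5, seed=0):
    """Toy sketch of per-gene DE factors for one group (not Splatter's code)."""
    rng = random.Random(seed)
    factors = []
    for _ in range(n_genes):
        if rng.random() >= de_prob:
            factors.append(1.0)  # gene is not differentially expressed
            continue
        fac = rng.lognormvariate(fac_loc, fac_scale)
        if rng.random() < down_prob:
            fac = 1.0 / fac  # simplified: invert the factor to flip its direction
        factors.append(fac)
    return factors


def fraction_strong(factors, min_fold=1.5):
    """Fraction of genes whose fold change (up or down) is at least min_fold."""
    threshold = math.log(min_fold)
    count = sum(1 for f in factors if abs(math.log(f)) >= threshold)
    return count / len(factors)


# Every gene is DE (de.prob = 1), but the typical size of the change differs.
weak = sample_de_factors(10_000, de_prob=1.0, fac_loc=0.1, fac_scale=0.4)
strong = sample_de_factors(10_000, de_prob=1.0, fac_loc=1.0, fac_scale=0.4)
print(f"fac_loc=0.1: {fraction_strong(weak):.2f} of genes change >= 1.5-fold")
print(f"fac_loc=1.0: {fraction_strong(strong):.2f} of genes change >= 1.5-fold")
```

Under these assumptions, setting de.prob to 1 makes every gene DE, but with a small location most of the changes stay modest. That may be one reason many genes are needed before the clusters separate, and raising de.facLoc (or de.facScale) in splatSimulate() would be the thing to try.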
@XinyeZhao Is this ok now or do you have any follow up questions?
Hi Splatter team,
When I use the PBMC3k data as the input for Splatter, the model seems to treat all genes as highly variable genes. If I preprocess the simulated data with scanpy, select the top 2000 HVGs, run Leiden and visualize with UMAP, the clusters are not well separated even if I set de.prob to 1. However, I noticed that the more genes I use for Leiden, the better the clusters are separated. I just want to check whether this is because I am using Splatter incorrectly. The first figure uses 7000 HVGs and the second figure uses 2000 HVGs.
Thanks!