giovp opened 3 years ago
so, after having installed squidpy, you can compute spatial statistics with
import squidpy as sq
sq.gr.moran(adata, n_jobs=4)
the results are saved in
adata.uns["moranI"].head(10)
and they are already sorted by the I statistic.
For a nice blog post on Moran's I, check this out
you can visualize results with the standard sc.pl.spatial function, e.g.
sc.pl.spatial(adata, color=["Nrgn", "Camk2n1", "Mobp", "cluster"])
a very first interesting analysis that I'm genuinely interested in is: how many significant SVGs (so, filtered by p-value) found by Moran's I are also significant marker genes (after sc.tl.rank_genes_groups(adata, groupby="<your cluster>")).
Then you can, for instance, rank clusters based on the number of overlapping genes between the two statistics. I would expect that clusters that are not randomly dispersed across the tissue have higher overlap with SVGs (because Moran's I tends to select non-random spatial effects).
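A minimal sketch of that overlap analysis, assuming clusters are stored in adata.obs["cluster"], that sc.tl.rank_genes_groups has already been run, and that the adjusted p-value column names below match your squidpy/scanpy versions (they are assumptions, not guaranteed):

```python
import scanpy as sc

pval_cutoff = 0.05  # illustrative threshold

# significant spatially variable genes from the Moran's I table
moran = adata.uns["moranI"]
svg = set(moran.index[moran["pval_sim_fdr_bh"] < pval_cutoff])  # column name assumed

# per cluster, count how many significant marker genes are also SVGs
overlap = {}
for cluster in adata.obs["cluster"].cat.categories:
    markers = sc.get.rank_genes_groups_df(adata, group=cluster)
    sig_markers = set(markers.loc[markers["pvals_adj"] < pval_cutoff, "names"])
    overlap[cluster] = len(svg & sig_markers)

# rank clusters by the number of overlapping genes
print(sorted(overlap.items(), key=lambda kv: kv[1], reverse=True))
```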
Let me know!
Hi Giovanni,
after installing squidpy, the number of clusters from last week has changed. Is that normal? I would have expected the results to stay the same. Do you know how I can make sure to get the same results?
Looking at the blog post, I should have gotten five values: the Moran's I index, expected index, variance, z-score, and p-value. But running adata.uns["moranI"].head(10), I get the following:
Can you explain the column names, please?
I may have misunderstood some terminology. Could you explain what is meant by a highly expressed gene, a marker gene, and a spatially variable gene?
Thank you for your help!
Best, Phuong
Hi,
> after installing squidpy, the number of clusters from last week has changed. Is that normal? I would have expected the results to stay the same. Do you know how I can make sure to get the same results?
mmh, if random seeds were not set in the respective functions, this is not guaranteed. Although results usually stay the same, it can happen.
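As a sketch, most scanpy steps that involve randomness expose a random_state argument you can pin (assuming a standard PCA / neighbors / Leiden pipeline):

```python
import scanpy as sc

# fix seeds at every stochastic step so clustering is reproducible
sc.pp.pca(adata, random_state=0)
sc.pp.neighbors(adata, random_state=0)
sc.tl.leiden(adata, random_state=0)
```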
> Looking at the blog post, I should have gotten five values: the Moran's I index, expected index, variance, z-score, and p-value. But running adata.uns["moranI"].head(10), I get the following:
good point, I should make it return all of them, but I also think the only really important ones are the I statistic and the p-value (both raw and adjusted, e.g. with BH FDR).
> Looking at the blog post, I should have gotten five values: the Moran's I index, expected index, variance, z-score, and p-value. But running adata.uns["moranI"].head(10), I get the following:
the columns are: the I statistic, the raw p-value, the variance, and the adjusted p-value. For I and the variance, a simulation-based approach (permutation-based, with 1k permutations) is performed.
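For reference, a minimal sketch of what a BH FDR adjustment of the raw p-values could look like (the column names are assumptions, not the exact squidpy output):

```python
from statsmodels.stats.multitest import multipletests

moran = adata.uns["moranI"]
# adjust the raw (simulation-based) p-values with Benjamini-Hochberg FDR
_, pvals_adj, _, _ = multipletests(moran["pval_sim"], method="fdr_bh")
moran["pval_sim_fdr_bh"] = pvals_adj
```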
> I may have misunderstood some terminology. Could you explain what is meant by a highly expressed gene, a marker gene, and a spatially variable gene?
a marker gene is one that is differentially expressed in a cluster vs. the rest, i.e. what sc.tl.rank_genes_groups() tests for; a spatially variable gene is one whose expression shows spatial structure, i.e. what sq.graph.moran() tests for; a highly expressed gene is simply one with high overall mean expression, regardless of clusters or space.
hth, let me know if you need more info
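As a sketch, the two calls side by side (the cluster key and the spatial-neighbors step are assumptions; the moran call follows the sq.gr.moran spelling used earlier in the thread):

```python
import scanpy as sc
import squidpy as sq

# marker genes: differential expression of each cluster vs. the rest
sc.tl.rank_genes_groups(adata, groupby="cluster")

# spatially variable genes: Moran's I on a spatial neighbors graph
sq.gr.spatial_neighbors(adata)
sq.gr.moran(adata)
```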
conversion between AnnData and SingleCellExperiment: https://theislab.github.io/zellkonverter/
you can then use SPARK: xzhoulab/SPARK
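A minimal sketch of the Python side of that conversion (the file name is illustrative; the R call in the comment is based on the linked zellkonverter docs):

```python
# write the AnnData to .h5ad so it can be read into R
# in R: sce <- zellkonverter::readH5AD("adata.h5ad")
adata.write_h5ad("adata.h5ad")
```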
Hi Giovanni,
I cannot install SPARK. I tried it in two different ways, but neither seems to work.
install.packages('devtools')
library(devtools)
devtools::install_github('xzhoulab/SPARK')
and
install.packages("Rcpp") install.packages("remotes") remotes::install_github("xzhoulab/SPARK")
Both gave me the same error.
I haven't found anything on the internet. Do you have any advice?
Thank you for your help!
Best, Phuong
mmh, seems like compilation issues. Maybe submit an issue in the repo? They could help, not sure, sorry.
otherwise you could try this, it's quite an interesting idea, and there seem to be nice tutorials. But it's in Python: https://github.com/ManchesterBioinference/GPcounts
Hi Giovanni,
I am trying to use GPcounts with a negative binomial likelihood to determine SVGs. Hence I had to calculate the scale factors, which can be determined by performing negative binomial regression in R. But there is one error I cannot fix.
Do you have any advice?
I have uploaded the code in my branch. Thank you for your help!
Best, Phuong
> Hence I had to calculate the scale factors, which can be determined by performing negative binomial regression in R. But there is one error I cannot fix.
I think you are on the right track, and nice that you managed to try out GPcounts!
for size factors, you can just use the total number of counts per cell. You should have that by simply doing scanpy.pp.calculate_qc_metrics(), and it should be saved under adata.obs["total_counts"].
Could you just use that? I see the point of estimating with the NB GLM, but that one should be a good approximation. Alternatively, I'd have to get a glimpse of your counts object in R before attempting an explanation. But nice try!
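A minimal sketch of what that could look like (the scale variable name is illustrative):

```python
import scanpy as sc

# compute per-cell QC metrics, including total counts
sc.pp.calculate_qc_metrics(adata, inplace=True)

# use total counts per cell as the GPcounts scale factors
scale = adata.obs["total_counts"].values
```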
Thanks for the hint. I will try it.
This is a part of my counts object:
The last column, which you can't see here, is the total number of counts per cell.
I also uploaded a csv file with the entire dataframe in my branch. The first column is actually the row names.
Best, Phuong
Hi Giovanni,
Unfortunately, I still cannot compute the SVGs. I have run into a new problem which I cannot fix.
Do you have any idea?
Thank you for your help!
Best, Phuong
eh, no, sorry, this is also something that is specific to that library. Probably opening an issue there could be of help, sorry.
Hi Giovanni,
the people from the GPcounts package helped me fix the code. They said that the scales variable should have the same dimension as the gene expression counts matrix, which was not the case for their data. So I also tried it with 100 columns. Then it took me 2:05 hours to compute the first genes. I'm not sure if it will work because there was already an error. What do you think?
This is only the time for the first 20 genes, but I have 3138 in total.
Do you know if it's possible to calculate them faster (does GPU work?)?
Also, they just published some code to compute the scale factors with Python. I would also like to try this, but that will also cost a lot of computational time.
Thank you for your help!
Best, Phuong
I had a look at the issue, and in their first reply they show a way to speed it up:
from GPcounts.GPcounts_Module import Fit_GPcounts  # import path assumed from the GPcounts tutorials

sparse = True
nb_scaled = True  # set the nb_scaled argument to True to pass the scale factors
gene_name = Y_run.index
likelihood = 'Negative_binomial'
gp_counts = Fit_GPcounts(X, Y_run.loc[gene_name], scale=scale, sparse=sparse, nb_scaled=nb_scaled, safe_mode=False)
gp_counts.ConditionalVariance_inducing_points(M=20)
the key part is when you instantiate the gp_counts object
gp_counts = Fit_GPcounts(X, Y_run.loc[gene_name], scale=scale, sparse=sparse, nb_scaled=nb_scaled, safe_mode=False)
basically setting sparse=True
and then
gp_counts.ConditionalVariance_inducing_points(M=20)
setting a fixed number of inducing points. Can you try with that? It should be faster. Also, re genes, you probably want to run it only on highly variable genes
genes = adata[:, adata.var.highly_variable].var_names.values
so you have fewer genes to evaluate.
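In case the highly_variable column is not there yet, a minimal sketch of computing it first (n_top_genes is illustrative; assumes log-normalized data):

```python
import scanpy as sc

# flag highly variable genes, then restrict to them
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
genes = adata[:, adata.var.highly_variable].var_names.values
```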
looking forward to seeing the results!
Hi Giovanni,
I just uploaded the GPcounts notebook. It's very similar to the original. Because it is computationally expensive, I saved the scale factors in a csv file. But the file is too big to upload (even the zipped one). Do you know how to upload it anyway?
I have normalized the data twice (sc.pp.normalize_total and the scale factors). Do you think that is a problem? (The scale factors were determined after the first normalization.)
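One common way to sidestep that, sketched below, is to stash the raw counts in a layer before normalizing and derive the scale factors from the raw counts; this is a general idiom, not something from the thread:

```python
import scanpy as sc

# keep raw counts around before any normalization
adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata)
```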
Thank you for your help!
Best, Phuong
sc.datasets.visium_sge()
sc.tl.rank_genes_groups()
Then we'll discuss spatial statistics tools next week. In R you can try this: https://github.com/xzhoulab/SPARK
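As a sketch, the two steps above could look like this (the sample_id and cluster key are illustrative):

```python
import scanpy as sc

# load a 10x Visium dataset from the scanpy datasets
adata = sc.datasets.visium_sge(sample_id="V1_Human_Lymph_Node")

# ... preprocessing and clustering go here ...

# marker genes per cluster
sc.tl.rank_genes_groups(adata, groupby="cluster")
```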
@ngocphuongtran