CSOgroup / cellcharter

A Python package for the identification, characterization and comparison of spatial clusters from spatial -omics data.
https://cellcharter.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Failed running the ClusterAutoK.fit in Cosmx #30

Closed: TengyuZz closed this issue 5 months ago

TengyuZz commented 8 months ago

Report

Hi, when running the CosMx tutorial pipeline on my own CosMx data, I got stuck at the ClusterAutoK.fit step, when running: autok.fit(adata, use_rep='X_cellcharter')

The error message is always: TypeError: sparse array length is ambiguous; use getnnz() or shape[0].

How can I fix this? Many thanks!

Screenshot 2024-02-29 15 03 39

Version information

No response

marcovarrone commented 8 months ago

Hi @TengyuZz thank you very much for trying out CellCharter!

What I believe is happening is that the features in X_cellcharter are in sparse format. For now, the fitting of the Gaussian Mixture Model only supports the dense format, but I will try to fix it as soon as I can.
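As a minimal illustration of where this kind of message comes from (a sketch with SciPy only, not CellCharter's actual code path): calling len() on a SciPy sparse matrix raises a TypeError, while the same call on a dense NumPy array works. The `features` matrix below is a made-up stand-in for adata.obsm['X_cellcharter'].

```python
import numpy as np
from scipy import sparse

# Made-up stand-in for a sparse adata.obsm['X_cellcharter'] (4 cells, 4 features).
features = sparse.csr_matrix(np.eye(4, dtype=np.float32))

message = None
try:
    len(features)  # sparse matrices refuse len() because it is ambiguous
except TypeError as err:
    message = str(err)
    print(message)

# The unambiguous alternatives suggested by the error message itself.
n_cells = features.shape[0]
n_stored = features.getnnz()

# A dense copy behaves like any NumPy array, so len() is fine.
dense = features.toarray()
print(len(dense), n_cells, n_stored)
```

Any code that assumes a dense array and calls len() on its input would fail in this way when handed sparse features.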

However, the fact that your features are sparse makes me suspect that you haven't done dimensionality reduction. I have never tested CellCharter on the full feature set without dimensionality reduction, and I don't recommend it. Can you tell me how many features you have in the AnnData object, for example by running adata.obsm['X_cellcharter'].shape?

If you have a small number of features you can just run adata.obsm['X_cellcharter'] = adata.obsm['X_cellcharter'].toarray(), but be aware that if you have a lot of features, the dataset may occupy a lot of memory.
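A rough sketch of that conversion with a memory guard, assuming SciPy and NumPy are available. The helper name `to_dense_if_small` and the 2 GiB default limit are hypothetical choices for illustration, not part of CellCharter.

```python
import numpy as np
from scipy import sparse


def to_dense_if_small(x, max_bytes=2 * 1024**3):
    """Return a dense array, refusing if the dense copy would exceed max_bytes.

    Hypothetical helper: the name and the 2 GiB default are illustrative only.
    """
    if not sparse.issparse(x):
        return np.asarray(x)  # already dense, nothing to convert
    n_rows, n_cols = x.shape
    needed = n_rows * n_cols * x.dtype.itemsize  # size of the dense copy in bytes
    if needed > max_bytes:
        raise ValueError(
            f"dense copy would need {needed} bytes (limit {max_bytes}); "
            "consider dimensionality reduction first"
        )
    return x.toarray()


# Toy stand-in for adata.obsm['X_cellcharter']: 100 cells x 20 features.
toy = sparse.csr_matrix(np.ones((100, 20), dtype=np.float32))
dense_toy = to_dense_if_small(toy)
print(type(dense_toy).__name__, dense_toy.shape)
```

With a real object this would be used as adata.obsm['X_cellcharter'] = to_dense_if_small(adata.obsm['X_cellcharter']).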

TengyuZz commented 8 months ago

Hi @marcovarrone , Thanks for your kind reply. I followed the Nanostring CosMx tutorials here: https://cellcharter.readthedocs.io/en/latest/notebooks/cosmx_human_nsclc.html

I ran the dimensionality reduction steps as shown below:

Screenshot 2024-02-29 17 21 33

When I run adata.obsm['X_cellcharter'].shape, this is the output:

Screenshot 2024-02-29 17 20 23

My CosMx data is 960-panel; could that have an influence? Many thanks for helping me figure this problem out. I am really interested in CellCharter because of its powerful functionality!

marcovarrone commented 8 months ago

Is it possible that you forgot adata.obsm['X_scVI'] = model.get_latent_representation(adata).astype(np.float32)?

Remember also to run cc.gr.aggregate_neighbors(adata, n_layers=3, use_rep='X_scVI', out_key='X_cellcharter') before fitting the GMM.
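The order of the steps above can be checked before fitting with a small pre-flight function. This is only a sketch: `preflight` is a hypothetical helper, not a CellCharter function, and it assumes the keys 'X_scVI' and 'X_cellcharter' used in this thread. It accepts any mapping, so adata.obsm can be passed directly.

```python
import numpy as np
from scipy import sparse


def preflight(obsm, latent_key="X_scVI", agg_key="X_cellcharter"):
    """Return a list of problems; an empty list means the inputs look ready.

    Hypothetical helper for illustration; key names follow this thread.
    """
    problems = []
    if latent_key not in obsm:
        problems.append(f"{latent_key} missing: store the scVI latent representation first")
    if agg_key not in obsm:
        problems.append(f"{agg_key} missing: run cc.gr.aggregate_neighbors first")
    elif sparse.issparse(obsm[agg_key]):
        problems.append(f"{agg_key} is sparse: the GMM fit expects a dense array")
    return problems


# Toy mappings standing in for adata.obsm at different stages.
nothing_done = {}
sparse_features = {
    "X_scVI": np.zeros((3, 2), dtype=np.float32),
    "X_cellcharter": sparse.csr_matrix(np.zeros((3, 8), dtype=np.float32)),
}
ready = {
    "X_scVI": np.zeros((3, 2), dtype=np.float32),
    "X_cellcharter": np.zeros((3, 8), dtype=np.float32),
}

for name, obsm in [("nothing_done", nothing_done), ("sparse", sparse_features), ("ready", ready)]:
    print(name, preflight(obsm))
```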

One other thing: unless you have noticed strong batch effects between FOVs, I would actually run scVI without the batch_key parameter. Of course, if you see a lot of bias between fields of view, keep that parameter!

TengyuZz commented 8 months ago

Thanks @marcovarrone, I have resolved the problem and it now runs successfully. I also really appreciate your help and the suggestion about the scVI batch parameter setting.

Another small question about ClusterAutoK.stability: my result looks odd, as shown below. Is this trend expected? Many thanks.

Screenshot 2024-02-29 18 47 45

marcovarrone commented 8 months ago

@TengyuZz It can happen, but it depends a lot on the data. For example, it happened to me when I had only one cancer sample and so the biggest difference would be between the tumor niche and the rest. You can take a look at the peak at 7. I know it's much lower but 0.75+ is actually not bad, it's just that the stability at k=2 is very very high.
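One way to make "look at the peak" concrete is to list the local maxima of the stability curve and then set aside the trivial k=2 split. The sketch below is plain Python; the stability values are invented to mimic the shape described here (very high at k=2, a lower peak at 7) and are not taken from the screenshot. `stability_peaks` is a hypothetical helper.

```python
def stability_peaks(stability):
    """Return the k values whose stability is higher than both neighbours.

    Endpoints are compared against their single neighbour only.
    """
    ks = sorted(stability)
    peaks = []
    for i, k in enumerate(ks):
        left = stability[ks[i - 1]] if i > 0 else float("-inf")
        right = stability[ks[i + 1]] if i < len(ks) - 1 else float("-inf")
        if stability[k] > left and stability[k] > right:
            peaks.append(k)
    return peaks


# Invented values for illustration only (k -> stability).
example = {2: 0.98, 3: 0.70, 4: 0.62, 5: 0.66, 6: 0.71, 7: 0.77, 8: 0.69, 9: 0.64, 10: 0.60}

peaks = stability_peaks(example)
candidates = [k for k in peaks if k != 2]  # set aside the coarse two-cluster split
print(peaks, candidates)
```

Whether to discard k=2 is a judgment call that depends on the biology; the filter only reflects the reasoning in this comment.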

Let me know if you got nice results :)

TengyuZz commented 8 months ago

Hi @marcovarrone, thanks for your suggestion. I ran the analysis with k=7 and the results are very good; at the very least, they match the immunofluorescence very well. We will discuss the biological observations in more depth. CellCharter is a really good strategy for spatial clustering. Many thanks for your patience and useful suggestions! :)