BiomedicalMachineLearning / stLearn

A novel machine learning pipeline to analyse spatial transcriptomics data

stLearn Usage #251

Closed rialc13 closed 8 months ago

rialc13 commented 9 months ago

Hi,

Thank you for developing this amazing tool! I was wondering, can trajectory analysis be performed on data that has already been normalized, clustered, & annotated? I ask because stLearn has its own normalization & clustering algorithm (stSME). So for trajectory analysis, do we need to perform the normalization & clustering steps using the stSME algorithm? If this is not necessary, can we start directly from "2. Spatial trajectory inference" (choosing root), as described in your tutorial (https://stlearn.readthedocs.io/en/latest/tutorials/Pseudo-time-space-tutorial.html), on our previously normalized, clustered, & annotated dataset?

Thanks

duypham2108 commented 9 months ago

Yes, you can apply it to a previously normalized, clustered, and annotated dataset. However, you may run into some issues. I can give some suggestions here:

rialc13 commented 8 months ago

Hi @duypham2108. Thank you for your suggestions. I am trying to perform trajectory analysis on the publicly available Visium mouse brain dataset. Normalization, clustering, & annotation were performed in R. I saved the Seurat object as an 'h5ad' file & read it in Python using adata = scanpy.read_h5ad("brain_integrated_dataset.h5ad"). This object contains the following arrays/cols -

AnnData object with n_obs × n_vars = 6049 × 3000
    obs: 'orig.ident', 'nCount_Spatial', 'nFeature_Spatial', 'slice', 'region', 'nCount_SCT', 'nFeature_SCT', 'integrated_snn_res.0.8', 'seurat_clusters'
    var: 'features', 'SCT_features'
    uns: 'neighbors'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'SCT'
    obsp: 'distances'

After this we ran the following code & got these errors. It would be great if you could help me troubleshoot this.

adata.raw = adata
adata.uns["iroot"] = st.spatial.trajectory.set_root(adata,use_label="seurat_clusters",cluster=6,use_raw=True) # This is the code being used in the stlearn tutorial
image

adata.uns['iroot'] = np.flatnonzero(adata.obs[6] == 'root_cluster')[0] # This is the code you mentioned in issue #251

image
duypham2108 commented 8 months ago

It should be like this

adata.uns['iroot'] = np.flatnonzero(adata.obs['root_cluster']  == '6' )[0]

Also, you should run this code first to make sure the image pixels are in the .obs

adata = st.convert_scanpy(adata)
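
For reference, the root-selection one-liner above raises a bare error when the column is missing or nothing matches, so it can help to wrap it with two checks. This is a minimal sketch using only pandas and NumPy; `pick_root_index` is a hypothetical helper, not part of stLearn, and the small `obs` table is a toy stand-in for `adata.obs`.

```python
import numpy as np
import pandas as pd


def pick_root_index(obs: pd.DataFrame, label_col: str, cluster) -> int:
    """Return the position of the first spot whose label equals `cluster`.

    Hypothetical helper for illustration, not an stLearn function.
    """
    if label_col not in obs.columns:
        raise KeyError(
            f"{label_col!r} is not a column of obs; available: {list(obs.columns)}"
        )
    # Compare as strings so that 6 and '6' are treated the same.
    labels = obs[label_col].astype(str).to_numpy()
    matches = np.flatnonzero(labels == str(cluster))
    if matches.size == 0:
        raise ValueError(f"no spot has {label_col} == {cluster!r}")
    return int(matches[0])


# Toy stand-in for adata.obs
obs = pd.DataFrame(
    {"seurat_clusters": [10, 13, 6, 4, 6]},
    index=["spot_a", "spot_b", "spot_c", "spot_d", "spot_e"],
)
root = pick_root_index(obs, "seurat_clusters", 6)
```

With a real object, the result would then be stored as `adata.uns['iroot'] = root`.
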
rialc13 commented 8 months ago

Ok. I will try this out. But before that, I realized that the spatial coordinates aren't available in my dataset. So I used the below code to add image & position information -

st.add.image(adata=adata, imgpath='brain_spatial/tissue_hires_image.png', library_id='mouse_brain', quality='hires', visium=True, scale=0.08250825, spot_diameter_fullres=177.4829519178534)
st.add.positions(adata=adata, position_filepath='brain_spatial/tissue_positions_list.csv', scale_filepath='brain_spatial/scalefactors_json.json', quality='high')

My output looks like this (notice that the image pixels/spatial information is added to uns but not to obsm) -

AnnData object with n_obs × n_vars = 2696 × 3000
    obs: 'orig.ident', 'nCount_Spatial', 'nFeature_Spatial', 'slice', 'region', 'nCount_SCT', 'nFeature_SCT', 'integrated_snn_res.0.8', 'seurat_clusters', 'imagerow', 'imagecol'
    var: 'features', 'SCT_features'
    uns: 'neighbors', 'spatial'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'SCT'
    obsp: 'distances'

After this I ran adata = st.convert_scanpy(adata) but I am getting the below error -

image
duypham2108 commented 8 months ago

You need to add the numpy matrix of your spatial coordinates to adata.obsm['spatial']

rialc13 commented 8 months ago

Will the spatial matrix be composed of 'imagerow', 'imagecol' values already present in adata.obs? If so, how do I directly add these values as a 'spatial' array in adata.obsm?

duypham2108 commented 8 months ago

It's similar to scanpy and other tools from scverse. You can put the imagerow and imagecol values into it directly. We will not use them in our analysis, but they will be useful for plots or analyses in other tools.
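
One way to do this, sketched below with pandas and NumPy only, is to stack the two columns into an (n_spots, 2) array. The toy `obs` table stands in for `adata.obs` and its values are made up. The (imagecol, imagerow) column order is an assumption based on the x, y convention that scanpy uses for `.obsm['spatial']`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata.obs with imagerow/imagecol columns (made-up values)
obs = pd.DataFrame(
    {"imagerow": [120.5, 48.0, 97.25], "imagecol": [33.0, 210.75, 64.5]},
    index=["spot_a", "spot_b", "spot_c"],
)

# Assumed column order: (x, y) = (imagecol, imagerow). Values stay as floats.
spatial = obs[["imagecol", "imagerow"]].to_numpy(dtype=float)

# With a real AnnData object the assignment would be:
# adata.obsm["spatial"] = spatial
```

The rows of `spatial` follow the row order of `obs`, so the array lines up with the spots as long as it is built from the same object it is assigned to.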

rialc13 commented 8 months ago

I converted the 'imagerow' & 'imagecol' to int type, saved them as a df, converted the df to numpy matrix using df = df.to_numpy() & added to adata.obsm['spatial']. My anndata object contains the following -

AnnData object with n_obs × n_vars = 2696 × 3000
    obs: 'orig.ident', 'nCount_Spatial', 'nFeature_Spatial', 'slice', 'region', 'nCount_SCT', 'nFeature_SCT', 'integrated_snn_res.0.8', 'seurat_clusters', 'imagerow', 'imagecol'
    var: 'features', 'SCT_features'
    uns: 'neighbors', 'spatial'
    obsm: 'X_pca', 'X_umap', 'spatial'
    varm: 'PCs'
    layers: 'SCT'
    obsp: 'distances'

bdata = st.convert_scanpy(bdata)
bdata.obsm['spatial']

array([[1286, 1463],
       [1472,  479],
       [ 544, 1368],
       ...,
       [ 895, 1166],
       [1451,  752],
       [1183,  574]])

bdata.obs["seurat_clusters"] = bdata.obs["seurat_clusters"].astype("category")
bdata.layers["raw_count"] = bdata.X
bdata.obs['seurat_clusters']

AAACAAGTATCTCCCA-1    10
AAACACCAATAACTGC-1    13
AAACAGAGCGACTCCT-1     6
AAACAGCTTTCAGAAG-1     4
AAACAGGGTCTATATT-1     4
                      ..
TTGTGTTTCCCGAAAG-1     9
TTGTTCAGTGTGCTAC-1     2
TTGTTGTGTGTCAAGA-1     5
TTGTTTCACATCCAGG-1     7
TTGTTTCCATACAACT-1     7
Name: seurat_clusters, Length: 2696, dtype: category
Categories (19, int64): [0, 1, 2, 3, ..., 15, 16, 18, 19]

Now when running st.pl.cluster_plot or trying to set the root, I get the following errors -

st.pl.cluster_plot(bdata,use_label='seurat_clusters',image_alpha=1,size=7)

image image

bdata.uns['iroot'] = np.flatnonzero(bdata.obs['root_cluster'] == '6' )[0]

image

I also tried this for setting root -

bdata.obs['iroot'] = np.flatnonzero(bdata.obs['seurat_clusters'] == '6' )[0]

image
duypham2108 commented 8 months ago

Can you replace this line?

# Change this bdata.obs["seurat_clusters"] = bdata.obs["seurat_clusters"].astype("category") to
import pandas as pd
bdata.obs["seurat_clusters"] = pd.Categorical(bdata.obs["seurat_clusters"].astype(str))

Also, .obsm['spatial'] does not need to be an integer matrix. Just make sure it holds the original coordinates of your spots, so that they map onto the image.
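
The reason the string cast matters can be shown with pandas alone. In this toy sketch (the labels are made up), comparing integer categories with the string '6' matches nothing, so indexing `[0]` on the result would fail; after casting to strings, the same comparison finds the spot.

```python
import numpy as np
import pandas as pd

labels = pd.Series([10, 13, 6, 4], index=["spot_a", "spot_b", "spot_c", "spot_d"])

# Integer categories: the string '6' is not among them, so nothing matches.
as_int = labels.astype("category")
int_matches = np.flatnonzero((as_int == "6").to_numpy())

# String categories: '6' is now a valid category and the comparison succeeds.
as_str = pd.Series(pd.Categorical(labels.astype(str)), index=labels.index)
str_matches = np.flatnonzero((as_str == "6").to_numpy())
root = int(str_matches[0])
```

Here `int_matches` is empty, while `root` is the position of the first spot labelled '6'.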

rialc13 commented 8 months ago

Thank you for the suggestion. This worked! Now I am getting a new error after setting the root.

bdata.obs['iroot'] = np.flatnonzero(bdata.obs['seurat_clusters']  == '3' )[0]
st.spatial.trajectory.set_root(bdata,use_label="seurat_clusters",cluster=3,use_raw=True)
st.spatial.trajectory.pseudotime(bdata,eps=50,use_rep="X_pca",use_label="seurat_clusters")
image

My anndata object contains the following -

AnnData object with n_obs × n_vars = 2696 × 3000
    obs: 'orig.ident', 'nCount_Spatial', 'nFeature_Spatial', 'slice', 'region', 'nCount_SCT', 'nFeature_SCT', 'integrated_snn_res.0.8', 'seurat_clusters', 'imagerow', 'imagecol', 'iroot', 'sub_cluster_labels'
    var: 'features', 'SCT_features'
    uns: 'neighbors', 'spatial', 'seurat_clusters_colors', 'seurat_clusters_index_dict', 'paga', 'seurat_clusters_sizes'
    obsm: 'X_pca', 'X_umap', 'spatial'
    varm: 'PCs'
    layers: 'SCT', 'raw_count'
    obsp: 'distances'

The error mentions a col/array called X_diffmap which my anndata object doesn't have.

duypham2108 commented 8 months ago

I am not really sure about it. You can try setting run_knn=True in the pseudotime function. Or it could be an issue with the PCA. Can you check .obsm['X_pca'] to make sure it's fine?

rialc13 commented 8 months ago

I found that when I ran the below commands before running st.spatial.trajectory.pseudotime(bdata,eps=50,use_rep="X_pca",use_label="seurat_clusters"), the error was resolved.

st.pp.tiling(bdata,out_path="tiling",crop_size = 40)
st.pp.extract_feature(bdata)
st.spatial.morphology.adjust(bdata,use_data="X_pca",radius=50,method="mean") # My dataset already contained .obsm['X_pca'] so I didn't run PCA again through stlearn & directly ran this command
st.pp.neighbors(bdata,n_neighbors=25,use_rep='X_pca_morphology',random_state=0)

Can you help me understand the importance of these 4 steps? What are they exactly doing?

duypham2108 commented 8 months ago

These steps adjust the PCA based on morphology: they tile the image and extract the image information with deep learning (you can check stSME or the paper for details). You can use these steps (you will get a smoother diffmap) or just run the PCA and the neighbors (kNN) functions again.
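
As a rough intuition for what a morphology-based adjustment does, here is a toy NumPy sketch. It is a conceptual simplification and not stLearn's implementation: the function name `smooth_by_morphology`, the cosine-similarity weighting, and the 50/50 blend are all assumptions made for illustration. The actual method is the one described for stSME.

```python
import numpy as np


def smooth_by_morphology(pcs, coords, morph, radius):
    """Blend each spot's PCs with a morphology-weighted mean of nearby spots.

    Illustrative only; assumes every morphology vector has non-zero norm.
    """
    pcs = np.asarray(pcs, dtype=float)
    coords = np.asarray(coords, dtype=float)
    morph = np.asarray(morph, dtype=float)

    # Pairwise spatial distances between spots.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Cosine similarity between morphology vectors, clipped at zero.
    unit = morph / np.linalg.norm(morph, axis=1, keepdims=True)
    sim = np.clip(unit @ unit.T, 0.0, None)

    # Keep only other spots that lie within the radius.
    not_self = ~np.eye(len(coords), dtype=bool)
    weights = np.where((dist <= radius) & not_self, sim, 0.0)
    totals = weights.sum(axis=1, keepdims=True)

    # Weighted neighbour mean; spots with no usable neighbours keep their values.
    neighbour_mean = np.divide(weights @ pcs, totals, out=pcs.copy(), where=totals > 0)
    return 0.5 * (pcs + neighbour_mean)


# Toy data: two adjacent, similar-looking spots and one distant spot.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [100.0, 100.0]])
morph = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
pcs = np.array([[0.0], [2.0], [10.0]])
adjusted = smooth_by_morphology(pcs, coords, morph, radius=5.0)
```

In this toy case the two adjacent spots are pulled towards each other, while the isolated spot is left unchanged, which is the kind of smoothing that would carry through to the neighbour graph built afterwards.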

BaluPai commented 2 weeks ago

Hi, thanks for the stLearn tool. I am new to Python as well as to spatial analysis. I have mostly been analysing 10X Visium data with Seurat and SPATA2, and I recently started trying stLearn for trajectory analysis. I ran into the same issue described here, especially since I wanted to do trajectory analysis on a subset of spots (only malignant cells). Since all previous analysis was done in Seurat, I converted the object into AnnData .h5ad and added attributes as suggested in: https://github.com/BiomedicalMachineLearning/stLearn/issues/116 . For this I used the original non-subset parent data generated using the "stLearn.Read10X()" function and added attributes:

spatial = st.Read10X("Work/10X_Visium/740_Visium/outs")
data.uns["spatial"] = spatial.uns["spatial"]
data.obs[["imagecol","imagerow"]] = spatial.obs[["imagecol","imagerow"]]
data.obsm["spatial"] = coordinates  # generated this from the Seurat object using the getTissueCoordinates function and converted to a numpy array

Now the data looks like this after stLearn processing, normalization, and clustering:

data

AnnData object with n_obs × n_vars = 2105 × 18039
    obs: 'orig.ident', 'nCount_Spatial', 'nFeature_Spatial', 'percent_mito', 'nCount_SCT', 'nFeature_SCT', 'SCT_snn_res.0.8', 'seurat_clusters',............predicted.id...etc........ 'imagecol', 'imagerow', 'tile_path', 'louvain'
    var: 'features', 'n_cells', 'mean', 'std'
    uns: 'spatial', 'log1p', 'pca', 'neighbors', 'louvain', 'louvain_colors'
    obsm: 'X_umap', 'spatial', 'X_pca', 'X_tile_feature', 'X_morphology', 'X_pca_morphology'
    varm: 'PCs'
    layers: 'rawcount'
    obsp: 'distances', 'connectivities'

But when I try to plot, the plot looks weird, with the whole image squished:

error_fig

Any suggestions would be very helpful!

Thanks