YangLabHKUST / STitch3D

Construction of a 3D whole organism spatial atlas by joint modeling of multiple slices
https://stitch3d-tutorial.readthedocs.io/en/latest/index.html#
MIT License

How to perform "scanpy" analysis in model_6PCW in Example 2? #11

Closed cristalliao closed 1 year ago

cristalliao commented 1 year ago

Dear Professors,

I would like to run some scanpy analysis on the results of the STitch3D model in Example 2 (6PCW human heart dataset), that is, on "model.adata_st". Since this object is an AnnData object, do you have any suggestions for how to do this? Also, could you explain what the different variables mean?

AnnData object with n_obs × n_vars = 1480 × 3817
    obs: 'nGene', 'nUMI', 'Sample', 'weeks', 'ChipBatch', 'ChipNr', 'Experiment_date', 'Experiment_procedure', 'Sequencing_date', 'Raw_reads', 'new_x', 'new_y', 'percent.mito', 'res.0.8', 'selected', 'array_row', 'array_col', 'slice', 'batch', 'library_size', 'n_genes'
    var: 'n_cells'
    uns: 'log1p'
    obsm: 'spatial', 'loc_use', 'spatial_aligned', 'count', 'graph', '3D_coor', 'latent'

I am a little confused about the variables in 'obs', 'var', 'uns' and 'obsm'. Could you provide a dictionary explaining what each of them means? Thanks a lot!

gefeiwang commented 1 year ago

Hi Cristal,

Most of the information in the AnnData object comes from the file "GSE147747_meta_table.tsv", which is provided with the original dataset. To perform the alignment of slices, we added the 2D location information (an N-by-2 matrix) as .obsm['loc_use']. The aligned 2D locations are in .obsm['spatial_aligned'], while the 3D locations are in .obsm['3D_coor']. The 3D spatial location neighbourhood graph is .obsm['graph']. We also saved the learned representations as .obsm['latent'], which can be treated as a dimension reduction result when using scanpy for analysis. Basically, you can still use scanpy to perform other standard analyses such as preprocessing, visualization, finding marker genes, etc.

cristalliao commented 1 year ago

Dear Gefei, Thanks for your explanations. It is really helpful!! I still have some questions regarding this result:

  1. What is the definition of "spatial" in .obsm? Is it the same as the 2D location information (an N-by-2 matrix) in .obsm['loc_use']?
  2. Are the 3D locations in .obsm['3D_coor'] generated by training the STitch3D model?
  3. Can I use the 3D locations in .obsm['3D_coor'] for clustering, for example K-means? I would like to compare the clustering performance obtained with STitch3D's 3D spatial information against the performance obtained with 2D data such as .obsm['loc_use'] and .obsm['spatial_aligned'].

Thanks a lot! Best regards, Cristal

gefeiwang commented 1 year ago

Hi Cristal,

  1. "spatial" in .obsm records the 'HE_X' and 'HE_Y' coordinates on the H&E images provided by the original data, while "loc_use" in this example holds the coordinates we used for alignment (we used integer values, which worked better in ICP alignment).
  2. It is generated in the preprocessing alignment step before training the model.
  3. You can try clustering with these locations, but I think it may be problematic if we only use the locations to cluster spots, because gene expression information is lost.
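The caution in point 3 can be illustrated with a toy comparison. The sketch below runs K-means once on 3D coordinates and once on an expression-derived representation, and scores both against known labels with the adjusted Rand index. All data here are synthetic and deliberately constructed so that spot type is independent of location; the array names `coords_3d` and `latent` are stand-ins for .obsm['3D_coor'] and .obsm['latent'], and the result says nothing about the real heart dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
n_spots, n_types = 300, 3

# Synthetic ground truth: each spot's type is independent of where it sits.
true_labels = rng.integers(0, n_types, size=n_spots)

# Stand-in for .obsm['3D_coor']: locations only, no expression signal.
coords_3d = rng.uniform(0, 100, size=(n_spots, 3))

# Stand-in for .obsm['latent']: well-separated type centres plus noise.
centers = 5.0 * np.eye(n_types, 8)
latent = centers[true_labels] + rng.normal(scale=0.5, size=(n_spots, 8))


def kmeans_labels(features, k, seed=0):
    """Cluster the rows of `features` into k groups with K-means."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(features)


ari_coords = adjusted_rand_score(true_labels, kmeans_labels(coords_3d, n_types))
ari_latent = adjusted_rand_score(true_labels, kmeans_labels(latent, n_types))

print(f"ARI using locations only: {ari_coords:.2f}")
print(f"ARI using latent features: {ari_latent:.2f}")
```

In real tissue, spot type and location are usually correlated, so location-only clustering will not be this uninformative, but it still ignores the gene expression signal.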

Best, Gefei

cristalliao commented 1 year ago

Dear Gefei,

Thanks for your explanations. It is really helpful!! Also, I want to know where I can find the gene expression information. Which variable is gene expression data?

AnnData object with n_obs × n_vars = 1480 × 3817
    obs: 'nGene', 'nUMI', 'Sample', 'weeks', 'ChipBatch', 'ChipNr', 'Experiment_date', 'Experiment_procedure', 'Sequencing_date', 'Raw_reads', 'new_x', 'new_y', 'percent.mito', 'res.0.8', 'selected', 'array_row', 'array_col', 'slice', 'batch', 'library_size', 'n_genes'
    var: 'n_cells'
    uns: 'log1p'
    obsm: 'spatial', 'loc_use', 'spatial_aligned', 'count', 'graph', '3D_coor', 'latent'

Does the gene expression data change after training the STitch3D model? Where is the gene expression data before the model training?

Moreover, I want to know whether the x, y, and z coordinates (the 3D information) are changed by the STitch3D model training.

Thanks a lot! Best regards, Cristal

gefeiwang commented 1 year ago

Hi Cristal,

You can get the gene expression in AnnData objects using "adata_st.X". Check the AnnData documentation for more information about the data structure. In STitch3D, we saved normalized and log-transformed gene expression data in "adata_st.X". You can get the raw data from the original datasets. Also, the gene expression data and location information are not changed by STitch3D model training.

Best, Gefei