YangLabHKUST / STitch3D

Construction of a 3D whole organism spatial atlas by joint modeling of multiple slices
https://stitch3d-tutorial.readthedocs.io/en/latest/index.html#
MIT License

Questions about using STitch3D #18

Closed MingLi-929 closed 1 year ago

MingLi-929 commented 1 year ago

Hi developers, thank you for developing such a great method for 3D reconstruction. It has helped me a lot in my research. I ran into some problems when following the tutorial (https://stitch3d-tutorial.readthedocs.io/en/latest/tutorials/Drosophila_embryo/STitch3D_Drosophila_embryo.html):

  1. It seems that you removed the 14th slice. Could you please tell me why you dropped the last slice? Is there a standard I should follow when dealing with my own spatial data?

    slice_all = sorted(list(set(adata_st_raw.obs['slice_ID'].values)))[:-1]
  2. I used all of the data mentioned in the tutorial (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE190149 and the table you provided in the GitHub issue) and followed the code step by step. When processing the scRNA-seq data, I get a different UMAP from the one in your figure. Does the difference come from randomness, or has the GEO data been updated in a way that makes my UMAP differ from yours? [Screenshot 2023-07-21 10 55 52]

  3. The last question is whether the annotation of the clusters matters. In your tutorial (https://stitch3d-tutorial.readthedocs.io/en/latest/tutorials/Drosophila_embryo/STitch3D_Drosophila_embryo.html), the processed scRNA-seq data has 16 clusters:

    sensory nervous system
    salivary gland
    muscle
    CNS
    tracheal system
    midgut
    plasmatocytes
    proventriculus
    hindgut
    yolk nuclei
    fat body
    epidermis
    foregut
    oenocyte
    amnioserosa
    ubiquitous

    But in the Stereo-seq h5ad data, there are 10 clusters: [Screenshot 2023-07-21 11 00 41]

    ['salivary gland', 'epidermis', 'CNS', 'carcass', 'fat body', 'muscle', 'trachea', 'midgut', 'hemolymph', 'foregut']
    Categories (10, object): ['salivary gland', 'epidermis', 'CNS', 'carcass', ..., 'trachea', 'midgut', 'hemolymph', 'foregut']

    The number and the names differ, e.g. 'trachea' in the ST annotation versus 'tracheal system' in the scRNA-seq annotation. There is 'carcass' in the ST data, but not in the scRNA-seq data.
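For reference, the mismatch between the two label sets can be listed with a plain set comparison. This is only a minimal sketch using the label names quoted above; the variable names are illustrative and nothing here comes from the STitch3D API.

```python
# Labels copied from the two annotations quoted in this comment.
ref_labels = {
    "sensory nervous system", "salivary gland", "muscle", "CNS",
    "tracheal system", "midgut", "plasmatocytes", "proventriculus",
    "hindgut", "yolk nuclei", "fat body", "epidermis", "foregut",
    "oenocyte", "amnioserosa", "ubiquitous",
}
st_labels = {
    "salivary gland", "epidermis", "CNS", "carcass", "fat body",
    "muscle", "trachea", "midgut", "hemolymph", "foregut",
}

shared = sorted(ref_labels & st_labels)       # identical names in both annotations
only_in_st = sorted(st_labels - ref_labels)   # present only in the ST annotation
only_in_ref = sorted(ref_labels - st_labels)  # present only in the scRNA-seq annotation

print(f"shared ({len(shared)}): {shared}")
print(f"only in ST ({len(only_in_st)}): {only_in_st}")
print(f"only in scRNA-seq ({len(only_in_ref)}): {only_in_ref}")
```

The comparison is by exact string, so 'trachea' and 'tracheal system' are reported as different labels.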

My question is: if the annotation matters, there should be no difference between the scRNA-seq and ST annotations. If it does not matter, can I simply cluster the scRNA-seq data and give the clusters arbitrary names like "cell type 1", "cell type 2", ...?

Could you please give some advice on these questions? Sincere thanks in advance!

gefeiwang commented 1 year ago

Hi,

Thank you for your interest in our package! For the questions,

  1. Yes, we did not use the last slice because its shape is quite different from the other slices; see Fig. S2A in the Supplementary Information of the original study (Wang, Mingyue, et al. "High-resolution 3D spatiotemporal transcriptomic maps of developing Drosophila embryos and larvae." Developmental Cell 57.10 (2022): 1271-1283.).
  2. We downloaded the data from GSE190147 and tested our code again, and got the same result as with our original code. The UMAP plot you presented is quite strange, and I think there might be some numerical problem before computing neighbors. So I wonder if there are any problems after the normalization steps or PCA on your side? Do the data after these steps contain NaNs or Infs?
  3. The annotation does not matter, and it does not need to be the same as the ST data annotation. You can name the clusters like "cell type 1", ..., "cell type N".
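The generic naming in point 3 could be sketched as follows. This is an illustrative helper, not part of STitch3D; the function name and the toy cluster IDs are assumptions made for the example.

```python
def generic_celltype_names(cluster_ids, prefix="cell type"):
    """Map each distinct cluster ID to a placeholder name such as 'cell type 1'."""
    # Sort the distinct IDs (as strings) so the numbering is deterministic
    distinct = sorted(set(cluster_ids), key=str)
    mapping = {cid: f"{prefix} {i}" for i, cid in enumerate(distinct, start=1)}
    # Return one name per cell, plus the ID-to-name mapping for later reference
    return [mapping[cid] for cid in cluster_ids], mapping


# Hypothetical per-cell cluster assignments from clustering the scRNA-seq reference
clusters = ["0", "2", "1", "0", "2"]
names, mapping = generic_celltype_names(clusters)
print(mapping)
print(names)
```

Keeping the returned mapping makes it possible to relate the placeholder names back to the original cluster IDs when interpreting results.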

Best, Gefei

MingLi-929 commented 1 year ago

Hi Gefei, thank you for your rapid reply! I am very grateful for your detailed explanation. I am using your package in my research, and it is such great work that I want to recommend it to my teammates.

Following your advice, I am sorry to say that I still cannot reproduce your UMAP result when following the tutorial step by step.

So I wonder if there is any problems after normalization steps or pca on your side? Do the data after these steps contain NaNs or Infs?


After the normalization steps and PCA, there are no such abnormal values.
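A check along these lines could be sketched as below. It is a minimal example on a toy NumPy array; with scanpy the normalized matrix and the PCA embedding would typically be found in `adata.X` and `adata.obsm["X_pca"]`, which is an assumption about the setup here, and a sparse matrix would need its stored values (`.data`) checked instead.

```python
import numpy as np


def count_nonfinite(matrix):
    """Return how many entries of a dense array-like are NaN or +/-Inf."""
    values = np.asarray(matrix, dtype=float)
    return int((~np.isfinite(values)).sum())


# Toy stand-in for a normalized expression matrix or a PCA embedding
toy = np.array([[0.0, 1.5], [np.nan, np.inf]])
print(count_nonfinite(toy))
```

A result of zero after each preprocessing step means no NaN or Inf values were introduced at that step.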

Admittedly, all of your code is the standard workflow for clustering scRNA-seq data: normalization, PCA, neighbors, and UMAP. This workflow clusters all of my scRNA-seq datasets correctly except GSE190147.

I want to confirm one thing: the data from GSE190147 is the supplementary data at the bottom of the GEO page, correct?

And following the code in your tutorial step by step should give a UMAP like yours? Is there anything I need to do beyond the tutorial?

I am very sorry to bother you with such a basic question, but the UMAP result is an important way for me to validate the scRNA-seq data, since only valid scRNA-seq input can lead to a valid 3D reconstruction model.

Could you please send me a cloud link that includes all of the data used in the tutorial and all of the Python code you used to produce the UMAP? If the data and the code are the same, there must be a bug or something strange on my computer.

The last thing I want to ask for your help with:

  1. According to the tutorial, it is adata_ref_umap rather than adata_ref that gets normalized, but it is adata_ref rather than adata_ref_umap that is passed to STitch3D. Can I take it that the only thing I need to pass to STitch3D is the scRNA-seq data, even if it is not normalized?
  2. Moreover, is it necessary to cluster the scRNA-seq data that is passed to STitch3D?

Your package is so important to me that I need to pay attention to every detail. Again, sorry to bother you, and sincere thanks in advance!

gefeiwang commented 1 year ago

Hi,

Thank you for checking the data! It is strange for such a UMAP plot to occur when there are no NaNs in the data.

I want to confirm one thing: the data from GSE190147 is the supplementary data at the bottom of the GEO page, correct?

And following the code in your tutorial step by step should give a UMAP like yours? Is there anything I need to do beyond the tutorial?

I have also uploaded another Jupyter notebook in this link for checking the single-cell data, in which we downloaded the data again and saved it in the "update" folder. All the code and results are the same as in the tutorial.

However, as you mentioned,

Can I take it that the only thing I need to pass to STitch3D is the scRNA-seq data, even if it is not normalized?

Indeed, the UMAP plot in our tutorial is just for visualizing clusters and is not necessary for running STitch3D. It is also not necessary to cluster the scRNA-seq data if we already have cell-type labels. Therefore, if the UMAP plot still looks abnormal, you can run the next steps without plotting the UMAP.
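The separation described in the question, where a copy (adata_ref_umap) is normalized for visualization while adata_ref itself is left as it is, can be illustrated with a toy example. This sketch uses a plain NumPy array rather than the tutorial's AnnData objects, and the target sum of 10,000 is an arbitrary choice for illustration.

```python
import numpy as np

# Toy raw count matrix (cells x genes), standing in for the reference data
raw_counts = np.array([[1, 0, 3], [0, 2, 2]], dtype=float)

# Work on a copy so that the in-place steps below do not touch the raw counts
vis = raw_counts.copy()
vis /= vis.sum(axis=1, keepdims=True)  # scale each cell to a total of 1
vis *= 1e4                             # then to a total of 10,000
np.log1p(vis, out=vis)                 # log-transform in place

print(raw_counts)  # still the original counts
print(vis)         # normalized, log-transformed copy used only for plotting
```

Without the `.copy()`, the in-place operations would overwrite the original counts, which is why the visualization steps are kept on a separate object.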

Best, Gefei

MingLi-929 commented 1 year ago

Hi Gefei, thanks for your patient explanation. After reading it, I still could not reproduce the UMAP, so I dug into this bug. Eventually, I found that the bug is related to the versions of scanpy and umap (link).

I changed my scanpy version from 1.7.2 to 1.9.1, which solved the bug.

I cannot say for certain that this problem came from scanpy, but on my side scanpy 1.7.2 did result in the bug. So maybe this will give you a hint about the dependencies of STitch3D, which requires scanpy==1.7.2. I did not test the rest of the code with scanpy 1.9.1, which may lead to conflicts, so it is just a hint!

Finally, thank you for developing such a great package and for your patient explanations!
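For anyone comparing environments, the installed versions can be printed with the standard library. This is a small sketch; the list of packages to inspect is my own choice, not a statement of STitch3D's full dependency list.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# "umap-learn" is the distribution name of the package imported as `umap`
print(installed_versions(["scanpy", "umap-learn", "anndata"]))
```

Including this output in a bug report makes version-related differences such as the one above easier to spot.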

Regards,