Tang-Lab-super / PROST

PROST: A quantitative pattern recognition framework for spatial transcriptomics.
MIT License

choose the right min_distance #9

Open rocketeer1998 opened 1 week ago

rocketeer1998 commented 1 week ago

Hi @Sicrve11 @YchLiang, thanks again for your awesome tool! I have two questions to discuss in detail:

1. I wonder how to fine-tune the `min_distance` parameter in the call below when analyzing datasets generated from different platforms:

   ```python
   import PROST

   # adata, n_clusters and SEED are assumed to be defined earlier in the workflow
   PROST.run_PNN(adata,
                 platform="osmFISH",
                 min_distance=800,
                 init="mclust",
                 n_clusters=n_clusters,  # same as annotation
                 tol=5e-3,
                 SEED=SEED,
                 lr=0.1,
                 max_epochs=100)
   ```

2. I noticed that you used `PROST.feature_selection` for NGS-based platforms (Visium) and skipped this step for imaging-based methods (osmFISH). Will it affect downstream clustering if I also use `PROST.feature_selection` to keep only a certain number of genes for imaging-based methods?

Sicrve11 commented 5 days ago

Thank you for your continued interest in our approach and for providing lots of useful feedback.

Here are my thoughts on the two issues you mentioned:

<1> How to set the number of neighbors or the distance in the PNN model is an issue worth discussing. For example, the distance is set to 800 for the osmFISH data, a value based on the average distance from each point to its 9 nearest neighbors in that dataset. So if you are unsure which distance to use, you can determine it by calculating the average distance for the specified number of neighbors. For the number of neighbors, the default is the number of first-order neighbors including the self-loop: it is set to 7 for Visium, which has a six-neighbor (hexagonal) structure, and to 9 for other matrix-type or image-type data. If you want to take more neighborhood information into account, you can increase this value appropriately; our earlier experiments showed that the PNN results are robust across different numbers of neighbors and distances.

![parameter_test1](https://github.com/Tang-Lab-super/PROST/assets/120349754/66607055-e5fe-4512-9358-b70601fc8a2c)

<2> Feature selection strategy on different data platforms. When the information is sufficiently rich, as in NGS-based data (Visium), we want to input only the relatively informative genes that directly reflect the tissue structure, so we used the spatially specific genes identified by the PROST Index (PI) as input. When the information is not rich enough, as in osmFISH, we want to use as much of the available information as possible for clustering, because filtering genes may have a relatively large impact on the clustering results. If you want to test how much certain genes affect clustering, or have other needs, you can reduce the number of input genes or pass a list of genes (`selected_gene_name` in `PROST.feature_selection`).

Your feedback also made me realize some limitations in the code: I should have provided both neighbor counts and distances for all platforms to build the neighborhood relationships, rather than differentiating by platform. I have therefore updated the `run_PNN` function and replaced the `platform` parameter with `adj_mode`. Thanks again for your feedback, and further discussion is welcome!
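To make the distance heuristic in <1> concrete, here is a minimal NumPy sketch under stated assumptions. It reads "the average distance for the specified number of neighbors" as the mean, over all spots, of the distance to the k-th nearest spot, counting the spot itself as the first neighbor (the self-loop); averaging all k distances per spot is another plausible reading. It also builds the two kinds of neighborhood graph discussed above, one from a distance threshold and one from a fixed neighbor count. All function names are hypothetical, not part of PROST's API, and this is not necessarily how `run_PNN` builds its adjacency internally.

```python
import numpy as np


def pairwise_distances(coords: np.ndarray) -> np.ndarray:
    """Dense Euclidean distance matrix; O(n^2) memory, so only for moderate n."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def estimate_min_distance(coords: np.ndarray, k: int = 9) -> float:
    """Mean distance from each spot to its k-th nearest spot (hypothetical helper).

    The spot itself (distance 0) counts as neighbour #1, i.e. the self-loop,
    so k=9 covers 8 surrounding spots on a square grid and k=7 covers the
    6 surrounding spots on a hexagonal (Visium-like) grid.
    """
    dist = np.sort(pairwise_distances(coords), axis=1)  # row i sorted ascending
    if dist.shape[1] < k:
        raise ValueError("k exceeds the number of spots")
    return float(dist[:, k - 1].mean())  # column k-1 = k-th nearest, self included


def adjacency_by_distance(coords: np.ndarray, radius: float) -> np.ndarray:
    """Symmetric 0/1 adjacency: spots within `radius` of each other, self-loops included."""
    return (pairwise_distances(coords) <= radius).astype(np.int8)


def adjacency_by_neighbours(coords: np.ndarray, k: int = 9) -> np.ndarray:
    """0/1 adjacency linking each spot to its k nearest spots, itself included.

    Unlike the distance version, this graph is not symmetric in general.
    """
    dist = pairwise_distances(coords)
    idx = np.argsort(dist, axis=1, kind="stable")[:, :k]  # k closest per row
    adj = np.zeros(dist.shape, dtype=np.int8)
    np.put_along_axis(adj, idx, 1, axis=1)
    return adj


if __name__ == "__main__":
    # Toy 5 x 5 square grid with spacing 100 (arbitrary units).
    xs, ys = np.meshgrid(np.arange(5) * 100.0, np.arange(5) * 100.0)
    coords = np.column_stack([xs.ravel(), ys.ravel()])
    r = estimate_min_distance(coords, k=9)
    print("suggested radius:", r)
    print("neighbours per spot (radius mode):", adjacency_by_distance(coords, r).sum(axis=1))
```

With real data you would pass the spot coordinates (for example `adata.obsm["spatial"]`, assuming they are stored there) and try the resulting value as `min_distance`. Note that on a finite grid the border spots have farther k-th neighbors, which pulls the average up. The dense distance matrix needs O(n²) memory, so for large imaging datasets a KD-tree such as `scipy.spatial.cKDTree` would be the more practical choice.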