drieslab / Giotto

Spatial omics analysis toolbox
https://drieslab.github.io/Giotto_website/

seemingly random clustering results #278

Open adescoeudres opened 2 years ago

adescoeudres commented 2 years ago

Hi! I have been running Giotto on some Visium data, using a script very similar to your mouse brain vignette. However, on each rerun of the method, the resulting spatial patterns from the HMRF (step 10) look very different, i.e. for many of the cells, the cluster assignment changes with each run. Is this expected behaviour? I think I have traced the origin of this randomness back to silhouetteRankTest(). Does this seem plausible to you? Thanks in advance for your response!

gcyuan commented 2 years ago

Hi, thank you for using Giotto. As with all clustering methods, the results depend on the selected gene features as well as the clustering resolution.

To select genes, you would need to run the following steps first:

  1. identify spatially variable genes,
  2. identify the spatial coexpression modules,
  3. select the top genes from each module.

Once these genes are selected, you then run the HMRF on the expression pattern of the selected genes. The clustering resolution is adjusted by two parameters: k (the number of clusters) and beta (the smoothness of the spatial pattern). Unfortunately, we do not have a reliable algorithm to guide the choice of these parameters. So our recommendation is to try different parameters and then pick the solution that is most biologically meaningful.

Hope this answers your question.

Best, GC Yuan

adescoeudres commented 2 years ago

Hi, thanks for the quick reply. I am performing all the steps you mention, but maybe I did not make my issue clear enough:

Keeping k and beta exactly the same, and running the same script on the same data, I get different clustering results when I repeat the analysis. I am evaluating this by ARI in addition to visually, and the ARI variation is not negligible (up to 0.15).
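For reference, the run-to-run comparison described above can be sketched in base R. This is only an illustration of the adjusted Rand index (ARI) on toy label vectors; the function name `adjusted_rand_index` and the vectors `run1`, `run2`, `run3` are hypothetical and not part of Giotto.

```r
# Adjusted Rand index between two label vectors, base R only.
# Illustrative helper; not a Giotto function.
adjusted_rand_index <- function(a, b) {
  stopifnot(length(a) == length(b))
  tab <- table(a, b)                                   # contingency table of the two labelings
  n <- sum(tab)
  sum_cells <- sum(choose(as.vector(tab), 2))          # pairs grouped together in both runs
  sum_rows  <- sum(choose(rowSums(tab), 2))            # pairs grouped together in run a
  sum_cols  <- sum(choose(colSums(tab), 2))            # pairs grouped together in run b
  expected  <- sum_rows * sum_cols / choose(n, 2)      # agreement expected by chance
  max_index <- (sum_rows + sum_cols) / 2
  if (max_index == expected) return(1)                 # degenerate case: trivial partitions
  (sum_cells - expected) / (max_index - expected)
}

# Toy label vectors standing in for cluster assignments from repeated runs
run1 <- c(1, 1, 2, 2, 3, 3)
run2 <- c(2, 2, 3, 3, 1, 1)   # same partition as run1, cluster IDs permuted
run3 <- c(1, 2, 1, 2, 1, 2)   # a different partition

ari_same <- adjusted_rand_index(run1, run2)
ari_diff <- adjusted_rand_index(run1, run3)
```

Because ARI ignores how clusters are numbered, a relabeling alone (as in `run2`) does not lower the score; only genuine changes in which cells are grouped together do.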

Do you have an intuition about what might be the issue, or whether this kind of change is expected?

I hope my question is clearer now, and thank you again for your time.

andrewrech commented 2 years ago

Hi there,

I use Giotto and other clustering methods in R and have experience with this issue. My two cents are that you might need to call set.seed(1234) before the stochastic steps. If you are running with a parallel backend, then depending on which backend, you might need to set the seed in every R process.

If you have already done this, I’ve found that some clustering methods do not respect this value, but I am not sure if Giotto is using such a method.
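A minimal sketch of the seeding advice, using only base R and the bundled `parallel` package. `kmeans` is a stand-in for any stochastic clustering step, not what Giotto's HMRF does internally, and the helper `draw_on_workers` is hypothetical. Whether a given Giotto function respects the R seed is not established in this thread.

```r
library(parallel)

# Toy data; kmeans stands in for a stochastic clustering step
set.seed(1234)
x <- matrix(rnorm(200), ncol = 2)

# Serial case: reset the seed immediately before each stochastic call
set.seed(1234); fit1 <- kmeans(x, centers = 3)
set.seed(1234); fit2 <- kmeans(x, centers = 3)
serial_identical <- identical(fit1$cluster, fit2$cluster)

# Parallel case: worker processes have their own RNG state, so a
# set.seed() call in the main session does not reach them.
# clusterSetRNGStream() gives each worker a reproducible L'Ecuyer stream.
draw_on_workers <- function(seed) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = seed)
  parSapply(cl, 1:4, function(i) runif(1))
}
parallel_identical <- identical(draw_on_workers(1234), draw_on_workers(1234))
```

Other backends (for example forked workers or future-based plans) have their own seeding mechanisms, so the `clusterSetRNGStream()` call applies only to clusters created with the `parallel` package.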

Another potential gotcha if the above doesn't uncover the answer is that data.table internally updates objects by reference, so you could be losing state at some point without realizing it.
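The by-reference behaviour can be illustrated with a small sketch, assuming the `data.table` package is installed. The table and column names are made up for the example and do not correspond to any Giotto object.

```r
library(data.table)

dt <- data.table(cell = c("a", "b", "c"), cluster = c(1L, 2L, 3L))

# Plain assignment does NOT copy a data.table: both names refer to the same object
alias <- dt
alias[, cluster := 0L]            # := modifies in place, so dt changes as well
dt_changed <- all(dt$cluster == 0L)

# copy() makes an independent table, so the original is left untouched
dt <- data.table(cell = c("a", "b", "c"), cluster = c(1L, 2L, 3L))
safe <- copy(dt)
safe[, cluster := 0L]
dt_preserved <- identical(dt$cluster, c(1L, 2L, 3L))
```

If an intermediate table is reused across runs in the same session, an in-place update like this could silently change the input to a later step, which is one way results might differ between reruns.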

Andrew

gcyuan commented 2 years ago

Hi, thanks for the clarification. I understand the question now. As Andrew suggested (thanks for your input), the results vary because of random initialization. Fixing the random seed is one way to address this; however, if the results depend strongly on the initial seed, this is an indication that the spatial structures are not reliable, perhaps because the parameter settings are not appropriate. Hope this helps. Best, GC
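One way to check how strongly a result depends on the seed is to repeat the clustering over several seeds and compare every pair of runs. The sketch below uses base R only; `kmeans` on toy data is a stand-in for the HMRF step, and the `ari` helper and all variable names are hypothetical.

```r
# Compact adjusted Rand index helper (illustrative, base R only)
ari <- function(a, b) {
  tab <- table(a, b)
  cells <- sum(choose(as.vector(tab), 2))
  rows  <- sum(choose(rowSums(tab), 2))
  cols  <- sum(choose(colSums(tab), 2))
  expected <- rows * cols / choose(sum(tab), 2)
  max_index <- (rows + cols) / 2
  if (max_index == expected) return(1)
  (cells - expected) / (max_index - expected)
}

# Toy data with three groups; a real analysis would use the selected genes
set.seed(1)
x <- rbind(matrix(rnorm(60, mean = 0),  ncol = 2),
           matrix(rnorm(60, mean = 5),  ncol = 2),
           matrix(rnorm(60, mean = 10), ncol = 2))

# Same data and same parameters; only the seed changes between runs
seeds <- 1:5
labels <- lapply(seeds, function(s) {
  set.seed(s)
  kmeans(x, centers = 3)$cluster
})

# ARI for every pair of runs
run_pairs <- combn(length(seeds), 2)
pairwise_ari <- apply(run_pairs, 2, function(p) ari(labels[[p[1]]], labels[[p[2]]]))
summary(pairwise_ari)
```

Pairwise values close to 1 suggest the partition is stable across initializations, while a wide spread suggests the structure is sensitive to the seed at the chosen settings.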

adescoeudres commented 2 years ago

Hi guys, thanks, both of you, for your replies. The parameter settings should not be the problem, as I am trying to reproduce some results with known parameter settings. However, I will try fixing the random seed and see whether this works. @andrewrech, the comment about data.table potentially losing state is interesting, thank you! Best