drieslab / Giotto

Spatial omics analysis toolbox
https://drieslab.github.io/Giotto_website/

Criteria to select the best PC dimensions to run UMAP #537

Open krigia opened 1 year ago

krigia commented 1 year ago

@RubD @mattobny @josschavezf @gcyuan Hello Giotto team,

In the Giotto Suite 10X Visium analysis of mouse brain, you use "dimensions_to_use = 1:10" when running the UMAP function.

We are currently running 10X Visium on FFPE tumor samples. Because the number of spots and features captured varies across tumor samples, we run UMAP with several "dimensions_to_use" settings (1:10, 1:12, 1:15, 1:17, 1:20, 1:25 and 1:30) to select the best one. Sometimes the clusters are distinct and tight, but in other cases they overlap, and choosing the number of PCs is not straightforward.

What are the criteria you apply to make the PC selection?

Thank you

RubD commented 1 year ago

Hi @krigia

There isn't really a fixed rule for deciding on the number of PCs to use, although there are a few ways to assess how informative each PC is. You can plot the % of variation explained per PC (see screePlot, also known as an elbow plot) or assess how robust a PC is based on a gene permutation strategy (see jackstrawPlot, which could take a long time to run).
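As a rough illustration of the "% of variation explained per PC" idea, here is a minimal base-R sketch on simulated data. It does not use Giotto's screePlot; the matrix `expr`, the three planted signals, and the "largest drop" elbow heuristic are all assumptions made for the example.

```r
# Minimal sketch with base R only (not Giotto's screePlot).
# All names and the simulated data below are hypothetical.
set.seed(1)
n_spots  <- 200   # rows: spots
n_genes  <- 100   # columns: genes/features
n_signal <- 3     # number of planted signal components

# Low-rank signal plus Gaussian noise
scores   <- matrix(rnorm(n_spots * n_signal), n_spots, n_signal)
loadings <- matrix(rnorm(n_signal * n_genes), n_signal, n_genes)
expr     <- 2 * scores %*% loadings +
  matrix(rnorm(n_spots * n_genes), n_spots, n_genes)

# PCA and percentage of variance explained per PC
pca           <- prcomp(expr, center = TRUE, scale. = FALSE)
var_explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)
cum_var       <- cumsum(var_explained)

# One simple heuristic (an assumption, not a rule):
# keep PCs up to the largest drop between consecutive PCs
elbow <- which.max(-diff(var_explained))

# Draw the scree (elbow) plot only in an interactive session
if (interactive()) {
  plot(var_explained[1:30], type = "b",
       xlab = "PC", ylab = "% variance explained")
  abline(v = elbow, lty = 2)
}
```

On real data the elbow is often less sharp than in this simulation, so the heuristic is better treated as a starting point for comparing a few candidate values of dimensions_to_use than as a final answer.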

A lot of people also use some external information or expected results to drive the choice of the total number of PCs to use.

krigia commented 1 year ago

@RubD Thank you. I ran the "jackstrawPlot" function with different threshold cutoffs, and in most cases I get significant results for all PCs: jackstrawPlot(F1_normalized_standard_6000, ncp = 30, threshold = 0.01, verbose = TRUE, show_plot = T, save_plot = T, save_param = list(save_folder = "JackStraw", save_name = "F1_normalized_standard_6000_screePC-30"))

Any further thoughts or input are highly appreciated.

Thank you

krigia commented 1 year ago

Hello Giotto team,

I am following up on my posted question: https://github.com/drieslab/Giotto/issues/537

I cannot upload the plots through GitHub, so I am sending you the jackstraw plots (attached) for two different thresholds (0.01 & 9.95e-4). I ran the following call: jackstrawPlot(F1_normalized_standard_6000, ncp = 30, threshold = 0.01, verbose = TRUE, show_plot = T, save_plot = T, save_param = list(save_folder = "JackStraw", save_name = "F1_normalized_standard_6000_scree_PC-30"))

In most cases the plots show exactly the same output, with the same p-value for all PCs (1:30), which does not make sense to me. I would greatly appreciate your input.

Thank you.
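One general property of permutation tests may be relevant to identical results at two thresholds: when an empirical p-value is computed as the fraction of B permuted statistics that reach the observed one, it can only take the values 0, 1/B, 2/B, and so on. Two thresholds that are both below 1/B therefore select exactly the same PCs. The base-R sketch below illustrates this with a simplified gene-permutation scheme on simulated data. It is not Giotto's jackstrawPlot or the jackstraw package's implementation, and `expr`, `pc_var_fraction`, and `B` are hypothetical names chosen for the example.

```r
# Simplified gene-permutation sketch in base R (an illustration only,
# not the method used by Giotto or the jackstraw package).
set.seed(42)
n_spots  <- 100
n_genes  <- 60
n_signal <- 3
n_pcs    <- 10
B        <- 20    # number of permutations (hypothetical choice)

# Simulated data: low-rank signal plus noise
scores   <- matrix(rnorm(n_spots * n_signal), n_spots, n_signal)
loadings <- matrix(rnorm(n_signal * n_genes), n_signal, n_genes)
expr     <- 2 * scores %*% loadings +
  matrix(rnorm(n_spots * n_genes), n_spots, n_genes)

# Fraction of total variance carried by each of the first k PCs
pc_var_fraction <- function(m, k) {
  d <- svd(scale(m, center = TRUE, scale = FALSE), nu = 0, nv = 0)$d
  (d^2 / sum(d^2))[seq_len(k)]
}

observed <- pc_var_fraction(expr, n_pcs)

# Null distribution: shuffle each gene (column) independently across spots
null_stats <- t(replicate(B, pc_var_fraction(apply(expr, 2, sample), n_pcs)))

# Empirical p-value per PC: share of null statistics >= observed statistic
p_vals <- colMeans(sweep(null_stats, 2, observed, FUN = ">="))

# Both thresholds are below 1/B = 0.05, so they count the same PCs
n_sig_loose  <- sum(p_vals <= 0.01)
n_sig_strict <- sum(p_vals <= 9.95e-4)
```

If the same mechanism applies to a given run, increasing the number of permutations is what makes smaller thresholds distinguishable. Whether and how the number of permutations can be set in jackstrawPlot should be checked in its documentation.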


In reply to the message from Ruben Dries sent Thursday, February 2, 2023, 1:05 PM (Subject: Re: [drieslab/Giotto] Criteria to select the best PC dimensions to run UMAP (Issue #537)).
