Regarding the questions about normalization and integration of gene expression matrices

feiyoung / PRECAST

an efficient data integration method for multiple spatial transcriptomics data with non-cluster-relevant effects such as the complex batch effects.

GNU General Public License v3.0


Regarding the questions about normalization and integration of gene expression matrices #14

Open liuxq000 opened 1 year ago

liuxq000 commented 1 year ago

Hello! I read your article and think you did a great job. I have two questions here. The first question is about the tutorial at https://feiyoung.github.io/PRECAST/articles/PRECAST.BreastCancer.html. Do I need to normalize the input gene expression matrix when I perform the analysis, or does your model already include the normalization step? The second question is regarding the step of integrating multiple sample expression matrices in IntegrateSpaData. Can it only integrate the top n highly variable genes that are identifiable, or can it integrate all genes from all samples?

feiyoung commented 1 year ago

Thank you for your interest in our work. Normalization is performed when you create the PRECASTObject with the function CreatePRECASTObject(). Note also that the current version of PRECAST only supports integrating the top n highly variable genes. To integrate all genes, you can use the function IntegrateSRTData() in the ProFAST package; see https://feiyoung.github.io/ProFAST/reference/IntegrateSRTData.html for more details.
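For readers unfamiliar with the two preprocessing ideas mentioned above (normalization and restricting the analysis to the top n highly variable genes), here is a minimal, generic sketch. It is written in Python purely for illustration, whereas PRECAST itself is an R package. It is not the procedure that CreatePRECASTObject() runs internally: the function names, the scale factor of 10,000, the use of plain variance as the ranking criterion, and the toy data are all assumptions made for this example.

```python
import math


def log_normalize(counts, scale=10_000):
    """Library-size normalize each spot, then apply log(1 + x).

    counts: list of spots, each a list of raw counts (one per gene).
    The scale factor is an illustrative assumption, not a PRECAST setting.
    """
    normalized = []
    for spot in counts:
        total = sum(spot)
        if total == 0:
            # An empty spot stays all-zero instead of dividing by zero.
            normalized.append([0.0] * len(spot))
            continue
        normalized.append([math.log1p(c / total * scale) for c in spot])
    return normalized


def top_variable_genes(matrix, gene_names, n):
    """Return the n gene names with the largest variance across spots.

    Plain variance is a stand-in here; real pipelines use more refined
    gene-selection criteria.
    """
    n_spots = len(matrix)
    scored = []
    for j, name in enumerate(gene_names):
        column = [row[j] for row in matrix]
        mean = sum(column) / n_spots
        variance = sum((x - mean) ** 2 for x in column) / n_spots
        scored.append((variance, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:n]]


# Hypothetical toy data: 4 spots x 3 genes of raw counts.
genes = ["geneA", "geneB", "geneC"]
raw_counts = [
    [10, 0, 5],
    [0, 10, 5],
    [20, 0, 10],
    [0, 20, 10],
]

norm = log_normalize(raw_counts)
selected = top_variable_genes(norm, genes, n=2)
print(selected)
```

In this toy matrix geneC always makes up the same fraction of each spot's total, so its normalized value is identical across spots and it is the gene that gets dropped. Any downstream integration restricted to `selected` would therefore see only geneA and geneB, which is the kind of limitation the reply describes for the top n genes.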

liuxq000 commented 1 year ago

Thank you for your response! After reviewing the ProFAST package, I found that it has some similarities with the PRECAST package, except for the different methods used for dimensionality reduction and clustering. Which package performs better in terms of effectiveness? Do you have any suggestions?

feiyoung commented 1 year ago

ProFAST runs faster than PRECAST, mainly because it focuses exclusively on dimension reduction. PRECAST, by contrast, performs three tasks at once: dimension reduction, clustering, and embedding alignment. For large datasets, particularly those with more than 500,000 spots, I recommend ProFAST. For smaller datasets, PRECAST is the better choice.

liuxq000 commented 1 year ago

Thank you for your response!
