BiomedicalMachineLearning / stLearn

A novel machine learning pipeline to analyse spatial transcriptomics data

Handling Large Datasets in stLearn and the Impact on Computational Performance #298

Open gity123987 opened 5 months ago

Dear stLearn Team,

Thank you for developing such an effective tool for spatial transcriptomics analysis. I have been using stLearn with Xenium data and have run into an issue that I hope you can help with.

I was able to run st.tl.cci.run_cci successfully on two of my four datasets. For the remaining two, which are larger, the computation did not progress beyond 0% even after 24 hours on a high-specification computer. I am considering moving the analysis to a supercomputer, although the application process might be time-consuming. As an alternative, I am considering using Seurat to divide the data, for example with BuildNicheAssay(object = xenium.obj, fov = "crop", group.by = "predicted.celltype", niches.k = 5, neighbors.k = 30), and then running stLearn separately on each subset. Would this approach compromise the accuracy of the stLearn analysis?
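
To make the splitting idea concrete, here is a minimal sketch of one way the data could be divided before a per-subset run. It is not the Seurat niche approach and it does not call stLearn: it is plain spatial tiling with an overlap margin, written in standard-library Python. The function name tile_with_margin, the tile_size and margin parameters, and the toy coordinates are all hypothetical. The assumption behind the margin is that the interaction analysis looks at neighbouring cells within some distance, so a cell close to a cut edge would otherwise lose the neighbours that fall in the adjacent piece.

```python
from collections import defaultdict


def tile_with_margin(coords, tile_size, margin):
    """Split cells into square tiles, each padded by `margin` on every side.

    coords: list of (x, y) cell positions (hypothetical input format).
    Returns {(tx, ty): {"core": [...], "padded": [...]}} with cell indices.
    "core" cells belong to exactly one tile; "padded" also includes cells
    from adjacent tiles that lie within `margin` of the tile border.
    """
    if not 0 <= margin < tile_size:
        raise ValueError("margin must satisfy 0 <= margin < tile_size")

    tiles = defaultdict(lambda: {"core": [], "padded": []})
    for i, (x, y) in enumerate(coords):
        tx, ty = int(x // tile_size), int(y // tile_size)
        tiles[(tx, ty)]["core"].append(i)
        # Because margin < tile_size, only the 3x3 block of tiles around
        # the cell's own tile can contain it in their padded bounds.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nx, ny = tx + dx, ty + dy
                x0 = nx * tile_size - margin
                x1 = (nx + 1) * tile_size + margin
                y0 = ny * tile_size - margin
                y1 = (ny + 1) * tile_size + margin
                if x0 <= x < x1 and y0 <= y < y1:
                    tiles[(nx, ny)]["padded"].append(i)

    # Drop tiles that only received padding cells and have no core cells.
    return {key: val for key, val in tiles.items() if val["core"]}


# Toy example: four cells, 100-unit tiles, 10-unit overlap margin.
example_coords = [(10, 10), (95, 50), (105, 50), (250, 250)]
example_tiles = tile_with_margin(example_coords, tile_size=100, margin=10)
for key in sorted(example_tiles):
    print(key, example_tiles[key])
```

With this layout, each subset would be analysed on its "padded" cells but only the results for its "core" cells would be kept, so that no cell is counted twice. Whether per-subset results are comparable to a whole-dataset run is exactly the open question above; the sketch only illustrates the bookkeeping.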