Tang-Lab-super / PROST

PROST: A quantitative pattern recognition framework for spatial transcriptomics.
MIT License

Scalability on large datasets #4

Closed by rocketeer1998 1 week ago

rocketeer1998 commented 1 month ago

Hi @Sicrve11, thanks for your contribution to PROST! I've successfully tested PROST on my data, but I'm stuck analyzing a large dataset that contains 50000 cells and 36 genes. Everything went smoothly until PROST.spatial_autocorrelation(adata, k = 10, permutations = None), which threw the following error:

Error: MemoryError: Unable to allocate 14.9 GiB for an array with shape (50000, 50000) and data type float64

I'm working on a Windows 10 machine with 64 GB of RAM. Below is my session info. Do you have any ideas for getting the hypothesis test statistics on a large dataset? [image]
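For context on the error message, here is a rough back-of-the-envelope sketch of how much memory a dense cell-by-cell float64 matrix needs, compared with storing only k = 10 neighbour weights per cell. The helper names are made up for illustration, and the sparse figure is only a comparison point, not a claim about how PROST builds its matrix. By this arithmetic a full 50000 x 50000 float64 array would need about 18.6 GiB, somewhat more than the 14.9 GiB in the message, so the shape quoted above may have been rounded.

```python
# Back-of-the-envelope memory estimate (illustrative helpers, not part of PROST).
GIB = 2 ** 30            # bytes in one GiB
BYTES_PER_FLOAT64 = 8    # size of one float64 value


def dense_matrix_gib(n_rows, n_cols, itemsize=BYTES_PER_FLOAT64):
    """Size in GiB of a dense n_rows x n_cols array."""
    return n_rows * n_cols * itemsize / GIB


def knn_values_gib(n_cells, k, itemsize=BYTES_PER_FLOAT64):
    """Size in GiB of k neighbour weights per cell (values only, no index overhead)."""
    return n_cells * k * itemsize / GIB


dense = dense_matrix_gib(50_000, 50_000)
sparse = knn_values_gib(50_000, 10)

print(f"dense 50000 x 50000 float64: {dense:.1f} GiB")
print(f"k=10 neighbour weights only: {sparse * 1024:.1f} MiB")
```

The dense size grows with the square of the number of cells, which is why a dataset of this size can exhaust memory even on a machine with 64 GB of RAM once a few such arrays or worker processes are alive at the same time.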

Sicrve11 commented 1 month ago

Thanks for your interest in PROST! For the problem with spatial_autocorrelation, you could try saving the PI results and running the function separately in a new Python session. Also set a relatively small number of permutations for the permutation test, e.g. permutations=10, and set multiprocess=False. The usability of the model on large data still needs to be improved, and we are working on it. Refer to #issue2 for the cause of this problem. Work in progress...
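A minimal sketch of that workaround might look like the following. It assumes the PI results live on the AnnData object, so that writing it with write_h5ad and reading it back with anndata's read_h5ad preserves them. The file name and the two helper functions are hypothetical; only spatial_autocorrelation and its k, permutations and multiprocess arguments come from this thread.

```python
# Sketch of the two-session workaround; path and helper names are hypothetical.
PI_RESULT_PATH = "PI_result.h5ad"  # hypothetical output file

# Settings suggested in the reply above.
LOW_MEMORY_KWARGS = {"k": 10, "permutations": 10, "multiprocess": False}


def save_pi_results(adata, path=PI_RESULT_PATH):
    """Session 1: after computing PI, write the AnnData object to disk."""
    adata.write_h5ad(path)  # AnnData.write_h5ad from the anndata package
    return path


def run_autocorrelation(path=PI_RESULT_PATH, **overrides):
    """Session 2 (a fresh Python process): reload and run only this step."""
    # Imported lazily so the sketch can be loaded without these packages.
    import anndata as ad
    import PROST

    adata = ad.read_h5ad(path)
    kwargs = {**LOW_MEMORY_KWARGS, **overrides}
    result = PROST.spatial_autocorrelation(adata, **kwargs)
    # Assumption: the function may update adata in place or return an object.
    return adata if result is None else result
```

In use, save_pi_results(adata) would be called at the end of the first session, and run_autocorrelation() in a newly started interpreter, so that memory held by earlier steps is released before the autocorrelation step begins.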