drieslab / Giotto

Spatial omics analysis toolbox
https://drieslab.github.io/Giotto_website/

HMRF on big dataset #133

Closed marcovarrone closed 2 years ago

marcovarrone commented 3 years ago

I am trying to run doHMRF on the CODEX spleen dataset available in Giotto. If I run it on a subset of the dataset (taking some adjacent tiles), it works fine with acceptable runtimes (e.g. ~20 minutes for 25,500 cells).
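For concreteness, the subsetting I mean looks roughly like this (a sketch: pDataDT() and subsetGiotto() are standard Giotto functions, but the tile annotation column name here is only an assumption based on the CODEX preprocessing tutorial):

# Sketch: keep only the cells belonging to a few adjacent tiles.
# 'sample_Xtile_Ytile' is an assumed tile annotation column; adjust it
# to whatever your CODEX preprocessing actually produced.
meta = pDataDT(codex_test)
keep_ids = meta[sample_Xtile_Ytile %in% c('X04_Y08', 'X05_Y08'), cell_ID]
codex_subset = subsetGiotto(codex_test, cell_ids = keep_ids)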

However, if I increase the number of cells further (e.g. 36,000), the run stops early, after about 1.5 minutes, and no output is returned.

The memory required for the 25,500-cell run is not high, so I don't think the process is running out of memory.
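For reference, I checked this roughly from within R (base R only; note that object.size() measures the Giotto object itself, not doHMRF's peak working memory):

gc()                                            # 'max used' columns show peak memory so far
format(object.size(codex_test), units = 'GB')   # in-memory size of the Giotto object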

To reproduce the problem, it is sufficient to follow the tutorial on preprocessing the CODEX sample and then the tutorial on running doHMRF.

The parameters for doHMRF are the following:

HMRF_spatial_genes = doHMRF(gobject = codex_test,
                            expression_values = 'scaled',
                            spatial_network_name = 'Delaunay_network',
                            spatial_genes = codex_test@gene_ID,
                            k = 10,
                            betas = c(0, 0.5, 10),
                            output_folder = paste0(my_working_dir, '/Spatial_genes/SG_k10_scaled'))
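For what it's worth, on the runs that do finish I view and attach the resulting domains as in the tutorial, roughly like this (a sketch; betas = c(0, 0.5, 10) means initial beta 0, increment 0.5, 10 betas, so betas_to_view/betas_to_add must be one of 0, 0.5, ..., 4.5):

# Sketch: visualize the HMRF domains for one tested beta and
# add them to the Giotto object.
viewHMRFresults2D(gobject = codex_test,
                  HMRFoutput = HMRF_spatial_genes,
                  k = 10, betas_to_view = 2)
codex_test = addHMRF(gobject = codex_test,
                     HMRFoutput = HMRF_spatial_genes,
                     k = 10, betas_to_add = 2,
                     hmrf_name = 'HMRF')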

I am using R 3.6.0 and Giotto 1.0.4.

Thanks!

bernard2012 commented 3 years ago

Hi Marco,

CODEX is indeed a large dataset. The error is caused by a low stack limit. To increase it, in bash and before starting R:

ulimit -s          # check your current limit; it is often 8192 KB (8 MB)
ulimit -s 100000   # raise the stack limit to 100000 KB (~100 MB)
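After restarting R under the larger limit, you can verify that it took effect from inside R (Cstack_info() is base R and reports the C stack size in bytes):

Cstack_info()   # 'size' should now reflect the new ulimit (in bytes)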

Then start R, load Giotto, and run doHMRF again. If that does not solve it, or you run into other issues in the process, keep the increased stack limit in place; here are a few more tips:

Cheers,