aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Ray spill out of disk error when using run_pycistarget #78

Closed DavidvanBruggen closed 1 year ago

DavidvanBruggen commented 1 year ago

While running the run_pycistarget wrapper on a 160-topic model containing 20k cells, I get an out-of-disk error. Even with 600GB of memory, the run still needs more. Is there a way to limit the disk usage and memory usage?

At the moment I have tried to run with 8 cores on 600GB of memory and 200-300GB of tmp scratch space available.

Is this expected for the run, or can the memory load be minimized?

Thanks!

For mm10, these are the dbs I'm using:
https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/screen/mc_v10_clust/region_based/mm10_screen_v10_clust.regions_vs_motifs.rankings.feather
https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/screen/mc_v10_clust/region_based/mm10_screen_v10_clust.regions_vs_motifs.scores.feather
https://resources.aertslab.org/cistarget/motif2tf/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl

And this is the run code:

from scenicplus.wrappers.run_pycistarget import run_pycistarget
run_pycistarget(
    region_sets = region_sets,
    species = 'mus_musculus',
    save_path = os.path.join(work_dir, 'motifs'),
    ctx_db_path = rankings_db,
    dem_db_path = scores_db,
    path_to_motif_annotations = motif_annotation,
    run_without_promoters = True,
    n_cpu = 8,
    _temp_dir = os.path.join(tmp_dir, 'ray_spill'),
    annotation_version = 'v10nr_clust',
)
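
Since the failure is an out-of-disk error in the spill directory, it may help to check free space there before starting a long run. The following is a minimal sketch using only the Python standard library. The helper name `free_space_gb`, the stand-in `spill_dir` path, and the 300 GB threshold are assumptions for illustration (the threshold simply mirrors the 200-300GB scratch figure mentioned above); none of them come from scenicplus or Ray.

```python
import os
import shutil
import tempfile


def free_space_gb(path):
    """Return free space, in GB (10**9 bytes), on the filesystem holding `path`."""
    return shutil.disk_usage(path).free / 1e9


# Stand-in location for this sketch; the call above uses
# os.path.join(tmp_dir, 'ray_spill') instead.
spill_dir = os.path.join(tempfile.gettempdir(), "ray_spill")
os.makedirs(spill_dir, exist_ok=True)

# Hypothetical threshold, not a documented requirement of the tool.
required_gb = 300

available_gb = free_space_gb(spill_dir)
if available_gb < required_gb:
    print(
        f"Only {available_gb:.0f} GB free in {spill_dir}; "
        "the run may fail when spilling to disk."
    )
```

The check only reports the space free at the moment it runs; it cannot predict how much the job will actually spill.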

DavidvanBruggen commented 1 year ago

It seems that running with 1 core reduces the overhead considerably, bringing memory use down from 600GB+ to ~20GB.
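
The workaround described here amounts to changing a single argument in the original call. As a rough sketch, assuming "1 core" corresponds to `n_cpu = 1` in the call shown earlier, the keyword arguments can be copied and overridden as below. The helper `with_single_worker` is hypothetical and exists only for this illustration; the actual `run_pycistarget` call is left as a comment because it needs the databases and region sets from the original post.

```python
def with_single_worker(kwargs):
    """Return a copy of the keyword arguments with n_cpu forced to 1."""
    updated = dict(kwargs)  # copy, so the original settings stay untouched
    updated["n_cpu"] = 1
    return updated


# A subset of the arguments from the original call, as plain data.
original_kwargs = {
    "species": "mus_musculus",
    "run_without_promoters": True,
    "n_cpu": 8,
    "annotation_version": "v10nr_clust",
}

low_memory_kwargs = with_single_worker(original_kwargs)

# The remaining arguments (region_sets, save_path, ctx_db_path, dem_db_path,
# path_to_motif_annotations, _temp_dir) would be passed as in the original call:
# run_pycistarget(region_sets=region_sets, ..., **low_memory_kwargs)
```

The trade-off is runtime: a single worker processes the region sets one after another, so the run is expected to take longer than with 8 cores.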