aertslab / pycisTopic

pycisTopic is a Python module to simultaneously identify cell states and cis-regulatory topics from single cell epigenomics data.

Memory issue even after bigmem #142

Open TingTingShao opened 6 months ago

TingTingShao commented 6 months ago

Hi,

I am running the following step:

rule imputed_acc:
    input:
        obj_pkl=os.path.join(config["work_dir"], "030results_cistopic/cistopic_obj.pkl"),
        flag=os.path.join(config['tmp_dir'], 'flags/topic_annot.done'),
    output:
        imputed_acc_obj=os.path.join(config["work_dir"], '030results_cistopic/040dars/imputed_acc_obj.pkl'),

    run:
        import pickle
        from pycisTopic.diff_features import impute_accessibility

        # Load the cisTopic object; the context manager closes the file handle
        with open(input.obj_pkl, 'rb') as fh:
            cistopic_obj = pickle.load(fh)

        imputed_acc_obj = impute_accessibility(
            cistopic_obj,
            selected_cells=None,  # A list of selected cells to impute accessibility for
            selected_regions=None,  # A list of selected regions to impute accessibility for
            scale_factor=10**6,  # A number to multiply the imputed values by. This is useful to convert low probabilities to 0, making the matrix sparser.
        )
        print(type(imputed_acc_obj))

        # Write the cisTopic object back to its original path (the same file as input.obj_pkl)
        with open(input.obj_pkl, 'wb') as fh:
            pickle.dump(cistopic_obj, fh)
        # Write the imputed accessibility object
        with open(output.imputed_acc_obj, 'wb') as fh:
            pickle.dump(imputed_acc_obj, fh)  # line 329 in Snakefile03, where the MemoryError is raised
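The traceback points at the final `pickle.dump` call, after imputation had already finished (the `print(type(...))` output appears in the log). A minimal sketch of a save helper that passes `protocol=pickle.HIGHEST_PROTOCOL` (protocol 5, PEP 574, on Python 3.8 and later), which may be higher than the default protocol. With protocol 5, NumPy can hand a contiguous array's buffer to the pickler instead of first copying the data into a `bytes` object, so if the imputed matrix is held as a NumPy array this may lower the peak during the dump; whether it is enough for this object is untested. `save_pickle` and `load_pickle` are hypothetical helpers, not part of pycisTopic.

```python
import pickle
from pathlib import Path
from typing import Any, Union

PathLike = Union[str, Path]


def save_pickle(obj: Any, path: PathLike) -> None:
    """Pickle obj to path using the highest protocol this Python supports."""
    # pickle.HIGHEST_PROTOCOL is 5 on Python 3.8 and later
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path: PathLike) -> Any:
    """Load a pickled object; the file is closed once loading finishes."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical use in the rule above:
#   save_pickle(imputed_acc_obj, output.imputed_acc_obj)
```

Files written with protocol 5 need Python 3.8 or later to be read back.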

Error

RuleException:
MemoryError in file /lustre1/project/stg_00079/students/tingting/data/sun/snap2_PFC_synapse/Snakefile03, line 329.
  File "/lustre1/project/stg_00079/students/tingting/data/sun/snap2_PFC_synapse/Snakefile03", line 329, in __rule_imputed_acc
<class 'pycisTopic.diff_features.CistopicImputedFeatures'>

The cistopic_obj.pkl is 26G, and I already requested the bigmem partition. Why does it still run into a memory issue?
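To see which stage of the rule drives the memory peak (loading, imputation, or pickling), one option is to log the process's peak resident set size after each step. A minimal sketch, assuming a Linux compute node, where `resource.getrusage(...).ru_maxrss` is reported in kilobytes (macOS reports bytes); `peak_rss_gib` and `log_stage` are hypothetical helpers.

```python
import resource
import sys


def peak_rss_gib() -> float:
    """Return the peak resident set size of this process so far, in GiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        # macOS reports ru_maxrss in bytes
        return peak / 1024**3
    # Linux reports ru_maxrss in kilobytes
    return peak / 1024**2


def log_stage(label: str) -> float:
    """Print and return the peak RSS reached so far, tagged with a stage label."""
    peak = peak_rss_gib()
    print(f"[{label}] peak RSS so far: {peak:.2f} GiB", flush=True)
    return peak


# Hypothetical placement inside the rule's run block:
#   log_stage("after pickle.load")
#   log_stage("after impute_accessibility")
#   log_stage("after pickle.dump of cistopic_obj")
```

Because `ru_maxrss` is a high-water mark, each logged value is the maximum reached up to that point, not the current usage.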

Any suggestions?

Thanks, tingting

SeppeDeWinter commented 6 months ago

Hi @TingTingShao

How much memory did you request?

Best,

S

TingTingShao commented 6 months ago

Hi,

I think that if I choose the bigmem partition, I get 2 TiB (https://docs.vscentrum.be/leuven/wice_quick_start.html).

The setting I have is:

#!/bin/bash
#SBATCH --account="xx"
#SBATCH --job-name="r03"
#SBATCH --cluster="wice"
#SBATCH --partition="bigmem"
#SBATCH -N 1
#SBATCH -n 72
#SBATCH -t 72:00:00
#SBATCH -o out/aftmodel.out

snakemake -s Snakefile03 -c 20 --rerun-incomplete

Best, tingting

SeppeDeWinter commented 6 months ago

Hi TingTing

You should also request a specific amount of memory, using either --mem-per-cpu or --mem.

Best,

Seppe

TingTingShao commented 6 months ago

Sorry, but I checked the job history again:

(scenic2) bash-4.4$ seff 61416509
Job ID: 61416509
Cluster: wice
User/Group: vsc35107/vsc35107
State: FAILED (exit code 1)
Nodes: 1
Cores per node: 72
CPU Utilized: 05:47:56
CPU Efficiency: 1.87% of 12-22:19:12 core-walltime
Job Wall-clock time: 04:18:36
Memory Utilized: 1.27 TB
Memory Efficiency: 66.07% of 1.92 TB

The job already had 1.92 TB available; I don't think it is normal to consume this much memory.

Thanks, tingting


TingTingShao commented 6 months ago

I also checked files generated earlier: a 2.9G cistopic.pkl corresponded to a 59G imputed_acc_obj.pkl. Now I have ~270,000 nuclei, resulting in a 26G cistopic object. In this case, should I expect an imputed_acc_obj.pkl of nearly 60G?
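Taking the earlier run as a reference, the expected size can be checked with simple arithmetic. This minimal sketch assumes, as the question does, that the imputed file grows in proportion to the cisTopic pickle; that is only an assumption, since a pickle's size depends on everything stored in the object. `extrapolate_size` is a hypothetical helper.

```python
def extrapolate_size(ref_input: float, ref_output: float, new_input: float) -> float:
    """Scale ref_output by new_input / ref_input (assumes a fixed output/input ratio)."""
    return ref_output * new_input / ref_input


# Earlier run: 2.9G cistopic.pkl -> 59G imputed_acc_obj.pkl; current object: 26G
estimate = extrapolate_size(2.9, 59.0, 26.0)
print(f"~{estimate:.0f}G")  # prints "~529G"
```

Under that proportionality assumption, the same ratio points to roughly 529G rather than about 60G.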

Is there any way to reduce the memory consumption, maybe by reducing the number of cores?
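One way to cap the peak memory, sketched here under assumptions: `impute_accessibility` accepts `selected_cells` (as in the rule above), so cells could be imputed in batches, with each batch written to disk before the next one is computed. `iter_chunks` and `impute_in_chunks` are hypothetical helpers; the `cistopic_obj.cell_names` attribute used in the usage comment, and whether downstream steps can consume per-batch results, are assumptions to check against the pycisTopic documentation.

```python
import pickle
from pathlib import Path
from typing import Any, Callable, Iterator, List, Sequence, Union


def iter_chunks(items: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of items with at most chunk_size elements each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])


def impute_in_chunks(
    impute_fn: Callable[..., Any],
    obj: Any,
    cell_names: Sequence[str],
    chunk_size: int,
    out_dir: Union[str, Path],
) -> List[Path]:
    """Run impute_fn on batches of cells and pickle each batch to its own file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for i, chunk in enumerate(iter_chunks(cell_names, chunk_size)):
        result = impute_fn(obj, selected_cells=chunk)
        path = out_dir / f"imputed_chunk_{i:03d}.pkl"
        with open(path, "wb") as fh:
            pickle.dump(result, fh, protocol=pickle.HIGHEST_PROTOCOL)
        # Drop the reference so this batch can be freed before the next one is computed
        del result
        paths.append(path)
    return paths


# Hypothetical usage inside the rule's run block (cell_names attribute assumed):
#   impute_in_chunks(impute_accessibility, cistopic_obj, cistopic_obj.cell_names,
#                    chunk_size=20_000, out_dir="imputed_chunks")
```

The peak then scales with the batch size instead of the full number of nuclei, at the cost of many smaller output files.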

Thanks, tingting

SeppeDeWinter commented 6 months ago

Hi Tingting

No, it's not normal that it consumes so much memory. How many regions are you using?

Best,

Seppe

TingTingShao commented 6 months ago

Hi,

I have 270,072 nuclei and about 400,000 regions/bins.
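These numbers allow a rough lower bound on the size of the imputed matrix. Imputation yields one value per region per cell, so this minimal sketch assumes the result is held as a dense regions × cells matrix; the dtype pycisTopic actually uses is not stated in this thread, so 8- and 4-byte values are shown. Temporary copies made during imputation, scaling, or pickling would come on top of this. `dense_matrix_gib` is a hypothetical helper.

```python
N_REGIONS = 400_000  # "about 400,000 regions/bins"
N_CELLS = 270_072

# Bytes per stored value for some common NumPy dtypes
BYTES_PER_VALUE = {"float64": 8, "float32": 4, "int32": 4}


def dense_matrix_gib(n_rows: int, n_cols: int, itemsize: int) -> float:
    """Size in GiB of a dense n_rows x n_cols matrix with itemsize bytes per value."""
    return n_rows * n_cols * itemsize / 1024**3


for dtype, itemsize in BYTES_PER_VALUE.items():
    print(f"{dtype}: {dense_matrix_gib(N_REGIONS, N_CELLS, itemsize):.1f} GiB")
# prints 804.9 GiB for float64 and 402.4 GiB for float32 and int32
```

That is roughly 108 billion values. Under the dense assumption, a single float64 copy is about 805 GiB, so one extra temporary copy would be consistent with the terabyte-range usage reported by seff.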

Thanks, tingting