aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

numpy.core._exceptions.MemoryError #241

Closed YayanFeng55 closed 8 months ago

YayanFeng55 commented 8 months ago

Dear all,

I am working with a 508412 x 103279 matrix in pycisTopic and ran into the error below.

Describe the bug: I am calculating DARs per cell type using the code below:

imputed_acc_obj = impute_accessibility(cistopic_obj, selected_cells=None, selected_regions=None, scale_factor=10**6)

It fails with the following error:


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/diff_features.py", line 478, in impute_accessibility
    imputed_acc, region_names_to_keep = calculate_imputed_accessibility(
  File "/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/diff_features.py", line 417, in calculate_imputed_accessibility
    imputed_acc = np.empty(
numpy.core._exceptions.MemoryError: Unable to allocate 196. GiB for an array with shape (508412, 103279) and data type int32

I think this bug is related to NumPy memory allocation.

Do you have any suggestions to fix this issue?

Thanks, Yayan

nikithkurella commented 8 months ago

I am running into this issue as well when trying to create the cisTopic object. I am using code copied from the 10x multiome PBMC tutorial:

from pycisTopic.cistopic_class import *
key = 'yAL'
cistopic_obj = create_cistopic_object_from_fragments(
                            path_to_fragments=fragments_dict[key],
                            path_to_regions=path_to_regions[key],
                            path_to_blacklist=path_to_blacklist,
                            n_cpu=10,
                            project=key,
                            split_pattern='-')
cistopic_obj.add_cell_data(cell_data, split_pattern='-')

Here is the error, which also seems to be related to numpy:

2023-10-05 13:16:46,191 cisTopic     INFO     Reading data for yAL
2023-10-05 13:17:42,991 cisTopic     INFO     Counting number of unique fragments (Unique_nr_frag)
2023-10-05 13:18:07,220 cisTopic     INFO     Counting fragments in regions
2023-10-05 13:18:40,563 cisTopic     INFO     Creating fragment matrix
2023-10-05 13:25:13,500 cisTopic     INFO     Data is too big, making partitions. This is a reported error in Pandas versions > 0.21 (https://github.com/pandas-dev/pandas/issues/26314)
/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/cistopic_class.py:881: PerformanceWarning: The following operation may generate 96825190176 cells in the resulting pandas object.
  counts_df.groupby(["Name", "regionID"], sort=False, observed=True)
/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/cistopic_class.py:949: PerformanceWarning: The following operation may generate 19365073992 cells in the resulting pandas object.
  df.groupby(["Name", "regionID"], sort=False, observed=True)
Traceback (most recent call last):
  File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/cistopic_class.py", line 881, in create_cistopic_object_from_fragments
    counts_df.groupby(["Name", "regionID"], sort=False, observed=True)
  File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/generic.py", line 6245, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 446, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 348, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 527, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/dtypes/astype.py", line 230, in astype_array
    values = astype_nansafe(values, dtype, copy=copy)
  File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/dtypes/astype.py", line 170, in astype_nansafe
    return arr.astype(dtype, copy=True)
numpy.core._exceptions.MemoryError: Unable to allocate 361. GiB for an array with shape (538564, 179784) and data type int32

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Y_AL.py", line 208, in <module>
    cistopic_obj = create_cistopic_object_from_fragments(
  File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/cistopic_class.py", line 908, in create_cistopic_object_from_fragments
    cistopic_obj_list = [
  File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/cistopic_class.py", line 909, in <listcomp>
    create_cistopic_object_chunk(
  File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/cistopic_class.py", line 949, in create_cistopic_object_chunk
    df.groupby(["Name", "regionID"], sort=False, observed=True)
  File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/series.py", line 4458, in unstack
    return unstack(self, level, fill_value)
  File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 493, in unstack
    return unstacker.get_result(
  File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 216, in get_result
    values, _ = self.get_new_values(values, fill_value)
  File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 272, in get_new_values
    new_mask = np.zeros(result_shape, dtype=bool)
numpy.core._exceptions.MemoryError: Unable to allocate 18.0 GiB for an array with shape (179784, 107713) and data type bool

YayanFeng55 commented 8 months ago

Hi there,

I fixed this issue by freeing memory in my Python session. The Python documentation has more detail on memory management: https://docs.python.org/3/c-api/memory.html

It says: 'Memory management in Python involves a private heap containing all Python objects and data structures.' So I deleted several objects I no longer needed using 'del object', then checked the available memory with psutil ('import psutil'). After deleting those objects, about 800 GB of memory was free. I then reran the script and it worked.
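For anyone who wants to try the same thing, here is a minimal sketch of that cleanup. It is not taken from the actual session: `old_matrix` is a hypothetical stand-in for whatever large object is no longer needed, and `available_gib` is a helper name made up for this example. It assumes psutil is installed and falls back to `None` if it is not.

```python
import gc

try:
    import psutil  # third-party package, the one mentioned above
except ImportError:  # keep the sketch usable without it
    psutil = None


def available_gib():
    """Return the system memory currently available, in GiB (None without psutil)."""
    if psutil is None:
        return None
    return psutil.virtual_memory().available / 2**30


# Hypothetical stand-in for a large intermediate object that is no longer needed.
old_matrix = bytearray(50 * 2**20)  # about 50 MiB

before = available_gib()
del old_matrix   # drop the last reference to the object
gc.collect()     # also clean up anything kept alive only by reference cycles
after = available_gib()

print(f"available before: {before}, after: {after}")
```

Note that `del` only removes a name. The memory can be reclaimed only once no other variable still refers to the same object.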

I hope this approach helps you.

Thanks, Yayan

SeppeDeWinter commented 8 months ago

Hi @YayanFeng55 and @nikithkurella

This issue is indeed related to memory.

You don't have enough memory available on your system to store the matrix that is being generated. This can indeed be solved by freeing memory (as @YayanFeng55 suggested) or by using a machine with more memory.

This step is quite memory intensive.
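As a rough check before running it, the size of a dense array is the number of elements times the bytes per element. The sketch below is only an illustration, not pycisTopic code: `dense_gib` and `ITEMSIZE` are names made up here. It uses the shapes and dtypes from the tracebacks in this thread and gives about 196, 361 and 18 GiB, matching the reported numbers.

```python
import math

# Bytes per element for the dtypes named in the tracebacks above.
ITEMSIZE = {"int32": 4, "bool": 1}


def dense_gib(shape, dtype):
    """Size in GiB of a dense array with the given shape and dtype."""
    return math.prod(shape) * ITEMSIZE[dtype] / 2**30


# Shapes and dtypes copied from the MemoryError messages in this thread.
for shape, dtype in [
    ((508412, 103279), "int32"),
    ((538564, 179784), "int32"),
    ((179784, 107713), "bool"),
]:
    print(shape, dtype, f"{dense_gib(shape, dtype):.1f} GiB")
```

Comparing that estimate with the free memory on the machine shows in advance whether the allocation can succeed.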

All the best,

Seppe