YayanFeng55 closed this issue 8 months ago.
I am running into this issue as well when trying to create the cisTopic object. I am using code copied from the 10x Multiome PBMC tutorial:
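from pycisTopic.cistopic_class import create_cistopic_object_from_fragments

key = 'yAL'
# fragments_dict, path_to_regions, path_to_blacklist and cell_data are defined in earlier tutorial steps
cistopic_obj = create_cistopic_object_from_fragments(
    path_to_fragments=fragments_dict[key],
    path_to_regions=path_to_regions[key],
    path_to_blacklist=path_to_blacklist,
    n_cpu=10,
    project=key,
    split_pattern='-')
# Attach per-cell metadata to the new object
cistopic_obj.add_cell_data(cell_data, split_pattern='-')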
Here is the error, which also seems to be related to NumPy memory allocation:
2023-10-05 13:16:46,191 cisTopic INFO Reading data for yAL
2023-10-05 13:17:42,991 cisTopic INFO Counting number of unique fragments (Unique_nr_frag)
2023-10-05 13:18:07,220 cisTopic INFO Counting fragments in regions
2023-10-05 13:18:40,563 cisTopic INFO Creating fragment matrix
2023-10-05 13:25:13,500 cisTopic INFO Data is too big, making partitions. This is a reported error in Pandas versions > 0.21 (https://github.com/pandas-dev/pandas/issues/26314)
/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/cistopic_class.py:881: PerformanceWarning: The following operation may generate 96825190176 cells in the resulting pandas object.
counts_df.groupby(["Name", "regionID"], sort=False, observed=True)
/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/cistopic_class.py:949: PerformanceWarning: The following operation may generate 19365073992 cells in the resulting pandas object.
df.groupby(["Name", "regionID"], sort=False, observed=True)
Traceback (most recent call last):
File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/cistopic_class.py", line 881, in create_cistopic_object_from_fragments
counts_df.groupby(["Name", "regionID"], sort=False, observed=True)
File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/generic.py", line 6245, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 446, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 348, in apply
applied = getattr(b, f)(**kwargs)
File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 527, in astype
new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
new_values = astype_array(values, dtype, copy=copy)
File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/dtypes/astype.py", line 230, in astype_array
values = astype_nansafe(values, dtype, copy=copy)
File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/dtypes/astype.py", line 170, in astype_nansafe
return arr.astype(dtype, copy=True)
numpy.core._exceptions.MemoryError: Unable to allocate 361. GiB for an array with shape (538564, 179784) and data type int32
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "Y_AL.py", line 208, in <module>
cistopic_obj = create_cistopic_object_from_fragments(
File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/cistopic_class.py", line 908, in create_cistopic_object_from_fragments
cistopic_obj_list = [
File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/cistopic_class.py", line 909, in <listcomp>
create_cistopic_object_chunk(
File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/cistopic_class.py", line 949, in create_cistopic_object_chunk
df.groupby(["Name", "regionID"], sort=False, observed=True)
File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/series.py", line 4458, in unstack
return unstack(self, level, fill_value)
File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 493, in unstack
return unstacker.get_result(
File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 216, in get_result
values, _ = self.get_new_values(values, fill_value)
File "/home1/kurella/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 272, in get_new_values
new_mask = np.zeros(result_shape, dtype=bool)
numpy.core._exceptions.MemoryError: Unable to allocate 18.0 GiB for an array with shape (179784, 107713) and data type bool
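Both allocation sizes in the error follow directly from the array shape times the bytes per element, and the cell counts in the two PerformanceWarning lines are exactly these products. The short sketch below reproduces the numbers; it assumes only NumPy, and `dense_nbytes` is a hypothetical helper name.

```python
import math

import numpy as np

GIB = 1024 ** 3  # NumPy reports sizes in GiB (2**30 bytes)


def dense_nbytes(shape, dtype):
    """Bytes needed for a dense array of the given shape and dtype."""
    return math.prod(shape) * np.dtype(dtype).itemsize


# Shapes and dtypes copied from the two MemoryError messages above
counts = dense_nbytes((538564, 179784), np.int32)  # 4 bytes per cell
mask = dense_nbytes((179784, 107713), bool)        # 1 byte per cell

print(f"count matrix: {counts / GIB:.1f} GiB")
print(f"boolean mask: {mask / GIB:.1f} GiB")
```

The first figure prints as 360.7 GiB (NumPy rounds it to "361.") and the second as 18.0 GiB, so even the partitioned fallback still needs a single 18 GiB block.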
Hi there,
I fixed this issue by freeing memory in the Python environment. You can find detailed information about memory management in Python at this link: https://docs.python.org/3/c-api/memory.html
It says: 'Memory management in Python involves a private heap containing all Python objects and data structures.' So I deleted several objects in the Python environment using 'del object'. Then I checked the available memory with 'import psutil'. After deleting those objects, 800 GB of memory was free. I then reran the script and it worked.
Hope this approach helps you.
Thanks, Yayan
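The cleanup described above can be sketched roughly as follows. This is a minimal illustration, assuming `psutil` is installed; `big_leftover` is a hypothetical stand-in for whatever large objects remain from earlier steps. Note that `del` only removes a name: the object is freed once no other reference to it remains, `gc.collect()` additionally reclaims objects kept alive only by reference cycles, and whether the freed pages show up as available system memory depends on the allocator.

```python
import gc

import numpy as np
import psutil


def available_gib():
    """Memory currently available for new allocations, in GiB, as reported by psutil."""
    return psutil.virtual_memory().available / 1024 ** 3


# Hypothetical large object left over from earlier analysis steps (~32 MB here)
big_leftover = np.ones((2_000, 2_000), dtype=np.float64)
print(f"Available with leftover object: {available_gib():.1f} GiB")

# Remove the name; the array is deallocated once its reference count reaches zero
del big_leftover
# Reclaim any objects that are only kept alive by reference cycles
gc.collect()

print(f"Available after cleanup: {available_gib():.1f} GiB")
```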
Hi @YayanFeng55 and @nikithkurella
This issue is indeed related to memory.
Your system does not have enough memory available to store the matrix being generated. This can indeed be solved by clearing memory (as @YayanFeng55 suggested) or by using a machine with more memory.
Creating the fragment count matrix is quite memory intensive.
All the best,
Seppe
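Before starting a long run, one could check whether the dense matrix is likely to fit in the memory currently available. The sketch below is a hypothetical pre-flight check, assuming `psutil` is installed; `fits_in_memory` and its default `safety_factor` of 2 (rough headroom for temporary copies such as the `astype` and `unstack` steps in the traceback) are illustrative assumptions, not part of pycisTopic.

```python
import math

import numpy as np
import psutil


def fits_in_memory(shape, dtype, safety_factor=2.0):
    """Hypothetical check: would a dense array of `shape`/`dtype` fit in available RAM
    with `safety_factor` times its size as headroom? Returns (fits, size_in_GiB)."""
    needed = math.prod(shape) * np.dtype(dtype).itemsize
    available = psutil.virtual_memory().available
    return needed * safety_factor <= available, needed / 1024 ** 3


# Shape of the int32 count matrix from the first MemoryError above
ok, need_gib = fits_in_memory((538564, 179784), np.int32)
print(f"Dense matrix needs {need_gib:.1f} GiB; fits with headroom: {ok}")
```

With that shape the matrix alone needs about 360.7 GiB, so on most machines the check returns False.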
Dear all,
I am working with a 508412 × 103279 matrix in pycisTopic and got the error below:
Describe the bug
I am calculating DARs (differentially accessible regions) per cell type using the code below:
It produces the following error:
Version (please complete the following information):
I think this bug is related to NumPy memory allocation.
Do you have any suggestions to fix this issue?
Thanks, Yayan