aertslab / pycisTopic

pycisTopic is a Python module to simultaneously identify cell states and cis-regulatory topics from single cell epigenomics data.

Bug report: [BUG] Trying to merge cistopic_obj_list gives an error that merge is not defined #167

Open yojetsharma opened 1 week ago

yojetsharma commented 1 week ago

Describe the bug: I am running pycisTopic, and everything ran smoothly until I reached the step of merging cistopic_obj_list into a single object:

import warnings
warnings.simplefilter(action='ignore')
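import os  # used below for os.path.join
import pandas
import pycisTopic
pycisTopic.__version__
'2.0a0'
# out_dir, fragments_dict and sample_id_to_barcodes_passing_filters come from earlier steps (not shown)
path_to_regions = os.path.join(out_dir, "consensus_peak_calling/consensus_regions.bed")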
path_to_blacklist = "/home/praghu/yojetsharma/softwares/pycisTopic/blacklist/hg38-blacklist.v2.bed"
pycistopic_qc_output_dir = "qc"

from pycisTopic.cistopic_class import create_cistopic_object_from_fragments
import polars as pl

cistopic_obj_list = []
for sample_id in fragments_dict:
    sample_metrics = pl.read_parquet(
        os.path.join(pycistopic_qc_output_dir, f'{sample_id}.fragments_stats_per_cb.parquet')
    ).to_pandas().set_index("CB").loc[ sample_id_to_barcodes_passing_filters[sample_id] ]
    cistopic_obj = create_cistopic_object_from_fragments(
        path_to_fragments = fragments_dict[sample_id],
        path_to_regions = path_to_regions,
        path_to_blacklist = path_to_blacklist,
        metrics = sample_metrics,
        valid_bc = sample_id_to_barcodes_passing_filters[sample_id],
        n_cpu = 1,
        project = sample_id,
        split_pattern = '-'
    )
    cistopic_obj_list.append(cistopic_obj)
2024-09-15 12:15:30,495 cisTopic     INFO     Reading data for d149
2024-09-15 12:18:25,421 cisTopic     INFO     metrics provided!
2024-09-15 12:18:39,466 cisTopic     INFO     Counting fragments in regions
2024-09-15 12:20:54,984 cisTopic     INFO     Creating fragment matrix
/ncbs_gs/nlsas_data/usershares/praghu/yojetsharma/softwares/pycisTopic/src/pycisTopic/cistopic_class.py:886: PerformanceWarning: The following operation may generate 6280216839 cells in the resulting pandas object.
  .unstack(level="Name", fill_value=0)
2024-09-15 12:23:19,172 cisTopic     INFO     Converting fragment matrix to sparse matrix
2024-09-15 12:24:09,916 cisTopic     INFO     Removing blacklisted regions
2024-09-15 12:24:11,687 cisTopic     INFO     Creating CistopicObject
2024-09-15 12:24:15,938 cisTopic     INFO     Done!
2024-09-15 12:24:17,884 cisTopic     INFO     Reading data for ls002
2024-09-15 12:27:16,855 cisTopic     INFO     metrics provided!
2024-09-15 12:27:30,732 cisTopic     INFO     Counting fragments in regions
2024-09-15 12:29:53,260 cisTopic     INFO     Creating fragment matrix
/ncbs_gs/nlsas_data/usershares/praghu/yojetsharma/softwares/pycisTopic/src/pycisTopic/cistopic_class.py:886: PerformanceWarning: The following operation may generate 11394507046 cells in the resulting pandas object.
  .unstack(level="Name", fill_value=0)
2024-09-15 12:33:34,320 cisTopic     INFO     Converting fragment matrix to sparse matrix
2024-09-15 12:35:22,478 cisTopic     INFO     Removing blacklisted regions
2024-09-15 12:35:24,316 cisTopic     INFO     Creating CistopicObject
2024-09-15 12:35:29,299 cisTopic     INFO     Done!
2024-09-15 12:35:31,288 cisTopic     INFO     Reading data for ls003
2024-09-15 12:39:03,988 cisTopic     INFO     metrics provided!
2024-09-15 12:39:20,066 cisTopic     INFO     Counting fragments in regions
2024-09-15 12:42:09,488 cisTopic     INFO     Creating fragment matrix
/ncbs_gs/nlsas_data/usershares/praghu/yojetsharma/softwares/pycisTopic/src/pycisTopic/cistopic_class.py:886: PerformanceWarning: The following operation may generate 7762093599 cells in the resulting pandas object.
  .unstack(level="Name", fill_value=0)
2024-09-15 12:49:48,158 cisTopic     INFO     Converting fragment matrix to sparse matrix
2024-09-15 12:50:58,834 cisTopic     INFO     Removing blacklisted regions
2024-09-15 12:51:02,252 cisTopic     INFO     Creating CistopicObject
2024-09-15 12:51:11,716 cisTopic     INFO     Done!
cistopic_obj = merge(cistopic_obj_list)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[27], line 1
----> 1 cistopic_obj = merge(cistopic_obj_list)

NameError: name 'merge' is not defined


Expected behavior: I would expect the merge() function to run as described in the notebook:

cistopic_obj = merge(cistopic_obj_list)
2022-08-09 09:58:30,928 cisTopic     INFO     cisTopic object 1 merged
2022-08-09 09:58:41,004 cisTopic     INFO     cisTopic object 2 merged
2022-08-09 09:58:53,013 cisTopic     INFO     cisTopic object 3 merged
2022-08-09 09:59:09,175 cisTopic     INFO     cisTopic object 4 merged

yojetsharma commented 1 week ago

Version:

  • Python 3.11

Okay, I was able to solve this by importing CistopicObject and merge directly and then calling merge. Running dir(pycisTopic) gave me:

['DistributionNotFound',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '__warningregistry__',
 'cistopic_class',
 'contextlib',
 'fragments',
 'genomic_ranges',
 'get_distribution',
 'plotting',
 'qc',
 'topic_binarization',
 'tss_profile',
 'utils']
Figured merge would be in cistopic_class, so I listed its contents:

import pycisTopic.cistopic_class as cistopic_class

# List all attributes and methods in the module
print(dir(cistopic_class))

# Check for the class directly
print(hasattr(cistopic_class, 'CistopicObject'))
['CistopicObject', 'Self', 'TYPE_CHECKING', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '__warningregistry__', 'annotations', 'cl', 'collapse_duplicates', 'create_cistopic_object', 'create_cistopic_object_chunk', 'create_cistopic_object_from_fragments', 'create_cistopic_object_from_matrix_file', 'dtype', 'get_position_index', 'logging', 'merge', 'non_zero_rows', 'np', 'pd', 'pr', 'prepare_tag_cells', 'read_fragments_to_pyranges', 'region_names_to_coordinates', 'sp', 'sparse', 'subset_list', 'sys']
True
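dir(pycisTopic) only lists submodules that have already been imported in the current session, so a function can live in a submodule that never shows up there. The search done by hand above can be automated with the standard library: pkgutil.walk_packages discovers every submodule on disk, and each one is imported and checked for the name. find_name is a hypothetical helper, not part of pycisTopic; it is shown on the standard-library json package so the sketch runs without pycisTopic installed.

```python
import importlib
import pkgutil
from types import ModuleType


def find_name(package: ModuleType, name: str) -> list[str]:
    """Return the dotted names of `package` and its submodules that expose `name`.

    Hypothetical helper: pkgutil.walk_packages finds submodules on disk,
    including ones that dir(package) misses because they were never imported.
    """
    hits = []
    if hasattr(package, name):
        hits.append(package.__name__)
    for info in pkgutil.walk_packages(
        package.__path__,
        prefix=package.__name__ + ".",
        onerror=lambda _pkg: None,  # ignore subpackages that fail to import
    ):
        if info.name.endswith(".__main__"):
            continue  # __main__ modules may start a CLI when imported
        try:
            module = importlib.import_module(info.name)
        except Exception:
            continue  # skip submodules with missing optional dependencies
        if hasattr(module, name):
            hits.append(info.name)
    return hits


if __name__ == "__main__":
    import json

    # JSONDecodeError is defined in json.decoder and re-exported by json itself.
    print(find_name(json, "JSONDecodeError"))
```

With pycisTopic installed, find_name(pycisTopic, "merge") would be expected to include pycisTopic.cistopic_class, matching the dir(cistopic_class) output above. Two caveats: it imports every submodule, which can be slow for a package with heavy plotting dependencies, and hasattr also matches names a module merely re-imports, so checking obj.__module__ tells you where an object is actually defined.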

Found merge in cistopic_class, so I imported it:

from pycisTopic.cistopic_class import CistopicObject, merge

cistopic_obj = merge(cistopic_obj_list)
2024-09-15 13:31:30,536 cisTopic     INFO     cisTopic object 1 merged
2024-09-15 13:31:49,202 cisTopic     INFO     cisTopic object 2 merged
print(cistopic_obj)
CistopicObject from project cisTopic_merge with n_cells × n_regions = 60499 × 420461

But I still don't understand: why wasn't merge available without importing it explicitly?
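A likely explanation is Python's import binding rule: import pycisTopic binds only the name pycisTopic in the notebook's namespace and does not copy in functions defined in its submodules. The dir(cistopic_class) output above shows that merge lives in pycisTopic.cistopic_class, so an unqualified merge(...) call only works after something like from pycisTopic.cistopic_class import merge (importing CistopicObject alongside it is harmless but is not what resolves the NameError), or as the qualified call pycisTopic.cistopic_class.merge(...) once that submodule is imported. Presumably the notebook bound merge in an earlier cell that is not reproduced here. The sketch below demonstrates the same rule using the standard-library json package as a stand-in, so it runs without pycisTopic.

```python
import json  # binds only the name "json"; nothing inside it is copied in

# 1. A bare reference fails, just like the unqualified merge(...) call.
try:
    JSONDecodeError  # deliberately not bound yet
    bare_lookup_failed = False
except NameError:
    bare_lookup_failed = True

# 2. Qualified access through the defining submodule works, because
#    importing json also imported json.decoder and attached it to json.
qualified = json.decoder.JSONDecodeError

# 3. "from ... import" binds the name itself in this namespace.
from json.decoder import JSONDecodeError

print(bare_lookup_failed)            # True
print(JSONDecodeError is qualified)  # True
```

Mapped back to the report, step 3 corresponds to the workaround above (from pycisTopic.cistopic_class import merge), and step 2 to calling cistopic_class.merge(cistopic_obj_list) with the module alias already imported.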