aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.
Other
178 stars 28 forks source link

encountering issues with export_pseudobulk and so on. #275

Open Jinkyustar opened 9 months ago

Jinkyustar commented 9 months ago

I'm just following the tutorial for 10x multiome in the document. I have errors for ATAC-seq preprocessing. I would like to ask all the errors that I encountered when I ran following function.

from pycisTopic.pseudobulk_peak_calling import export_pseudobulk
bw_paths, bed_paths = export_pseudobulk(input_data = cell_data,
                 variable = 'celltype',                                                                     # variable by which to generate pseubulk profiles, in this case we want pseudobulks per celltype
                 sample_id_col = 'sample_id',
                 chromsizes = chromsizes,
                 bed_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bed_files/'),  # specify where pseudobulk_bed_files should be stored
                 bigwig_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bw_files/'),# specify where pseudobulk_bw_files should be stored
                 path_to_fragments = fragments_dict,                                     # location of fragment fiels
                 n_cpu = 8,                                                    # specify the number of cores to use, we use ray for multiprocessing
                 normalize_bigwig = True,
                 remove_duplicates = True,
                 _temp_dir = os.path.join(tmp_dir, 'ray_spill'),
                 split_pattern = '-')

the error message is

2023-12-18 16:36:32,464 cisTopic     INFO     Reading fragments from pbmc_tutorial/data/pbmc_granulocyte_sorted_3k_atac_fragments.tsv.gz
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[154], line 2
      1 from pycisTopic.pseudobulk_peak_calling import export_pseudobulk
----> 2 bw_paths, bed_paths = export_pseudobulk(input_data = cell_data,
      3                  variable = 'celltype',                                                                     # variable by which to generate pseubulk profiles, in this case we want pseudobulks per celltype
      4                  sample_id_col = 'sample_id',
      5                  chromsizes = chromsizes,
      6                  bed_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bed_files/'),  # specify where pseudobulk_bed_files should be stored
      7                  bigwig_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bw_files/'),# specify where pseudobulk_bw_files should be stored
      8                  path_to_fragments = fragments_dict,                                                        # location of fragment fiels
      9                  n_cpu = 8,                                                                                 # specify the number of cores to use, we use ray for multi processing
     10                  normalize_bigwig = True,
     11                  remove_duplicates = True,
     12                  _temp_dir = os.path.join(tmp_dir, 'ray_spill'),
     13                  split_pattern = '-')

File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/pseudobulk_peak_calling.py:165, in export_pseudobulk(input_data, variable, chromsizes, bed_path, bigwig_path, path_to_fragments, sample_id_col, n_cpu, normalize_bigwig, remove_duplicates, split_pattern, use_polars, **kwargs)
    163 # Create pseudobulks
    164 if n_cpu > 1:
--> 165     ray.init(num_cpus=n_cpu, **kwargs)
    166     ray_handle = ray.wait(
    167         [
    168             export_pseudobulk_ray.remote(
   (...)
    182         num_returns=len(groups),
    183     )
    184     ray.shutdown()

File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/ray/_private/client_mode_hook.py:103, in client_mode_hook.<locals>.wrapper(*args, **kwargs)
    101     if func.__name__ != "init" or is_client_mode_enabled_by_default:
    102         return getattr(ray, func.__name__)(*args, **kwargs)
--> 103 return func(*args, **kwargs)

File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/ray/_private/worker.py:1567, in init(address, num_cpus, num_gpus, resources, labels, object_store_memory, local_mode, ignore_reinit_error, include_dashboard, dashboard_host, dashboard_port, job_config, configure_logging, logging_level, logging_format, log_to_driver, namespace, runtime_env, storage, **kwargs)
   1534     ray_params = ray._private.parameter.RayParams(
   1535         node_ip_address=_node_ip_address,
   1536         object_ref_seed=None,
   (...)
   1561         node_name=_node_name,
   1562     )
   1563     # Start the Ray processes. We set shutdown_at_exit=False because we
   1564     # shutdown the node in the ray.shutdown call that happens in the atexit
   1565     # handler. We still spawn a reaper process in case the atexit handler
   1566     # isn't called.
-> 1567     _global_node = ray._private.node.Node(
   1568         head=True,
   1569         shutdown_at_exit=False,
   1570         spawn_reaper=True,
   1571         ray_params=ray_params,
   1572     )
   1573 else:
   1574     # In this case, we are connecting to an existing cluster.
   1575     if num_cpus is not None or num_gpus is not None:

File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/ray/_private/node.py:263, in Node.__init__(self, ray_params, head, shutdown_at_exit, spawn_reaper, connect_only, default_worker)
    260         self._ray_params.node_manager_port = node_info["node_manager_port"]
    261 else:
    262     # If the user specified a socket name, use it.
--> 263     self._plasma_store_socket_name = self._prepare_socket_file(
    264         self._ray_params.plasma_store_socket_name, default_prefix="plasma_store"
    265     )
    266     self._raylet_socket_name = self._prepare_socket_file(
    267         self._ray_params.raylet_socket_name, default_prefix="raylet"
    268     )
    270 self.metrics_agent_port = self._get_cached_port(
    271     "metrics_agent_port", default_port=ray_params.metrics_agent_port
    272 )

File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/ray/_private/node.py:922, in Node._prepare_socket_file(self, socket_path, default_prefix)
    920     maxlen = (104 if is_mac else 108) - 1  # sockaddr_un->sun_path
    921     if len(result.split("://", 1)[-1].encode("utf-8")) > maxlen:
--> 922         raise OSError(
    923             f"AF_UNIX path length cannot exceed {maxlen} bytes: {result!r}"
    924         )
    925 return result

OSError: AF_UNIX path length cannot exceed 107 bytes: '/athena/josefowiczlab/scratch/jic2016/ray_spill/session_2023-12-18_16-37-06_788375_890985/sockets/plasma_store'

If I change the '_temp_dir' to 'None', then it doesn't run for all the celltype. there is no output files saved in the pathway.

2023-12-18 16:48:01,594 cisTopic     INFO     Reading fragments from pbmc_tutorial/data/pbmc_granulocyte_sorted_3k_atac_fragments.tsv.gz
2023-12-18 16:48:39,004 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265/ 
(export_pseudobulk_ray pid=903303) 2023-12-18 16:48:52,048 cisTopic     INFO     Creating pseudobulk for B_cells_1
(export_pseudobulk_ray pid=903311) 2023-12-18 16:48:53,538 cisTopic     INFO     Creating pseudobulk for Dendritic_cells
(export_pseudobulk_ray pid=903308) 2023-12-18 16:48:55,684 cisTopic     INFO     Creating pseudobulk for NK_cells
(export_pseudobulk_ray pid=903304) 2023-12-18 16:48:55,697 cisTopic     INFO     Creating pseudobulk for FCGR3A_Monocytes [repeated 5x across cluster]

I tried changing n_cpu to 1 from 8. Then I get this error, which seems to be related to pyrange.

2023-12-18 16:52:24,308 cisTopic     INFO     Reading fragments from pbmc_tutorial/data/pbmc_granulocyte_sorted_3k_atac_fragments.tsv.gz
2023-12-18 16:52:58,427 cisTopic     INFO     Creating pseudobulk for B_cells_1
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_890985/3178510545.py in ?()
----> 2 from pycisTopic.pseudobulk_peak_calling import export_pseudobulk
      3 bw_paths, bed_paths = export_pseudobulk(input_data = cell_data,
      4                  variable = 'celltype',                                                                     # variable by which to generate pseubulk profiles, in this case we want pseudobulks per celltype
      5                  sample_id_col = 'sample_id',

~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/pseudobulk_peak_calling.py in ?(input_data, variable, chromsizes, bed_path, bigwig_path, path_to_fragments, sample_id_col, n_cpu, normalize_bigwig, remove_duplicates, split_pattern, use_polars, **kwargs)
    182             num_returns=len(groups),
    183         )
    184         ray.shutdown()
    185     else:
--> 186         [
    187             export_pseudobulk_one_sample(
    188                 cell_data,
    189                 group,

~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/pseudobulk_peak_calling.py in ?(.0)
    186 def export_pseudobulk(
--> 187     input_data: Union["CistopicObject", pd.DataFrame, Dict[str, pd.DataFrame]],
    188     variable: str,
    189     chromsizes: Union[pd.DataFrame, pr.PyRanges],
    190     bed_path: str,

~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/pseudobulk_peak_calling.py in ?(cell_data, group, fragments_df_dict, chromsizes, bigwig_path, bed_path, sample_id_col, normalize_bigwig, remove_duplicates, split_pattern)
    281     group_pr = pr.PyRanges(group_fragments)
    282     if isinstance(bigwig_path, str):
    283         bigwig_path_group = os.path.join(bigwig_path, str(group) + ".bw")
    284         if remove_duplicates:
--> 285             group_pr.to_bigwig(
    286                 path=bigwig_path_group,
    287                 chromosome_sizes=chromsizes,
    288                 rpm=normalize_bigwig,

~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pyranges/pyranges_main.py in ?(self, path, chromosome_sizes, rpm, divide, value_col, dryrun, chain)
   5502 
   5503         if chromosome_sizes is None:
   5504             chromosome_sizes = pr.data.chromsizes()
   5505 
-> 5506         result = _to_bigwig(self, path, chromosome_sizes, rpm, divide, value_col, dryrun)
   5507 
   5508         if dryrun:
   5509             return result

~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pyranges/out.py in ?(self, path, chromosome_sizes, rpm, divide, value_col, dryrun)
    208             new_pyrles[k] = v
    209 
    210         gr = c.defragment().to_ranges()
    211 
--> 212     unique_chromosomes = gr.chromosomes
    213 
    214     subset = ["Chromosome", "Start", "End", "Score"]
    215 

~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/generic.py in ?(self, name)
   5903             and name not in self._accessors
   5904             and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5905         ):
   5906             return self[name]
-> 5907         return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'chromosomes'

I previously managed to get around with this issue somhow in different server although I encountered issues when running GRN. Then I'm having issues with these steps again after moving to a new server..

could you help me solve the issues? I remember I had similar issues when running peak calling.

SeppeDeWinter commented 9 months ago

Hi @Jinkyustar

See: https://github.com/aertslab/scenicplus/issues/24#issuecomment-1228450341 for the issue related to OSError: AF_UNIX path length cannot exceed 107 bytes: '/athena/josefowiczlab/scratch/jic2016/ray_spill/session_2023-12-18_16-37-06_788375_890985/sockets/plasma_store'

For the other error. Can you make sure the chromosome names in your fragment files match to those in the chromsizes?

All the best,

Seppe