aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Bug when running run_cgs_models: #181

Open YangLi-Bio opened 1 year ago

YangLi-Bio commented 1 year ago

Describe the bug I successfully ran create_cistopic_object_from_fragments and saved the resulting cistopic_obj to disk. However, when I tried to run run_cgs_models, I got an error message along with several secondary errors. The error is AssertionError: pydantic.dataclasses.dataclass only supports init=False.

To Reproduce

`# suppress warnings
import warnings
warnings.simplefilter(action = 'ignore', category = FutureWarning)
import sys
import os
_stderr = sys.stderr
null = open(os.devnull,'wb')
work_dir = '/fs/ess/PCON0022/liyang/STREAM-revision/Feasibility/scenic-plus-data/'
tmp_dir = 'tmp_dir/'

# scRNA-seq preprocessing using Scanpy
import scanpy as sc
adata = sc.read_h5ad(work_dir + '10X_hg38_PBMC_3k_3k_3k.h5ad')

# Data normalization
adata.raw = adata
sc.pp.normalize_total(adata, target_sum = 1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, min_mean = 0.0125, max_mean = 3, min_disp = 0.5)
adata = adata[:, adata.var.highly_variable]
sc.pp.scale(adata, max_value = 10)

# Cell type annotation
sc.pp.neighbors(adata, n_neighbors = 10, n_pcs = 10)
sc.tl.umap(adata)
sc.tl.leiden(adata, resolution = 0.8, key_added = 'leiden_res_0.8')
sc.pl.umap(adata, color = 'leiden_res_0.8', save = "_cell_clusters.png")
adata.obs['celltype'] = adata.obs['leiden_res_0.8']
adata.write(os.path.join(work_dir, 'adata_GEX.h5ad'), compression = 'gzip')

# scATAC-seq preprocessing using pycisTopic
scRNA_bc = adata.obs_names
cell_data = adata.obs
cell_data['sample_id'] = '10x_pbmc'
cell_data['celltype'] = cell_data['celltype'].astype(str)

# Load scATAC-seq data
import pickle
import os
import pycisTopic
fragments_dict = {'10x_pbmc': os.path.join(work_dir, '../10X_hg38_PBMC_3k_3k_3k_fragments.tsv.gz')}
path_to_regions = {'10x_pbmc': os.path.join(work_dir, '10X_hg38_PBMC_3k_3k_3k.bed')}
path_to_blacklist = '/fs/ess/PCON0022/liyang/STREAM-revision/Feasibility/scenic-plus/blacklist-regions/hg38_ENCFF356LFX.bed'

# Create pycisTopic object
from pycisTopic.cistopic_class import *
key = '10x_pbmc'
cistopic_obj = create_cistopic_object_from_fragments(
    path_to_fragments = fragments_dict[key],
    path_to_regions = path_to_regions[key],
    path_to_blacklist=path_to_blacklist,
    valid_bc = list(set(scRNA_bc)),
    n_cpu = 1,
    project = key,
    split_pattern = '-')
cistopic_obj.add_cell_data(cell_data, split_pattern = '-')
print(cistopic_obj)
pickle.dump(cistopic_obj, open(os.path.join(work_dir, 'scATAC_cistopic_obj.pkl'), 'wb'))

# Topic modeling
models = run_cgs_models(cistopic_obj,
    n_topics = [2, 4, 10, 16, 32, 48],
    n_cpu = 5,
    n_iter = 500,
    random_state = 555,
    alpha = 50,
    alpha_by_topic = True,
    eta = 0.1,
    eta_by_topic = False,
    save_path = None,
    _temp_dir = os.path.join(tmp_dir + 'ray_spill'))
pickle.dump(models, open(os.path.join(work_dir, 'scATAC_models.pkl'), 'wb'))`

Error output

`AssertionError: pydantic.dataclasses.dataclass only supports init=False
Columns ['sample_id'] will be overwritten
CistopicObject from project 10x_pbmc with n_cells × n_regions = 320 × 1678
Traceback (most recent call last):
  File "/users/PAS1475/liyang/.conda/envs/scenicplus/lib/python3.8/site-packages/ray/_private/node.py", line 293, in __init__
    ray._private.services.wait_for_node(
  File "/users/PAS1475/liyang/.conda/envs/scenicplus/lib/python3.8/site-packages/ray/_private/services.py", line 459, in wait_for_node
    raise TimeoutError(
TimeoutError: Timed out after 30 seconds while waiting for node to startup. Did not find socket name tmp_dir/ray_spill/session_2023-07-18_06-33-23_115215_200125/sockets/plasma_store in the list of object store socket names.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "2-Run-scenicplus.py", line 84, in <module>
    models = run_cgs_models(cistopic_obj,
  File "/users/PAS1475/liyang/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/lda_models.py", line 154, in run_cgs_models
    ray.init(num_cpus=n_cpu, **kwargs)
  File "/users/PAS1475/liyang/.conda/envs/scenicplus/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/users/PAS1475/liyang/.conda/envs/scenicplus/lib/python3.8/site-packages/ray/_private/worker.py", line 1534, in init
    _global_node = ray._private.node.Node(
  File "/users/PAS1475/liyang/.conda/envs/scenicplus/lib/python3.8/site-packages/ray/_private/node.py", line 298, in __init__
    raise Exception(
Exception: The current node timed out during startup. This could happen because some of the Ray processes failed to startup.`

Expected behavior I expected to get the results of Topic modeling.


SeppeDeWinter commented 1 year ago

Hi @YangLi-Bio

Downgrading your pydantic version to 1.x might solve your issue, see: https://discuss.ray.io/t/pydantic-dataclasses-dataclass-only-supports-init-false/11278.

This piece of code can help with debugging:


import ray
ray.init(
  num_cpus = 5,
  _temp_dir = os.path.join(tmp_dir + 'ray_spill'))

Best,

Seppe
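
Before and after a downgrade, it can help to confirm which versions the scenicplus environment actually resolves, since the version installed is not always the one expected. The following is a minimal sketch using only the standard library (Python 3.8+); the helper names `installed_version` and `major_version` are illustrative and not part of ray or pydantic.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Return the leading integer of a version string such as '1.10.11'."""
    return int(version.split(".")[0])


if __name__ == "__main__":
    for name in ("pydantic", "ray"):
        version = installed_version(name)
        print(name, version if version is not None else "not installed")

    pydantic_version = installed_version("pydantic")
    if pydantic_version is not None and major_version(pydantic_version) >= 2:
        # Assumption based on the linked Ray thread: pydantic 2.x is the trigger.
        print("pydantic 2.x detected; the linked Ray thread suggests trying 1.x")
```

Running this with the same interpreter that runs the SCENIC+ script (the one under `.conda/envs/scenicplus`) shows whether the downgrade took effect in that environment.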

YangLi-Bio commented 1 year ago

Dear Seppe,

Thanks for your patience and time.

The error still occurs even though I downgraded pydantic to 1.10.11. I was also able to run the piece of code that you provided without problems, but the error in my script has not changed.

Could you please help figure out a feasible solution?

Best regards,

SeppeDeWinter commented 1 year ago

Hi @YangLi-Bio

Sorry to hear that downgrading did not work.

Just to be sure, running


import ray
ray.init(
  num_cpus = 5,
  _temp_dir = os.path.join(tmp_dir + 'ray_spill'))

does not cause any error?

Best,

Seppe

YangLi-Bio commented 1 year ago

Dear Seppe,

No, it did not cause any error. I re-ran the full script and found that a different error now occurs instead:

Finished Topic modeling
Finished model evaluation
Traceback (most recent call last):
  File "2-scenicplus-preprocessing.py", line 119, in <module>
    region_bin_topics_top3k = binarize_topics(cistopic_obj, method='ntop', ntop = 3000)
  File "/users/PAS1475/liyang/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/topic_binarization.py", line 121, in binarize_topics
    data.iloc[
  File "/users/PAS1475/liyang/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/indexing.py", line 1068, in __getitem__
    return self._getitem_tuple(key)
  File "/users/PAS1475/liyang/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/indexing.py", line 1564, in _getitem_tuple
    tup = self._validate_tuple_indexer(tup)
  File "/users/PAS1475/liyang/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/indexing.py", line 874, in _validate_tuple_indexer
    self._validate_key(k, i)
  File "/users/PAS1475/liyang/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/indexing.py", line 1467, in _validate_key
    self._validate_integer(key, axis)
  File "/users/PAS1475/liyang/.conda/envs/scenicplus/lib/python3.8/site-packages/pandas/core/indexing.py", line 1558, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds

The code is as follows:

print ("Finished model evaluation")

# Inferring candidate enhancer regions
from pycisTopic.topic_binarization import *
region_bin_topics_otsu = binarize_topics(cistopic_obj, method='otsu')
region_bin_topics_top3k = binarize_topics(cistopic_obj, method='ntop', ntop = 3000)

SeppeDeWinter commented 1 year ago

Hi @YangLi-Bio

Can you show:


cistopic_obj.selected_model.topic_region

Best,

Seppe
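
One possible explanation worth checking alongside that output: the object printed earlier in this thread had n_cells × n_regions = 320 × 1678, so `ntop = 3000` asks for more regions than exist. The traceback shows `binarize_topics` failing in a positional `data.iloc[...]` lookup, which would be consistent with indexing position `ntop` in a list of only 1678 scores. This is an inference from the traceback, not confirmed against the pycisTopic source. A minimal sketch of capping the value first, where `cap_ntop` is a hypothetical helper and not a pycisTopic function:

```python
def cap_ntop(ntop, n_regions):
    """Return the largest usable ntop for a model with `n_regions` regions.

    Assumes position `ntop` itself must be a valid 0-based index,
    so the cap is n_regions - 1.
    """
    if n_regions < 1:
        raise ValueError("n_regions must be at least 1")
    return min(ntop, n_regions - 1)


# Numbers from this thread: 1678 regions, ntop = 3000 requested.
print(cap_ntop(3000, 1678))  # -> 1677
print(cap_ntop(500, 1678))   # -> 500
```

With the real object, `n_regions` could be read from `cistopic_obj.selected_model.topic_region.shape[0]`, assuming regions are the rows of that table.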

pchiang5 commented 1 year ago

I encountered the same error and was able to bypass it by removing the last argument, _temp_dir, from the call, as below:

models=run_cgs_models(cistopic_obj,
                    n_topics=[2,4,10,16,32,48],
                    n_cpu=40,
                    n_iter=500,
                    random_state=555,
                    alpha=50,
                    alpha_by_topic=True,
                    eta=0.1,
                    eta_by_topic=False,
                    save_path=None)
                    # _temp_dir = os.path.join(tmp_dir + 'ray_spill')) #error with this
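
If a custom spill directory is still wanted, one thing to try is passing an absolute path that already exists, since the failing run used the relative path `tmp_dir/ray_spill` (visible in the socket name in the traceback). Whether the relative path is what prevented Ray from starting is an assumption, not something confirmed in this thread. A minimal sketch, where `make_ray_temp_dir` is a hypothetical helper:

```python
import os
import tempfile


def make_ray_temp_dir(base_dir, name="ray_spill"):
    """Create `name` under `base_dir` and return its absolute path."""
    path = os.path.abspath(os.path.join(base_dir, name))
    os.makedirs(path, exist_ok=True)
    return path


# Demonstration in a throwaway directory; in the script from this thread,
# base_dir would be the tmp_dir variable.
demo_base = tempfile.mkdtemp()
ray_temp_dir = make_ray_temp_dir(demo_base)
print(ray_temp_dir)
```

The returned path would then be passed as `_temp_dir = ray_temp_dir` in the `run_cgs_models` call.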