aertslab / scenicplus

SCENIC+ is a Python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

issue with run_scenicplus wrapper #88

Closed: aitortxo10 closed this issue 1 year ago

aitortxo10 commented 1 year ago

Hi!

First of all, thank you to the developers for such an amazing tool. Unfortunately, I am having trouble generating the eGRN. I followed the PBMC tutorial you provided to build the network from our own data, but I run into the following error:

2023-01-11 13:06:53,252 cisTopic     INFO     Imputing drop-outs
2023-01-11 13:07:19,519 cisTopic     INFO     Scaling
2023-01-11 13:07:32,027 cisTopic     INFO     Keep non zero rows
2023-01-11 13:07:46,300 cisTopic     INFO     Imputed accessibility sparsity: 0.651854611289606
2023-01-11 13:07:46,300 cisTopic     INFO     Create CistopicImputedFeatures object
2023-01-11 13:07:46,300 cisTopic     INFO     Done!
2023-01-11 13:08:02,700 SCENIC+_wrapper INFO     /mnt/beegfs/agonzalez/scenicplus/eGRNs_pancreas folder already exists.
2023-01-11 13:08:02,700 SCENIC+_wrapper INFO     Merging cistromes
2023-01-11 13:09:50,129 SCENIC+_wrapper INFO     Getting search space
2023-01-11 13:09:52,333 R2G          INFO     Downloading gene annotation from biomart dataset: hsapiens_gene_ensembl
2023-01-11 13:10:06,366 R2G          INFO     Downloading chromosome sizes from: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
2023-01-11 13:10:13,077 R2G          INFO     Extending promoter annotation to 10 bp upstream and 10 downstream
2023-01-11 13:10:15,387 R2G          INFO     Extending search space to:
                                                        150000 bp downstream of the end of the gene.
                                                        150000 bp upstream of the start of the gene.
2023-01-11 13:10:37,239 R2G          INFO     Intersecting with regions.
2023-01-11 13:10:38,302 R2G          INFO     Calculating distances from region to gene
2023-01-11 13:13:22,175 R2G          INFO     Imploding multiple entries per region and gene
2023-01-11 13:17:58,431 R2G          INFO     Done!
2023-01-11 13:17:59,161 SCENIC+_wrapper INFO     Inferring region to gene relationships
2023-01-11 12:26:58,348 R2G          INFO     Calculating region to gene importances, using GBM method
join: Strand data from other will be added as strand data to self.
If this is undesired use the flag apply_strand_suffix=False.
To turn off the warning set apply_strand_suffix to True or False.
Traceback (most recent call last):
  File "/mnt/beegfs/agonzalez/.conda/envs/scenicplus/lib/python3.8/site-packages/ray/_private/node.py", line 310, in __init__
    ray._private.services.wait_for_node(
  File "/mnt/beegfs/agonzalez/.conda/envs/scenicplus/lib/python3.8/site-packages/ray/_private/services.py", line 434, in wait_for_node
    raise TimeoutError("Timed out while waiting for node to startup.")
TimeoutError: Timed out while waiting for node to startup.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "generate_networks.py", line 109, in <module>
    raise(e)
  File "generate_networks.py", line 89, in <module>
    run_scenicplus(
  File "/mnt/beegfs/agonzalez/.conda/envs/scenicplus/lib/python3.8/site-packages/scenicplus/wrappers/run_scenicplus.py", line 142, in run_scenicplus
    calculate_regions_to_genes_relationships(scplus_obj,
  File "/mnt/beegfs/agonzalez/.conda/envs/scenicplus/lib/python3.8/site-packages/scenicplus/enhancer_to_gene.py", line 654, in calculate_regions_to_genes_relationships
    region_to_gene_importances = _score_regions_to_genes(SCENICPLUS_obj,
  File "/mnt/beegfs/agonzalez/.conda/envs/scenicplus/lib/python3.8/site-packages/scenicplus/enhancer_to_gene.py", line 527, in _score_regions_to_genes
    ray.init(num_cpus=ray_n_cpu, **kwargs)
  File "/mnt/beegfs/agonzalez/.conda/envs/scenicplus/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/mnt/beegfs/agonzalez/.conda/envs/scenicplus/lib/python3.8/site-packages/ray/_private/worker.py", line 1439, in init
    _global_node = ray._private.node.Node(
  File "/mnt/beegfs/agonzalez/.conda/envs/scenicplus/lib/python3.8/site-packages/ray/_private/node.py", line 315, in __init__
    raise Exception(
Exception: The current node has not been updated within 30 seconds, this could happen because of some of the Ray processes failed to startup.

I am currently running the script on an HPC cluster through Slurm. I also tried different approaches, such as the Singularity image you provided and a conda environment dedicated to SCENIC+, but the error keeps occurring. Unfortunately, I do not know much about Ray, so my main concern is whether the problem comes from an inadequate Ray configuration or from our data (we analyzed the RNA data with Seurat and then converted the object to Python).

Thank you for your efforts and happy new year!

cbravo93 commented 1 year ago

Hi @aitortxo10 !

Can you post the Ray logs? They should be in _temp_dir, in a folder named with the date and time (e.g. session_2022-08-04_09-22-46_157266_27177) and/or in session_latest if it was the most recent run or is still running.

Cheers!

C
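
For readers who need to find these logs, here is a minimal sketch of a helper that locates the most recent Ray session folder under a given temp directory and lists its log files. The function names `find_ray_session_dir` and `list_ray_logs` are hypothetical (not part of SCENIC+ or Ray), and the sketch assumes the usual Ray layout of `session_<date>_<time>_...` folders, each with a `logs` subfolder.

```python
from pathlib import Path
from typing import List, Optional


def find_ray_session_dir(temp_dir: str) -> Optional[Path]:
    """Return the Ray session folder to inspect, or None if there is none.

    Hypothetical helper: prefers session_latest, otherwise falls back to the
    newest session_<date>_<time>_... folder (the names sort chronologically).
    """
    root = Path(temp_dir)
    latest = root / "session_latest"
    if latest.exists():
        # session_latest is normally a symlink; resolve it to the real folder.
        return latest.resolve()
    sessions = sorted(p for p in root.glob("session_*") if p.is_dir())
    return sessions[-1] if sessions else None


def list_ray_logs(temp_dir: str) -> List[Path]:
    """List the files in the logs folder of the most recent Ray session."""
    session = find_ray_session_dir(temp_dir)
    if session is None:
        return []
    return sorted(p for p in (session / "logs").glob("*") if p.is_file())
```

Calling `list_ray_logs` on the directory that was passed as `_temp_dir` returns the paths to attach to a bug report; an empty list suggests Ray never managed to create a session there.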

aitortxo10 commented 1 year ago

Hi!

Sorry for the late response; we were testing a couple of possible solutions with the IT technicians and I did not notice your reply. We managed to solve the issue, which was a combination of two things: a lack of permissions on the cluster, so the server that Ray tried to start was not allowed, and a lack of disk space for Ray to write its spill files. After we fixed both, the wrapper ran without problems.

Once again thank you for your time and sorry for the late response, Aitor
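
The disk-related half of this fix can be checked before launching a long job. Below is a minimal sketch of a pre-flight check that verifies the Ray temp directory can be created and written to, and that it has some free space. The function name `check_ray_temp_dir` and the 10 GB default threshold are assumptions made for illustration; the space Ray actually needs for spilling depends on the dataset. The sketch does not cover the permission problem with the server Ray starts, which has to be resolved with the cluster administrators.

```python
import os
import shutil
import tempfile
from typing import List


def check_ray_temp_dir(temp_dir: str, min_free_gb: float = 10.0) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper. min_free_gb is an arbitrary illustrative threshold.
    """
    problems = []

    # The directory must exist or be creatable by the current user.
    try:
        os.makedirs(temp_dir, exist_ok=True)
    except OSError as err:
        return [f"cannot create {temp_dir}: {err}"]

    # Writing a throwaway file catches missing write permissions.
    try:
        with tempfile.NamedTemporaryFile(dir=temp_dir):
            pass
    except OSError as err:
        problems.append(f"cannot write to {temp_dir}: {err}")

    # Compare free space on the underlying filesystem with the threshold.
    free_gb = shutil.disk_usage(temp_dir).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(
            f"only {free_gb:.1f} GB free in {temp_dir}, wanted {min_free_gb:.1f} GB"
        )
    return problems
```

If the list comes back empty, the same directory can be passed as `_temp_dir` to `run_scenicplus`; the traceback above shows the extra keyword arguments being forwarded to `ray.init`, so Ray should write its session folder and spill files there.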