aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

An error related to "peak_calling" #343

Open m03077yhtnt opened 6 months ago

m03077yhtnt commented 6 months ago

Describe the bug

When I attempt to run "peak_calling", I encounter an error that I believe is related to numpy. I installed scenicplus with Python 3.11 following the instructions at https://github.com/aertslab/scenicplus/tree/development. As far as I can tell, I am using "MACS2-2.2.7.1-py3.8-linux-x86_64". Could the error be caused by the difference in Python versions? I would appreciate any advice you can provide. Thank you so much for your help.

To Reproduce

narrow_peaks_dict = peak_calling(macs_path, bed_paths, os.path.join(work_dir, 'scATAC/consensus_peak_calling/MACS/'), genome_size='hs', n_cpu=2, input_format='BEDPE', shift=73, ext_size=146, keep_dup='all', q_value=0.05)


Error output

2024-03-31 00:27:23,775 INFO worker.py:1724 -- Started a local Ray instance.
E0331 00:27:25.916030700    3019 socket_utils_common_posix.cc:224]     check for SO_REUSEPORT: UNKNOWN:Protocol not available {created_time:"2024-03-31T00:27:25.915182686+09:00", errno:92, os_error:"Protocol not available", syscall:"getsockopt(SO_REUSEPORT)"}
(macs_call_peak_ray pid=3242) 2024-03-31 00:27:30,145 cisTopic     INFO     Calling peaks for Non-PE with macs2 callpeak --treatment scATAC/consensus_peak_calling/pseudobulk_bed_files/Non-PE.fragments.tsv.gz --name Non-PE  --outdir scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
(macs_call_peak_ray pid=3241) 2024-03-31 00:27:30,150 cisTopic     INFO     Calling peaks for EMT1B with macs2 callpeak --treatment scATAC/consensus_peak_calling/pseudobulk_bed_files/EMT1B.fragments.tsv.gz --name EMT1B  --outdir scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
---------------------------------------------------------------------------
RayTaskError(RuntimeError)                Traceback (most recent call last)
Cell In[10], line 8
      6 macs_path='macs2'
      7 # Run peak calling
----> 8 narrow_peaks_dict = peak_calling(macs_path,
      9                                  bed_paths,
     10                                  os.path.join(work_dir, 'scATAC/consensus_peak_calling/MACS/'),
     11                                  genome_size='hs',
     12                                  n_cpu=2,
     13                                  input_format='BEDPE',
     14                                  shift=73,
     15                                  ext_size=146,
     16                                  keep_dup = 'all',
     17                                  q_value = 0.05)

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycisTopic/pseudobulk_peak_calling.py:286, in peak_calling(macs_path, bed_paths, outdir, genome_size, n_cpu, input_format, shift, ext_size, keep_dup, q_value, nolambda, skip_empty_peaks, **kwargs)
    284     except Exception as e:
    285         ray.shutdown()
--> 286         raise(e)
    287     ray.shutdown()
    288 else:

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycisTopic/pseudobulk_peak_calling.py:264, in peak_calling(macs_path, bed_paths, outdir, genome_size, n_cpu, input_format, shift, ext_size, keep_dup, q_value, nolambda, skip_empty_peaks, **kwargs)
    262 ray.init(num_cpus=n_cpu, **kwargs)
    263 try:
--> 264     narrow_peaks = ray.get(
    265         [
    266             macs_call_peak_ray.remote(
    267                 macs_path,
    268                 bed_paths[name],
    269                 name,
    270                 outdir,
    271                 genome_size,
    272                 input_format,
    273                 shift,
    274                 ext_size,
    275                 keep_dup,
    276                 q_value,
    277                 nolambda,
    278                 skip_empty_peaks
    279 
    280             )
    281             for name in list(bed_paths.keys())
    282         ]
    283     )
    284 except Exception as e:
    285     ray.shutdown()

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/ray/_private/auto_init_hook.py:22, in wrap_auto_init.<locals>.auto_init_wrapper(*args, **kwargs)
     19 @wraps(fn)
     20 def auto_init_wrapper(*args, **kwargs):
     21     auto_init_ray()
---> 22     return fn(*args, **kwargs)

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/ray/_private/client_mode_hook.py:103, in client_mode_hook.<locals>.wrapper(*args, **kwargs)
    101     if func.__name__ != "init" or is_client_mode_enabled_by_default:
    102         return getattr(ray, func.__name__)(*args, **kwargs)
--> 103 return func(*args, **kwargs)

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/ray/_private/worker.py:2624, in get(object_refs, timeout)
   2622     worker.core_worker.dump_object_store_memory_usage()
   2623 if isinstance(value, RayTaskError):
-> 2624     raise value.as_instanceof_cause()
   2625 else:
   2626     raise value

RayTaskError(RuntimeError): ray::macs_call_peak_ray() (pid=3241, ip=192.168.0.8)
  File "/home/m03077yh/anaconda3/envs/scenicplus/lib/python3.11/subprocess.py", line 466, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/m03077yh/anaconda3/envs/scenicplus/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'macs2 callpeak --treatment scATAC/consensus_peak_calling/pseudobulk_bed_files/EMT1B.fragments.tsv.gz --name EMT1B  --outdir scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

ray::macs_call_peak_ray() (pid=3241, ip=192.168.0.8)
  File "/home/m03077yh/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycisTopic/pseudobulk_peak_calling.py", line 445, in macs_call_peak_ray
    MACS_peak_calling = MACSCallPeak(
                        ^^^^^^^^^^^^^
  File "/home/m03077yh/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycisTopic/pseudobulk_peak_calling.py", line 523, in __init__
    self.call_peak()
  File "/home/m03077yh/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycisTopic/pseudobulk_peak_calling.py", line 564, in call_peak
    raise RuntimeError(
RuntimeError: command 'macs2 callpeak --treatment scATAC/consensus_peak_calling/pseudobulk_bed_files/EMT1B.fragments.tsv.gz --name EMT1B  --outdir scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda' return with error (code 1): b'Traceback (most recent call last):\n  File "/home/m03077yh/.local/bin/macs2", line 4, in <module>\n    __import__(\'pkg_resources\').run_script(\'MACS2==2.2.7.1\', \'macs2\')\n  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 667, in run_script\n    self.require(requires)[0].run_script(script_name, ns)\n  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1463, in run_script\n    exec(code, namespace, namespace)\n  File "/home/m03077yh/.local/lib/python3.8/site-packages/MACS2-2.2.7.1-py3.8-linux-x86_64.egg/EGG-INFO/scripts/macs2", line 653, in <module>\n    main()\n  File "/home/m03077yh/.local/lib/python3.8/site-packages/MACS2-2.2.7.1-py3.8-linux-x86_64.egg/EGG-INFO/scripts/macs2", line 49, in main\n    from MACS2.callpeak_cmd import run\n  File "/home/m03077yh/.local/lib/python3.8/site-packages/MACS2-2.2.7.1-py3.8-linux-x86_64.egg/MACS2/callpeak_cmd.py", line 23, in <module>\n    from MACS2.OptValidator import opt_validate\n  File "/home/m03077yh/.local/lib/python3.8/site-packages/MACS2-2.2.7.1-py3.8-linux-x86_64.egg/MACS2/OptValidator.py", line 20, in <module>\n    from MACS2.IO.Parser import BEDParser, ELANDResultParser, ELANDMultiParser, \\\n  File "__init__.pxd", line 242, in init MACS2.IO.Parser\nValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 80 from PyObject\n'
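The traceback shows that `macs2` was launched from `~/.local/bin` and loaded MACS2 from a Python 3.8 egg under `~/.local/lib/python3.8`, not from the Python 3.11 conda environment. A minimal diagnostic sketch (the helper name `describe_executable` is hypothetical) to check which `macs2` resolves on PATH and which interpreter its shebang line names:

```python
import shutil
import sys
from pathlib import Path
from typing import Optional


def describe_executable(name: str, search_path: Optional[str] = None) -> dict:
    """Report where `name` resolves on PATH and which interpreter its shebang names.

    `search_path` defaults to the current PATH, exactly as shutil.which does.
    """
    path = shutil.which(name, path=search_path)
    if path is None:
        return {"path": None, "shebang": None, "in_current_env": False}
    # Console scripts installed by pip/setuptools typically start with "#!<interpreter>".
    with open(path, "rb") as fh:
        first_line = fh.readline().decode(errors="replace").strip()
    shebang = first_line[2:].strip() if first_line.startswith("#!") else None
    # True only if the executable sits in the bin/ directory of the running
    # interpreter's environment (e.g. the active conda env).
    env_bin = (Path(sys.prefix) / "bin").resolve()
    in_env = Path(path).parent.resolve() == env_bin
    return {"path": path, "shebang": shebang, "in_current_env": in_env}


if __name__ == "__main__":
    info = describe_executable("macs2")
    print("macs2 found at:", info["path"])
    print("shebang interpreter:", info["shebang"])
    print("inside this environment:", info["in_current_env"])
    print("this interpreter:", sys.executable)
```

Run from inside the scenicplus environment, a `False` for "inside this environment" would mean `peak_calling` is invoking a MACS2 installed for a different Python.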
SeppeDeWinter commented 6 months ago

Hi @m03077yhtnt

Can you try again after reinstalling numpy?


pip install --upgrade numpy --force-reinstall
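Reinstalling numpy only helps if it is done for the interpreter that actually runs `macs2`. In the traceback above, that is a Python 3.8 interpreter loading packages from `/usr/lib/python3/dist-packages` and `~/.local/lib/python3.8`, not the Python 3.11 conda environment. A minimal sketch, with hypothetical helper names, that reports which copy and version of a package a given interpreter picks up; it is meant to be run with that same interpreter:

```python
import importlib.metadata
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def installed_version(dist: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return importlib.metadata.version(dist)
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    import sys

    print("interpreter:", sys.executable)
    for pkg in ("numpy", "MACS2"):
        print(pkg, installed_version(pkg), module_origin(pkg))
```

If numpy resolves to a different location or version here than in the conda environment, the reinstall needs to target this interpreter.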

Best, Seppe

cjiang310437 commented 2 months ago

I have the same error. After upgrading numpy, I get more errors related to package incompatibility:

---------------------------------------------------------------------------
WorkerCrashedError                        Traceback (most recent call last)
Cell In[27], line 6
      2 macs_path = "macs2"
      4 os.makedirs(os.path.join(out_dir, "consensus_peak_calling/MACS"), exist_ok = True)
----> 6 narrow_peak_dict = peak_calling(
      7     macs_path = macs_path,
      8     bed_paths = bed_paths,
      9     outdir = os.path.join(os.path.join(out_dir, "consensus_peak_calling/MACS")),
     10     genome_size = 'hs',
     11     n_cpu = 8,
     12     input_format = 'BEDPE',
     13     shift = 73,
     14     ext_size = 146,
     15     keep_dup = 'all',
     16     q_value = 0.05,
     17     _temp_dir = temp_dir
     18 )

File ~/.conda/envs/scenicplus3.11/lib/python3.11/site-packages/pycisTopic/pseudobulk_peak_calling.py:286, in peak_calling(macs_path, bed_paths, outdir, genome_size, n_cpu, input_format, shift, ext_size, keep_dup, q_value, nolambda, skip_empty_peaks, **kwargs)
    284     except Exception as e:
    285         ray.shutdown()
--> 286         raise(e)
    287     ray.shutdown()
    288 else:

File ~/.conda/envs/scenicplus3.11/lib/python3.11/site-packages/pycisTopic/pseudobulk_peak_calling.py:264, in peak_calling(macs_path, bed_paths, outdir, genome_size, n_cpu, input_format, shift, ext_size, keep_dup, q_value, nolambda, skip_empty_peaks, **kwargs)
    262 ray.init(num_cpus=n_cpu, **kwargs)
    263 try:
--> 264     narrow_peaks = ray.get(
    265         [
    266             macs_call_peak_ray.remote(
    267                 macs_path,
    268                 bed_paths[name],
    269                 name,
    270                 outdir,
    271                 genome_size,
    272                 input_format,
    273                 shift,
    274                 ext_size,
    275                 keep_dup,
    276                 q_value,
    277                 nolambda,
    278                 skip_empty_peaks
    279 
    280             )
    281             for name in list(bed_paths.keys())
    282         ]
    283     )
    284 except Exception as e:
    285     ray.shutdown()

File ~/.conda/envs/scenicplus3.11/lib/python3.11/site-packages/ray/_private/auto_init_hook.py:22, in wrap_auto_init.<locals>.auto_init_wrapper(*args, **kwargs)
     19 @wraps(fn)
     20 def auto_init_wrapper(*args, **kwargs):
     21     auto_init_ray()
---> 22     return fn(*args, **kwargs)

File ~/.conda/envs/scenicplus3.11/lib/python3.11/site-packages/ray/_private/client_mode_hook.py:103, in client_mode_hook.<locals>.wrapper(*args, **kwargs)
    101     if func.__name__ != "init" or is_client_mode_enabled_by_default:
    102         return getattr(ray, func.__name__)(*args, **kwargs)
--> 103 return func(*args, **kwargs)

File ~/.conda/envs/scenicplus3.11/lib/python3.11/site-packages/ray/_private/worker.py:2626, in get(object_refs, timeout)
   2624             raise value.as_instanceof_cause()
   2625         else:
-> 2626             raise value
   2628 if is_individual_id:
   2629     values = values[0]

WorkerCrashedError: The worker died unexpectedly while executing this task. Check python-core-worker-*.log files for more information.
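The `WorkerCrashedError` hides the underlying failure and points to the `python-core-worker-*.log` files instead. A minimal sketch, with hypothetical helper names, that finds the newest such logs and prints their last lines. It assumes only that the logs sit somewhere under Ray's temporary directory (by default `/tmp/ray`, or whatever directory was passed as `_temp_dir`):

```python
from pathlib import Path
from typing import List


def find_worker_logs(root: str, pattern: str = "python-core-worker-*.log") -> List[Path]:
    """Return matching Ray worker log files under `root`, newest first."""
    # Resolve paths so a file reachable through a symlinked session dir is listed once.
    logs = {p.resolve() for p in Path(root).rglob(pattern) if p.is_file()}
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)


def tail(path: Path, n: int = 20) -> List[str]:
    """Return the last `n` lines of a text file."""
    with open(path, errors="replace") as fh:
        return fh.read().splitlines()[-n:]


if __name__ == "__main__":
    import sys

    # Pass the directory used as _temp_dir on the command line; /tmp/ray otherwise.
    root = sys.argv[1] if len(sys.argv) > 1 else "/tmp/ray"
    for log in find_worker_logs(root)[:3]:
        print("==>", log)
        print("\n".join(tail(log)))
```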
SeppeDeWinter commented 2 months ago

Hi @cjiang310437

Would it be feasible to run this code with one core (n_cpu=1)?
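Going one step further than a single core, the `macs2 callpeak` command that pycisTopic logs in its INFO lines can be run directly, outside Ray, so that MACS2's own error message is visible instead of a `WorkerCrashedError`. A minimal sketch; the helpers are hypothetical and simply rebuild the flags shown in those logged commands:

```python
import subprocess
import sys
from typing import List, Tuple


def build_callpeak_cmd(macs_path: str, treatment: str, name: str, outdir: str,
                       genome_size: str = "hs", input_format: str = "BEDPE",
                       shift: int = 73, ext_size: int = 146,
                       keep_dup: str = "all", q_value: float = 0.05) -> List[str]:
    """Rebuild the logged `macs2 callpeak` command for one pseudobulk."""
    return [
        macs_path, "callpeak",
        "--treatment", treatment,
        "--name", name,
        "--outdir", outdir,
        "--format", input_format,
        "--gsize", genome_size,
        "--qvalue", str(q_value),
        "--nomodel",
        "--shift", str(shift),
        "--extsize", str(ext_size),
        "--keep-dup", keep_dup,
        "--call-summits",
        "--nolambda",
    ]


def run_serially(cmd: List[str]) -> Tuple[int, str]:
    """Run one command in the foreground and return (exit code, stderr)."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError as exc:
        # The executable itself was not found; mimic the shell's exit code 127.
        return 127, str(exc)
    return result.returncode, result.stderr


if __name__ == "__main__":
    cmd = build_callpeak_cmd(
        "macs2",
        "scATAC/consensus_peak_calling/pseudobulk_bed_files/EMT1B.fragments.tsv.gz",
        "EMT1B",
        "scATAC/consensus_peak_calling/MACS/",
    )
    code, err = run_serially(cmd)
    print("exit code:", code)
    print(err, file=sys.stderr)
```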

Best,

Seppe

cjiang310437 commented 2 months ago

Hi @SeppeDeWinter, thanks for replying. I tried both upgrading numpy and running with one core, but neither worked, so the problem does not seem to be numpy or job parallelism. Here is the situation: I have been running scenicplus on our HPC cluster, and the code always worked fine. Recently, however, our cluster machines were upgraded from CentOS 7 to RHEL 9, and on the RHEL 9 machines the code started failing with 'undefined_symbol: log_finite' errors. The error occurs in both the peak_calling (macs2) and perturbation (velocyto) steps. Do you have any idea how to resolve this?

The error from peak_calling, using the scenicplus version downloaded from the main branch: (screenshot)

The error from plot_perturbation_effect_in_embedding, using the scenicplus version downloaded from an old branch: (screenshot)

Any suggestions would be appreciated. Thank you!

Best, Cheng

SeppeDeWinter commented 1 month ago

Hi @cjiang310437

Did you try reinstalling all packages after the OS change? Some packages might need to be recompiled for the new OS.
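Before reinstalling everything, it can help to see which packages actually fail to load on the new OS. A minimal sketch, with hypothetical helper names, that imports each module and flags the two failure patterns seen in this thread (an undefined symbol and numpy's "binary incompatibility" message). The module names MACS2 and velocyto come from the failing steps above; run it with the environment's own python:

```python
import importlib
from typing import Dict, Iterable


def try_imports(modules: Iterable[str]) -> Dict[str, str]:
    """Import each module and return 'ok' or '<ExceptionType>: <message>'."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception as exc:
            # A compiled extension the dynamic linker cannot load surfaces as an
            # ImportError; numpy ABI mismatches surface as a ValueError.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


def needs_rebuild(message: str) -> bool:
    """Heuristic: these messages usually mean a binary was built for another system."""
    markers = ("undefined symbol", "undefined_symbol", "binary incompatibility")
    return any(m in message for m in markers)


if __name__ == "__main__":
    for name, status in try_imports(["numpy", "MACS2", "velocyto"]).items():
        flag = "  <- likely needs reinstall/recompile" if needs_rebuild(status) else ""
        print(f"{name}: {status}{flag}")
```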

All the best,

Seppe