databio / pepatac

A modular, containerized pipeline for ATAC-seq data processing
http://pepatac.databio.org
BSD 2-Clause "Simplified" License

Issue with "psutil.NoSuchProcess process no longer exists" #195

Closed lmckinno closed 3 years ago

lmckinno commented 3 years ago

Hello, I am getting a "psutil.NoSuchProcess process no longer exists" error from the bamSitesToWig.py script (log attached: PEPATAC_log.txt). Have you come across this issue before? Any ideas for debugging?

nsheff commented 3 years ago

The actual issue is this:

Reduce step (merge files)...
Merging 1039 files into output file: '/varidata/research/projects/triche/primary/GROP_20210120_ATAC/pepatac_processed/processed/results_pipeline/K72B/aligned_GRCh38_with_decoys_viruses_and_spikes_exact/K72B_exact.bw'
Merging 1039 files into output file: '/varidata/research/projects/triche/primary/GROP_20210120_ATAC/pepatac_processed/processed/results_pipeline/K72B/aligned_GRCh38_with_decoys_viruses_and_spikes/K72B_smooth.bw'
Traceback (most recent call last):
  File "/varidata/research/projects/triche/primary/GROP_20210120_ATAC/pepatac_processed/pepatac/tools/bamSitesToWig.py", line 389, in <module>
    ct.combine(good_chromosomes)
  File "/varidata/research/projects/triche/primary/GROP_20210120_ATAC/pepatac_processed/pepatac/tools/bamSitesToWig.py", line 312, in combine
    p = subprocess.call(cmd, shell=True)
  File "/primary/vari/software/BBC/python3/build-3.8.1-2.bz.sqlite3/lib/python3.8/subprocess.py", line 340, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/primary/vari/software/BBC/python3/build-3.8.1-2.bz.sqlite3/lib/python3.8/subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/primary/vari/software/BBC/python3/build-3.8.1-2.bz.sqlite3/lib/python3.8/subprocess.py", line 1702, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: '/bin/sh'

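I have not run into this before. The merge step is combining 1039 files, which suggests your reference genome assembly has more than 1000 sequences ("chromosomes"); is that correct? I believe that is the cause: the merge command that lists all of those files is too long for the shell to accept, hence the "Argument list too long" error. I don't think I have ever run pepatac on a reference with that many sequences. We should figure this out, but I'm afraid it's not going to be a simple fix from our side.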

In the meantime, a workaround would be to use a different reference assembly, if that's possible. Alternatively, you can split your decoy sequences out into pre-alignments (which would likely also be more efficient and more accurate).
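
For context on the `OSError: [Errno 7]` in the traceback: with `shell=True`, the whole command is handed to `/bin/sh` as one string, and on Linux a single argument string is typically capped at 128 KiB, so roughly a thousand long absolute paths can exceed it. One general way around this is to do the merge in Python so the file list never reaches the shell. The sketch below assumes the merge is a plain, in-order concatenation of per-chromosome files, which the log does not confirm. `concat_files` is a hypothetical helper, not part of bamSitesToWig.py.

```python
import shutil
from pathlib import Path


def concat_files(input_paths, output_path, buffer_size=1024 * 1024):
    """Append each input file to output_path, in order, without a shell.

    Hypothetical helper: assumes the merge is a simple concatenation.
    Because no command line is built, the number and length of the
    input paths are not limited by the OS argument-size limit.
    """
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                # Stream in fixed-size chunks so large files are not
                # loaded into memory all at once.
                shutil.copyfileobj(src, out, buffer_size)
    return Path(output_path)
```

If the merged data then has to go to an external converter, it could be given the single concatenated file (or fed through a pipe) rather than a list of a thousand paths on the command line.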