fanglab / SMALR

SMALR: a framework for single-molecule level interrogation of the methylation status of SMRT reads.

Stuck on cmp.h5 from a short-read library #13

yfuruta closed this issue 6 years ago

yfuruta commented 6 years ago

Hi,

Sorry for posting another issue. I have successfully installed SMALR and got the expected results from the test scripts on macOS High Sierra, but it always gets stuck at the first alignment-processing step when I run it on a cmp.h5 from a short-read library (~200 bp subread length). Both SMsn and SMp work when I use a cmp.h5 from a long-read library (~5 kb subread length).

Below is the output from when I terminated the stuck process with Ctrl-C. It always shows this traceback, regardless of how long it has been running. The cmp.h5 was produced by pbalign in SMRT Portal with the options '--scoreCutoff 240 --minAccuracy 80 --minLength 100'. A file produced without these options got stuck in the same way.

Any hints for resolving this problem would be greatly appreciated.

Many thanks, Yoshi

(smalr_venv) abl-users-imac-3:c4a Yoshi$ smalr -d --SMsn --motif=GAANNNNNNNNTAG --mod_pos=3 --procs=4 -c 5 input_c4a_SMsn_custom.txt 
Preparing to iterate over all contigs in /Users/Yoshi/informatics/PacBio/nonspecific/rawdata/c4a_short_ER2796_custumoption_data_pacbio_smrtanalysis_userdata_jobs_016_016820_data_aligned_reads.cmp.h5
    - ref000001 (CP009644.1 Escherichia coli ER2796, complete genome)
13:50:19 [INFO] 
13:50:19 [INFO] ====================================
13:50:19 [INFO] Analyzing contig ref000001 (CP009644.1 Escherichia coli ER2796, complete genome)
13:50:19 [INFO] ====================================
13:50:19 [INFO] ref000001 - contig_id:               ref000001
13:50:19 [INFO] ref000001 - contig_name:             CP009644.1 Escherichia coli ER2796, complete genome
13:50:20 [DEBUG] Creating tasks...
13:50:20 [DEBUG] Done.
13:50:20 [DEBUG] Starting consumers...
13:50:20 [DEBUG] Done.
13:50:20 [INFO] Partitioning /Users/Yoshi/informatics/PacBio/nonspecific/rawdata/pZE2M_short_ER2796_customoption_data_pacbio_smrtanalysis_userdata_jobs_016_016821_data_aligned_reads.cmp.h5 into 4 chunks for analysis...
13:51:14 [DEBUG] Process 0: reading motif sites...
13:51:14 [DEBUG] Process 1: reading motif sites...
13:51:14 [DEBUG] Process 2: reading motif sites...
13:51:14 [DEBUG] Process 3: reading motif sites...
13:51:14 [DEBUG] ref000001 - Consumer-1: Starting
13:51:14 [DEBUG] ref000001 - Consumer-2: Starting
13:51:14 [DEBUG] ref000001 - Consumer-3: Starting
13:51:14 [DEBUG] ref000001 - Consumer-4: Starting
13:52:24 [INFO] ...chunk 0 - 0/8292 (0.0%) alignments processed...
13:52:24 [INFO] ...chunk 2 - 0/8515 (0.0%) alignments processed...
13:52:24 [INFO] ...chunk 3 - 0/8411 (0.0%) alignments processed...
13:52:24 [INFO] ...chunk 1 - 0/8311 (0.0%) alignments processed...
^CTraceback (most recent call last):
  File "/Users/Yoshi/informatics/software/smalr_venv/bin/smalr", line 11, in <module>
    load_entry_point('smalr==1.1', 'console_scripts', 'smalr')()
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/smalr_multicontig.py", line 88, in main
    sys.exit( app.run() )
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/smalr_multicontig.py", line 83, in run
    runner.run()
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/smalr.py", line 360, in run
    chunk_ipdArrays = self.launch_parallel_molecule_loading( self.Config.wga_cmph5, prefix, wga_movie_name_ID_map, control_ipds )
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/smalr.py", line 333, in launch_parallel_molecule_loading
    tasks.join()
  File "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 342, in join
    self._cond.wait()
  File "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/synchronize.py", line 246, in wait
    self._wait_semaphore.acquire(True, timeout)
KeyboardInterrupt
Process Consumer-1:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/smalr.py", line 87, in run
    answer = next_task()
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/parse_mol_aligns.py", line 307, in __call__
    mol = molecule( alignments, self.prefix, self.leftAnchor, self.rightAnchor, self.contig_id, self.sites_pos, self.sites_neg, self.cmph5, self.opts )
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/parse_mol_aligns.py", line 42, in __init__
    self.load_entries( alignments, sites_pos, sites_neg )
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/parse_mol_aligns.py", line 75, in load_entries
    fps             = get_fps(self.cmph5)
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/parse_mol_aligns.py", line 57, in get_fps
    reader    = CmpH5Reader(align_fn)
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/pbcore-1.2.10-py2.7.egg/pbcore/io/align/CmpH5IO.py", line 729, in __init__
    self._loadAlignmentInfo(sharedIndex)
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/pbcore-1.2.10-py2.7.egg/pbcore/io/align/CmpH5IO.py", line 745, in _loadAlignmentInfo
    rawAlignmentIndex = self.file["/AlnInfo/AlnIndex"].value
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/h5py-2.7.1-py2.7-macosx-10.11-x86_64.egg/h5py/_hl/dataset.py", line 250, in value
    return self[()]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/h5py-2.7.1-py2.7-macosx-10.11-x86_64.egg/h5py/_hl/dataset.py", line 496, in __getitem__
    self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
KeyboardInterrupt
Process Consumer-3:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/smalr.py", line 87, in run
    answer = next_task()
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/parse_mol_aligns.py", line 307, in __call__
    mol = molecule( alignments, self.prefix, self.leftAnchor, self.rightAnchor, self.contig_id, self.sites_pos, self.sites_neg, self.cmph5, self.opts )
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/parse_mol_aligns.py", line 42, in __init__
    self.load_entries( alignments, sites_pos, sites_neg )
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/parse_mol_aligns.py", line 75, in load_entries
    fps             = get_fps(self.cmph5)
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/parse_mol_aligns.py", line 57, in get_fps
    reader    = CmpH5Reader(align_fn)
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/pbcore-1.2.10-py2.7.egg/pbcore/io/align/CmpH5IO.py", line 729, in __init__
    self._loadAlignmentInfo(sharedIndex)
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/pbcore-1.2.10-py2.7.egg/pbcore/io/align/CmpH5IO.py", line 745, in _loadAlignmentInfo
    rawAlignmentIndex = self.file["/AlnInfo/AlnIndex"].value
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/h5py-2.7.1-py2.7-macosx-10.11-x86_64.egg/h5py/_hl/dataset.py", line 250, in value
    return self[()]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/h5py-2.7.1-py2.7-macosx-10.11-x86_64.egg/h5py/_hl/dataset.py", line 496, in __getitem__
    self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
KeyboardInterrupt
Process Consumer-2:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/smalr.py", line 87, in run
    answer = next_task()
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/parse_mol_aligns.py", line 307, in __call__
    mol = molecule( alignments, self.prefix, self.leftAnchor, self.rightAnchor, self.contig_id, self.sites_pos, self.sites_neg, self.cmph5, self.opts )
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/parse_mol_aligns.py", line 42, in __init__
    self.load_entries( alignments, sites_pos, sites_neg )
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/parse_mol_aligns.py", line 75, in load_entries
    fps             = get_fps(self.cmph5)
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/parse_mol_aligns.py", line 57, in get_fps
    reader    = CmpH5Reader(align_fn)
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/pbcore-1.2.10-py2.7.egg/pbcore/io/align/CmpH5IO.py", line 729, in __init__
    self._loadAlignmentInfo(sharedIndex)
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/pbcore-1.2.10-py2.7.egg/pbcore/io/align/CmpH5IO.py", line 745, in _loadAlignmentInfo
    rawAlignmentIndex = self.file["/AlnInfo/AlnIndex"].value
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/h5py-2.7.1-py2.7-macosx-10.11-x86_64.egg/h5py/_hl/dataset.py", line 250, in value
    return self[()]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/h5py-2.7.1-py2.7-macosx-10.11-x86_64.egg/h5py/_hl/dataset.py", line 496, in __getitem__
    self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
KeyboardInterrupt
Process Consumer-4:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/smalr.py", line 87, in run
    answer = next_task()
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/parse_mol_aligns.py", line 307, in __call__
    mol = molecule( alignments, self.prefix, self.leftAnchor, self.rightAnchor, self.contig_id, self.sites_pos, self.sites_neg, self.cmph5, self.opts )
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/parse_mol_aligns.py", line 42, in __init__
    self.load_entries( alignments, sites_pos, sites_neg )
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/parse_mol_aligns.py", line 75, in load_entries
    fps             = get_fps(self.cmph5)
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/smalr-1.1-py2.7.egg/smalr/parse_mol_aligns.py", line 57, in get_fps
    reader    = CmpH5Reader(align_fn)
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/pbcore-1.2.10-py2.7.egg/pbcore/io/align/CmpH5IO.py", line 729, in __init__
    self._loadAlignmentInfo(sharedIndex)
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/pbcore-1.2.10-py2.7.egg/pbcore/io/align/CmpH5IO.py", line 745, in _loadAlignmentInfo
    rawAlignmentIndex = self.file["/AlnInfo/AlnIndex"].value
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/h5py-2.7.1-py2.7-macosx-10.11-x86_64.egg/h5py/_hl/dataset.py", line 250, in value
    return self[()]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/Users/Yoshi/informatics/software/smalr_venv/lib/python2.7/site-packages/h5py-2.7.1-py2.7-macosx-10.11-x86_64.egg/h5py/_hl/dataset.py", line 496, in __getitem__
    self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
KeyboardInterrupt
jbeaulaurier commented 6 years ago

Hi Yoshi,

I suspect it's not actually stuck, but is just processing a large number of alignments in each process (about 8,300 to 8,500 per chunk, according to your log), which can take some time. Using more processors (if available) via --procs will speed up this step.
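As a rough illustration of why --procs helps: the four progress lines in the log report 8292 + 8515 + 8411 + 8311 = 33,529 alignments in total. The sketch below assumes the alignments are split into contiguous, near-equal chunks, one per process. `partition` is a hypothetical helper, not SMALR's own code, and SMALR's real chunk sizes (as the log shows) are close to, but not exactly, equal.

```python
def partition(n_items, n_chunks):
    """Split range(n_items) into n_chunks contiguous (start, stop) ranges
    whose sizes differ by at most one. Hypothetical helper for illustration."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    base, extra = divmod(n_items, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


# Total taken from the four "...chunk N - 0/M" progress lines in the log.
total = 8292 + 8515 + 8411 + 8311  # 33529 alignments

for procs in (4, 8, 16):
    sizes = [stop - start for start, stop in partition(total, procs)]
    # Largest chunk: 8383, 4192 and 2096 alignments for 4, 8 and 16 procs.
    print(procs, "procs -> largest chunk:", max(sizes), "alignments")
```

If each alignment takes roughly the same time and enough cores are free, the wall time of the slowest worker shrinks in proportion to the largest chunk, so going from 4 to 16 processes could cut it by about 4x. Contention for disk I/O on the shared cmp.h5 files may limit this in practice.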

To see whether smalr is still running at this point in the pipeline, I would just use top to check whether multiple smalr processes are running. Let me know what you find.
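`top` is interactive; for a scriptable check, a minimal Python 3 sketch can take a snapshot of `ps` and list every process whose command line contains `smalr`, together with its CPU usage. Assumptions: `ps -A -o pid=,pcpu=,command=` is available (it is on macOS and Linux), and the multiprocessing workers show the same command line as the parent `smalr` process, which is typical when workers are forked. `ps_snapshot` and `find_processes` are hypothetical helper names.

```python
import subprocess


def ps_snapshot():
    """Return one line per process: PID, %CPU, full command line.

    The trailing "=" on each column name suppresses the header row.
    """
    out = subprocess.check_output(["ps", "-A", "-o", "pid=,pcpu=,command="])
    return out.decode()


def find_processes(ps_output, name):
    """Return (pid, pcpu, command) tuples for lines whose command contains `name`."""
    matches = []
    for line in ps_output.splitlines():
        # split(None, 2) drops leading padding and keeps the command intact.
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        pid, pcpu, command = parts
        if name in command:
            # Some locales print a decimal comma, e.g. "98,7".
            matches.append((int(pid), float(pcpu.replace(",", ".")), command))
    return matches
```

Calling `find_processes(ps_snapshot(), "smalr")` a few times, a minute or so apart, gives a quick picture: several entries with sustained high %CPU suggest the consumers are still working rather than hung, while entries sitting near 0% would point to a genuine hang or a long I/O wait.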

Best, John

yfuruta commented 6 years ago

Hi John,

Thanks for the comment. I retried with more processes and saw about 25% progress after 3 days. It was just a matter of my patience.
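Under a linear-progress assumption, 25% after 3 days works out to roughly 9 more days (12 in total). A minimal sketch for estimating this from the `...chunk N - done/total (x%) alignments processed...` lines SMALR prints, assuming the console output has been saved to a file and the line format matches the one in this log (it may differ in other versions). `overall_progress` and `remaining_time` are hypothetical helpers.

```python
import re
from datetime import timedelta

# Matches progress lines in the format seen in this log, e.g.
#   13:52:24 [INFO] ...chunk 0 - 0/8292 (0.0%) alignments processed...
PROGRESS_RE = re.compile(
    r"chunk (\d+) - (\d+)/(\d+) \([\d.]+%\) alignments processed"
)


def overall_progress(log_lines):
    """Fraction of alignments processed across all chunks, using the
    most recent progress line seen for each chunk."""
    latest = {}
    for line in log_lines:
        m = PROGRESS_RE.search(line)
        if m:
            chunk, done, total = (int(g) for g in m.groups())
            latest[chunk] = (done, total)  # later lines overwrite earlier ones
    done = sum(d for d, _ in latest.values())
    total = sum(t for _, t in latest.values())
    if total == 0:
        return 0.0
    return done / float(total)


def remaining_time(elapsed, fraction_done):
    """Linear extrapolation: assumes the remaining alignments take as long,
    on average, as the ones already processed."""
    if fraction_done <= 0:
        raise ValueError("no progress yet; cannot extrapolate")
    seconds = elapsed.total_seconds() * (1 - fraction_done) / fraction_done
    return timedelta(seconds=seconds)


print(remaining_time(timedelta(days=3), 0.25))  # 9 days, 0:00:00
```

Passing the lines of the saved log to `overall_progress`, then feeding the result and the elapsed wall time to `remaining_time`, gives the estimate. Progress may be uneven across chunks, so treat it as a rough guide.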

Thank you for your help!

Best, Yoshi

jbeaulaurier commented 6 years ago

Hi Yoshi,

Sorry it's taking so long for you! I should add a flag to specify a fixed number of reads to use. This would be useful for checking methylation status of a subsample of reads from a very large dataset (where querying all reads might be unnecessary).
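One way such an option could work, sketched under assumptions: sample a fixed number of molecules (for example, by ZMW hole number) rather than individual subread alignments, so that all subreads from a chosen molecule stay together for single-molecule analysis. `subsample_molecules` is a hypothetical helper, not an existing SMALR flag, and the seeded RNG makes the subset reproducible between runs.

```python
import random


def subsample_molecules(mol_ids, n, seed=0):
    """Pick `n` distinct molecule IDs uniformly at random, reproducibly.

    Hypothetical helper: `mol_ids` may contain repeats (one entry per
    subread alignment); all alignments of a chosen molecule are kept.
    """
    # Sorting makes the result independent of input order and of
    # Python's string-hash randomization when IDs are strings.
    unique = sorted(set(mol_ids))
    if n >= len(unique):
        return set(unique)  # nothing to drop
    rng = random.Random(seed)  # fixed seed -> same subset on every run
    return set(rng.sample(unique, n))


# Illustrative input: one hole number per subread alignment.
hole_numbers = [101, 101, 205, 317, 317, 317, 422]
chosen = subsample_molecules(hole_numbers, 2, seed=42)
# Keep every alignment whose molecule was selected.
kept = [h for h in hole_numbers if h in chosen]
```

Sampling whole molecules keeps each one's subreads intact, which matters if per-molecule calls depend on combining signal across its subreads; sampling individual alignments instead would thin out every molecule's coverage.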

Best, John