Closed RSherman15 closed 5 years ago
When you run SMRT-SV, are you running the the smrtsv
shell script, or smrtsv.py
? The shell script sets up the environment for SMRT-SV and runs the pipeline. If you are running smrtsv.py
directly, it will use your external environment.
Originally, README.md
had this wrong, but I have since fixed that.
Let me know if that resolves this issue.
I am using smrtsv.py
-- hadn't noticed that changed in the README. However, I just tried running with smrtsv
and got the same error.
I see. It’s finding user installed packages in your home directory. Looks like python -s
will do ignore those. I will need to make this change to every python script call in the pipeline.
Made changes so SMRT-SV run python2
and python3
with -s
to ignore site-packages
in home directories.
Thanks for reporting this.
The fix you made solved this for the detect step, but I'm now seeing the same issue in the assemble step. Is it possible there's a python call you missed in the latest commit (I couldn't find one, but maybe I missed something)?
Here's the piece of the log file with the error in the assemble stage:
[Thu Mar 7 02:33:56 2019]
rule assemble_polish:
input: region/6-32740000-60000/asm/contigs.fasta, region/6-32740000-60000/asm/contigs.fasta.fai, region/6-32740000-60000/asm/contig_aligned_reads.bam, region/6-32740000-60000/asm/contig_aligned_reads.bam.pbi
output: region/6-32740000-60000/asm/contigs_polished.fasta
jobid: 471
wildcards: region_id=6-32740000-60000
resources: threads=24
/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/python3/lib/python3.6/site-packages/snakemake/__init__.py:1559: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
export_cwl=args.export_cwl)
ESC[33mJob counts:
count jobs
1 assemble_polish
1ESC[0m
Traceback (most recent call last):
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/bin/variantCaller", line 3, in <module>
from GenomicConsensus.main import main
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/GenomicConsensus/main.py", line 10, in <module>
import pysam
File "/home-4/rsherma8@jhu.edu/.local/lib/python2.7/site-packages/pysam/__init__.py", line 5, in <module>
from pysam.libchtslib import *
ImportError: /home-4/rsherma8@jhu.edu/.local/lib/python2.7/site-packages/pysam/libchtslib.so: undefined symbol: PyUnicodeUCS2_FromStringAndSize
I was calling variantCaller
directly, which is the PacBio program that polishes assemblies (it is the quiver and arrow polishing step). I needed to make a change so that it explicitly calls python from within the PacBio environment.
To run this, first cd into dep
(in the SMRT-SV install directory) and run:
ln -sf ../conda/build/envs/pacbio/bin/python2 bin/pb_python2
Note: The build system will do this for new installs, I am just having you run it so you don't have to rebuild dep
.
After you make make that link, pull down the latest commit and run it. Let me know how that turns out.
This caused a different issue instead:
CalledProcessError in line 250 of /home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/rules/assemble_group.snakefile:
Command ' set -euo pipefail; pb_python2 -s variantCaller --referenceFilename region/18-11720000-60000/asm/contigs.fasta region/18-11720000-60000/asm/contig_aligned_reads.bam -o region/18-11720000-60000/asm/contigs_polished.fasta -j 24 --algorithm=arrow; ' returned non-zero exit status 2.
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/rules/assemble_group.snakefile", line 250, in __rule_assemble_polish
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/python3/lib/python3.6/concurrent/futures/thread.py", line 55, in runESC[0m
ESC[31mExiting because a job execution failed. Look above for error message
I pushed a change. Can you try it? I wasn't calling variantCaller correctly.
Still an error:
count jobs
1 assemble_polish
1ESC[0m
'FRAMERATEHZ'
Traceback (most recent call last):
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcommand/cli/core.py", line 138, in _pacbio_main_runner
return_code = exe_main_func(*args, **kwargs)
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/GenomicConsensus/main.py", line 340, in args_runner
return tr.main()
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/GenomicConsensus/main.py", line 258, in main
with AlignmentSet(options.inputFilename) as peekFile:
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2746, in __init__
super(AlignmentSet, self).__init__(*files, **kwargs)
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 1994, in __init__
super(ReadSet, self).__init__(*files, **kwargs)
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 461, in __init__
self.updateCounts()
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2574, in updateCount
s
self.assertIndexed()
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2404, in assertIndex
ed
self._assertIndexed((IndexedBamReader, CmpH5Reader))
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 1951, in _assertInde
xed
self._openFiles()
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2075, in _openFiles
resource = IndexedBamReader(location)
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 374, in __init__
super(IndexedBamReader, self).__init__(fname, referenceFastaFname)
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 183, in __init__
self._loadReadGroupInfo()
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 95, in _loadReadGroupInfo
rgFrameRate = ds["FRAMERATEHZ"]
KeyError: 'FRAMERATEHZ'
[ERROR] 'FRAMERATEHZ'
Traceback (most recent call last):
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcommand/cli/core.py", line 138, in _pacbio_main_runner
return_code = exe_main_func(*args, **kwargs)
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/GenomicConsensus/main.py", line 340, in args_runner
return tr.main()
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/GenomicConsensus/main.py", line 258, in main
with AlignmentSet(options.inputFilename) as peekFile:
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2746, in __init__
super(AlignmentSet, self).__init__(*files, **kwargs)
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 1994, in __init__
super(ReadSet, self).__init__(*files, **kwargs)
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 461, in __init__
self.updateCounts()
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2574, in updateCounts
self.assertIndexed()
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2404, in assertIndexed
self._assertIndexed((IndexedBamReader, CmpH5Reader))
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 1951, in _assertIndexed
self._openFiles()
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2075, in _openFiles
resource = IndexedBamReader(location)
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 374, in __init__
super(IndexedBamReader, self).__init__(fname, referenceFastaFname)
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 183, in __init__
self._loadReadGroupInfo()
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 95, in _loadReadGroupInfo
rgFrameRate = ds["FRAMERATEHZ"]
KeyError: 'FRAMERATEHZ'
ESC[32m[Fri Mar 8 03:37:59 2019]ESC[0m
ESC[31mError in rule assemble_polish:ESC[0m
ESC[31m jobid: 0ESC[0m
ESC[31m output: region/14-74700000-60000/asm/contigs_polished.fastaESC[0m
ESC[31mESC[0m
ESC[31mRuleException:
CalledProcessError in line 250 of /home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/rules/assemble_group.snakefile:
Command ' set -euo pipefail; pb_python2 -s /home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/bin/variantCaller --referenceFilename region/14-74700000-60000/asm/contigs.fasta region/14-74700000-60000/asm/contig_aligned_reads.bam -o region/14-74700000-60000/asm/contigs_polished.fasta -j 24 --algorithm=arrow; ' returned non-zero exit status 2.
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/rules/assemble_group.snakefile", line 250, in __rule_assemble_polish
File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/python3/lib/python3.6/concurrent/futures/thread.py", line 55, in runESC[0m
ESC[31mExiting because a job execution failed. Look above for error messageESC[0m
genomicConsensus
can be really frustrating. At least it's running, and I don't see any sign of site-packages from your home directory in the output, so that fix seems to have worked.
Questions:
bax2bam
)?I found two issues with this error: https://github.com/PacificBiosciences/Bioinformatics-Training/issues/42 https://github.com/PacificBiosciences/kineticsTools/issues/58
From the first issue, it looks like running from BAM files might be necessary. The second one was fixed by pbindex, but the pipeline already does that.
I'm using RS II reads, in bax.h5 format for the HG002 Genome in a Bottle data (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/PacBio_MtSinai_NIST/hdf5).
Are you saying I'll need to rerun the whole pipeline with bams?
First try running with --asm-polish
for the run
or assemble
command. See if that makes genomicConsensus happy. If not, then you might need to run from BAM files to appease it. If that doesn't work, then I am out of ideas. It might be a question for PacBio.
Does running with --asm-polish quiver
make any difference?
No, it didn't help -- same error. I still need to try re-running with bams.
We are going to try replicating this problem with the same data. I'll let you know if we can get it running from BAMs. If so, we will modify the pipeline to fail on BAX input.
Any update on this?
Hi @RSherman15. I have taken over this issue, and am currently attempting to replicate your error. I am currently in the detect step, and will hope to have an answer in the coming days.
Assemblies now seem to be completing, after re-running the whole thing on subread.bam
files from bax2bam --allowUnrecognizedChemistryTriple
. I had to use --asm-polish quiver
, it still crashed with Arrow. It is still in the assemble step, so not sure it will make it to the end and through the genotyping step (which is my real goal) but for now this appears solved. It did waste a lot of time and compute resources to re-run from scratch though; I'd suggest clarifying in the README which starting file formats are allowed for which chemistries.
We had no idea that there would be such a difference running the same data from BAX or from BAM files. In any case, it requires subreads BAM files now and rejects everything else.
I'm getting the following error on the detect step, which seems to be due to the use of my machine's installed python rather than smrtsv2s conda environments. However, I'm not sure how to force it to use the correct python install, if that is indeed the issue.