Closed masen1991 closed 3 years ago
The error comes from this line:
DETECT_BATCH_LIST.append(int(os.path.basename(line).rstrip('.bam')))
SMRT-SV expects the BAM files to have an integer name (batch name). Create a symlink named 0.bam
that points to your BAM file, then place the path to that symlink in "align/alignments.fofn".
I'm offering best-effort support on SMRT-SV right now. I think there are better variant callers now, and I hardly even use it myself, though it may still be useful to get supporting calls for a better callset. Let me know if you have other problems though.
Hello @paudano ,thanks for the quick reply. i try what you said,and there is a little wrong ,i need to symlink 0.bam in align/bam/0.bam or it will go wrong. thanks again ,and i am running on detect step now,hope it can go well
hello @paudano when i run smrtsv detect,it still go wrong
Error in rule detect_gaps_search: jobid: 26 output: detect/gaps/batch/0_all.bed log: detect/gaps/batch/log/0.log
RuleException: CalledProcessError in line 489 of /picb/humpopg-bigdata5/masen/bio/smrtsv2/rules/detect.snakefile: Command ' set -euo pipefail; samtools view -F 0x4 -q 30 align/bam/0.bam | python2 -s /picb/humpopg-bigdata5/masen/bio/smrtsv2/scripts/PrintGaps.py reference/ref.fasta /dev/stdin --condense 20 >detect/gaps/batch/0_all.bed 2>detect/gaps/batch/log/0.log ' returned non-zero exit status 1. File "/picb/humpopg-bigdata5/masen/bio/smrtsv2/rules/detect.snakefile", line 489, in __rule_detect_gaps_search File "/picb/humpopg-bigdata5/masen/bio/smrtsv2/dep/conda/build/envs/python3/lib/python3.6/concurrent/futures/thread.py", line 55, in run Removing output files of failed job detect_gaps_search since they might be corrupted: detect/gaps/batch/0_all.bed Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message
any idea?
Look it: detect/gaps/batch/log/0.log
The error that caused the rule to fail should appear there.
Hello @paudano
Traceback (most recent call last):
File "/picb/humpopg-bigdata5/masen/bio/smrtsv2/scripts/PrintGaps.py", line 449, in
I think the contig alignments are against a different reference than what you are pointing SMRT-SV to. "chr1_KI270706v1_random" must be in the BAM but not in the reference FASTA or FAI.
My apologies for the delayed response.
Hello,
I'm also trying to use smrtsv2 on an aligned bam file (NA24385 PacBio CCS data aligned against hg19 reference using NGMLR) with the method from #41, however, it reports an error in the assemble_polish step. The error was from smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/align/BamIO.py
and according to the error message, it failed because the aligned bam file does not have a suitable header (current failed on the @ RG) .
According to BamIO.py:
def _loadReadGroupInfo(self):
rgs = self.peer.header["RG"]
readGroupTable_ = []
# RGID -> ("abstract feature name" -> actual feature name)
self._baseFeatureNameMappings = {}
self._pulseFeatureNameMappings = {}
for rg in rgs:
rgID = rgAsInt(rg["ID"])
rgName = rg["PU"]
ds = dict([pair.split("=") for pair in rg["DS"].split(";") if pair != ""])
# spec: we only consider first two components of basecaller version
# in "chem" lookup
rgReadType = ds["READTYPE"]
rgChem = "unknown"
rgFrameRate = 0.0
rgSample = rg.get("SM", "UnnamedSample")
if rgReadType != "TRANSCRIPT":
if set(("BASECALLERVERSION", "SEQUENCINGKIT", "BINDINGKIT")) <= set(ds):
pass
else:
raise IOError("This does not appear to be a valid PacBio BAM file. Only datasets from RS II and Sequel instruments are supported by this program.")
rgFrameRate = ds["FRAMERATEHZ"]
basecallerVersion = ".".join(ds["BASECALLERVERSION"].split(".")[0:2])
triple = ds["BINDINGKIT"], ds["SEQUENCINGKIT"], basecallerVersion
rgChem = decodeTriple(*triple)
# Look for the features manifest entries within the DS tag,
# and build an "indirection layer", i.e. to get from
# "Ipd" to "Ipd:Frames"
# (This is a bit messy. Can we separate the manifest from
# the rest of the DS content?)
baseFeatureNameMapping = { key.split(":")[0] : key
for key in ds.keys()
if key in BASE_FEATURE_TAGS }
pulseFeatureNameMapping = { key.split(":")[0] : key
for key in ds.keys()
if key in PULSE_FEATURE_TAGS }
self._baseFeatureNameMappings[rgID] = baseFeatureNameMapping
self._pulseFeatureNameMappings[rgID] = pulseFeatureNameMapping
readGroupTable_.append(
(rgID, rgName, rgReadType, rgChem, rgFrameRate, rgSample,
frozenset(baseFeatureNameMapping.iterkeys())))
it required many field that wasn't provided in the aligned bam file, and as mentioned in #24, smrtsv2 need the PacBio subread bam files for the assembly polish step. Does that mean smrtsv2 can not run on the bam file which is not in PacBio BAM format? I'm kind of confused because some datasets only provide the fasta/fastq file, and I'm not able to get the RG information from the aligners (e.g. the bam header from minimap2 only have HD, SQ and PG). Is there any way to solve this problem?
Thank you
hello , how can i use smrtsv2 on a aligned bam? i see https://github.com/EichlerLab/smrtsv2/issues/41 ,and i do this: i try to vim a align/alignments.fofn in dir (which have folder as align/ reference/),and i already try smrtsv2 index reference. in align/alignments.fofn,i only have this line: /abs_path/all_20K_40K_combine_merged_nglmr_b38.sorted.bam and when i try smrtsv detect or smrtsv run it all get this : Detecting variant signatures ValueError in line 31 of ~/bio/smrtsv2/rules/detect.snakefile: invalid literal for int() with base 10: 'all_20K_40K_combine_merged_nglmr_b38.sorted' File "~/bio/smrtsv2/rules/detect.snakefile", line 31, in
can you help me.
Thanks.