PacificBiosciences / kineticsTools

Tools for detecting DNA modifications from single molecule, real-time sequencing data

ipdSummary -- KeyError: "tag 'ip' not present" #95

Closed ck-theory closed 1 year ago

ck-theory commented 1 year ago

Hello, can you please help with the following error, which I hit while running ipdSummary from smrtlink_11.0.0.146107 on data from a Sequel IIe demultiplexed microbial run with HiFi reads and kinetics enabled? {SAMPLE} and {RUN_ID} are placeholders; the real values have been redacted to protect customer data.

SMRTLink 11.0 Command Log

Methylation from HiFi-ASM assembled data

1) Index the fasta file

dataset create --generateIndices fasta.referenceset.xml {SAMPLE}.HiFiASM.assembly.fasta

2) Align the PB reads to the assembly

pbmm2 align fasta.referenceset.xml {RUN_ID}.bc2014--bc2014.consensusreadset.xml --preset HiFi out.consensusalignmentset.xml

3) Call the methylation profiler

export SMRT_CHEMISTRY_BUNDLE_DIR=/data/programs/miniconda3/envs/smrtlink_11.0.0.146107/smrttools_install/userdata/chemistry/chemistry-pb-active

ipdSummary out.bam --reference {SAMPLE}.HiFiASM.assembly.fasta --gff out.gff --csv out.csv --bigwig out.bigwig



[INFO] 2022-11-15 19:16:28,519Z [kineticsTools.ipdSummary _pacbio_main_runner 160] Using pbcommand v2.3.2
[INFO] 2022-11-15 19:16:28,519Z [kineticsTools.ipdSummary _pacbio_main_runner 161] completed setting up logger with <function setup_log at 0x7fc6dea1ab80>
[INFO] 2022-11-15 19:16:28,520Z [kineticsTools.ipdSummary _pacbio_main_runner 164] log opts {'level': 20, 'file_name': None}
[INFO] 2022-11-15 19:16:28,521Z [kineticsTools.loader getResourcePathSpec 59] found SMRT_CHEMISTRY_BUNDLE_DIR, prepending to default paramsPath
[INFO] 2022-11-15 19:16:28,522Z [root loadSharedAlignmentSet 475] Reading AlignmentSet: out.bam
[INFO] 2022-11-15 19:16:28,522Z [root loadSharedAlignmentSet 476]            reference: /data2/projects/Methylation_Testing_PB/{SAMPLE}.HiFiASM.assembly.fasta
[INFO] 2022-11-15 19:16:28,638Z [kineticsTools.loader getIpdModelFilename 42] Using chemistry-matched kinetics model: '/data/programs/miniconda3/envs/smrtlink_11.0.0.146107/smrttools_install/install/smrtlink-release_11.0.0.146107/bundles/smrttools/install/smrttools-release_11.0.0.146107/private/thirdparty/python3/python3_3.9.6/site-packages/kineticsTools/resources/SP3-C3.npz.gz'
[INFO] 2022-11-15 19:16:28,638Z [root loadReferenceAndModel 463] Loading reference contigs '/data2/projects/Methylation_Testing_PB/{SAMPLE}.HiFiASM.assembly.fasta'
[INFO] 2022-11-15 19:16:30,349Z [root _launchSlaveProcesses 410] Available CPUs: 128
[INFO] 2022-11-15 19:16:30,350Z [root _launchSlaveProcesses 411] Requested worker processes: 1
[INFO] 2022-11-15 19:16:30,358Z [root _launchSlaveProcesses 437] Launched worker processes.
[INFO] 2022-11-15 19:16:30,360Z [root _run 90] Worker KineticWorkerProcess-1 (PID=1838874) started running
[INFO] 2022-11-15 19:16:30,363Z [root _launchSlaveProcesses 443] Launched result collector process.
[INFO] 2022-11-15 19:16:30,364Z [root _mainLoop 540] Generating kinetics summary for [out.bam]
[INFO] 2022-11-15 19:16:30,364Z [root _mainLoop 555] Processing window/contig: ReferenceWindow(refId=0, refName='ptg000001l', start=0, end=962482)
[INFO] 2022-11-15 19:16:30,364Z [kineticsTools.ResultWriter _run 40] Process KineticsWriter-2 (PID=1838875) started running
[INFO] 2022-11-15 19:16:30,373Z [root _run 125] Got chunk: (0, ReferenceWindow(refId=0, refName='ptg000001l', start=0, end=1000)) -- Process: <KineticWorkerProcess name='KineticWorkerProcess-1' parent=1838820 started daemon>
[INFO] 2022-11-15 19:16:30,376Z [root _summarizeReferenceRegion 226] Making summary: -15 to 1015
[INFO] 2022-11-15 19:16:30,483Z [root _fetchChunks 426] Retrieved 11 hits
Process KineticWorkerProcess-1:
Traceback (most recent call last):
  File "/data/programs/miniconda3/envs/smrtlink_11.0.0.146107/smrttools_install/install/smrtlink-release_11.0.0.146107/bundles/smrttools/install/smrttools-release_11.0.0.146107/private/thirdparty/python3/python3_3.9.6/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/data/programs/miniconda3/envs/smrtlink_11.0.0.146107/smrttools_install/install/smrtlink-release_11.0.0.146107/bundles/smrttools/install/smrttools-release_11.0.0.146107/private/thirdparty/python3/python3_3.9.6/site-packages/kineticsTools/WorkerProcess.py", line 151, in run
    self._run()
  File "/data/programs/miniconda3/envs/smrtlink_11.0.0.146107/smrttools_install/install/smrtlink-release_11.0.0.146107/bundles/smrttools/install/smrttools-release_11.0.0.146107/private/thirdparty/python3/python3_3.9.6/site-packages/kineticsTools/WorkerProcess.py", line 127, in _run
    result = self.onChunk(  # pylint: disable=assignment-from-none
  File "/data/programs/miniconda3/envs/smrtlink_11.0.0.146107/smrttools_install/install/smrtlink-release_11.0.0.146107/bundles/smrttools/install/smrttools-release_11.0.0.146107/private/thirdparty/python3/python3_3.9.6/site-packages/kineticsTools/KineticWorker.py", line 121, in onChunk
    perSiteResults = self._summarizeReferenceRegion(
  File "/data/programs/miniconda3/envs/smrtlink_11.0.0.146107/smrttools_install/install/smrtlink-release_11.0.0.146107/bundles/smrttools/install/smrttools-release_11.0.0.146107/private/thirdparty/python3/python3_3.9.6/site-packages/kineticsTools/KineticWorker.py", line 230, in _summarizeReferenceRegion
    (caseChunks, capValue) = self._fetchChunks(
  File "/data/programs/miniconda3/envs/smrtlink_11.0.0.146107/smrttools_install/install/smrtlink-release_11.0.0.146107/bundles/smrttools/install/smrttools-release_11.0.0.146107/private/thirdparty/python3/python3_3.9.6/site-packages/kineticsTools/KineticWorker.py", line 449, in _fetchChunks
    rawIpds = self._loadRawIpds(hits, start, end, factor)
  File "/data/programs/miniconda3/envs/smrtlink_11.0.0.146107/smrttools_install/install/smrtlink-release_11.0.0.146107/bundles/smrttools/install/smrttools-release_11.0.0.146107/private/thirdparty/python3/python3_3.9.6/site-packages/kineticsTools/KineticWorker.py", line 484, in _loadRawIpds
    rawIpd = aln.IPD() * factor
  File "/data/programs/miniconda3/envs/smrtlink_11.0.0.146107/smrttools_install/install/smrtlink-release_11.0.0.146107/bundles/smrttools/install/smrttools-release_11.0.0.146107/private/thirdparty/python3/python3_3.9.6/site-packages/pbcore/io/align/BamAlignment.py", line 49, in f
    return self.baseFeature(featureName, aligned, orientation)
  File "/data/programs/miniconda3/envs/smrtlink_11.0.0.146107/smrttools_install/install/smrtlink-release_11.0.0.146107/bundles/smrttools/install/smrttools-release_11.0.0.146107/private/thirdparty/python3/python3_3.9.6/site-packages/pbcore/io/align/BamAlignment.py", line 524, in baseFeature
    data_ = self.peer.opt(tag)
  File "pysam/libcalignedsegment.pyx", line 2770, in pysam.libcalignedsegment.AlignedSegment.opt
  File "pysam/libcalignedsegment.pyx", line 2438, in pysam.libcalignedsegment.AlignedSegment.get_tag
KeyError: "tag 'ip' not present"
Child process exited with exitcode=1.  Aborting.
[ERROR] 2022-11-15 19:16:31,366Z [root monitorChildProcesses 592] Child process exited with exitcode=1.  Aborting.
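
The traceback ends in pysam failing to find the 'ip' tag on an aligned read, so a quick first check is which kinetics tags the BAM actually carries. The following is a minimal sketch, not part of kineticsTools. The tag names are an assumption based on my reading of the PacBio BAM conventions (ip/pw for subread-style kinetics, fi/ri/fp/rp for per-strand HiFi kinetics), and the helper names kinetics_tag_counts, describe, and check_bam are hypothetical.

```python
from collections import Counter
from itertools import islice

# Assumed tag names (my reading of the PacBio BAM conventions, not stated in this thread):
# subread-style kinetics use ip/pw; HiFi kinetics use per-strand fi/ri/fp/rp.
SUBREAD_TAGS = ("ip", "pw")
HIFI_TAGS = ("fi", "ri", "fp", "rp")


def kinetics_tag_counts(reads, limit=1000):
    """Return (reads inspected, Counter of kinetics tags seen) for the first `limit` reads."""
    counts = Counter()
    inspected = 0
    for read in islice(reads, limit):
        inspected += 1
        for tag in SUBREAD_TAGS + HIFI_TAGS:
            if read.has_tag(tag):
                counts[tag] += 1
    return inspected, counts


def describe(inspected, counts):
    """Turn the counts into a one-line verdict."""
    if inspected == 0:
        return "no reads inspected"
    if counts["ip"] == inspected:
        return "subread-style: every inspected read has 'ip'"
    if counts["ip"] == 0 and any(counts[t] for t in HIFI_TAGS):
        return "HiFi-style: per-strand kinetics tags but no 'ip'"
    if counts["ip"] == 0:
        return "no kinetics tags found"
    return "mixed: only some reads have 'ip'"


def check_bam(path, limit=1000):
    """Open a BAM with pysam (third-party) and describe its kinetics tags."""
    import pysam  # imported lazily so the helpers above work without it

    # check_sq=False lets this also open unaligned BAMs that have no @SQ lines.
    with pysam.AlignmentFile(path, "rb", check_sq=False) as bam:
        return describe(*kinetics_tag_counts(bam, limit))
```

Calling check_bam("out.bam") on the alignment from step 2 would be expected to report the HiFi-style case if the input reads were HiFi reads with kinetics, which matches the KeyError above.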
ck-theory commented 1 year ago

For anyone with the same issue: this KeyError occurs because the reads used in the pbmm2 align step are HiFi reads with kinetics, not subreads, so the aligned BAM lacks the 'ip' tag that ipdSummary tries to read. There are two ways around this: either re-generate subreads on a previously demultiplexed run, or run ccs-kinetics-bystrandify on your HiFi reads to reconstitute the subread data. With the help of the PB support team, below is the current workflow, which avoids re-running lima:

Modification to include conversion of HiFi data to pseudo subread data

conda install -c bioconda pbtk


inputs:

reference.fasta
reference.fasta.fai
inreads.bam (dmuxed hifi reads)

conda activate pbbam-2.1.0
ccs-kinetics-bystrandify inreads.bam out.kinetics.bam
conda deactivate

conda activate smrtlink_11.0.0.146107
pbvalidate out.kinetics.bam
pbmm2 align --sort out.kinetics.bam reference.fasta out.alignment.bam
pbvalidate out.alignment.bam
pbindex out.alignment.bam
ipdSummary -j 20 out.alignment.bam --reference reference.fasta --gff out.gff --csv out.csv --bigwig out.bigwig
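
For repeated runs, the same sequence of commands could be wrapped in a small driver. This is only a sketch under stated assumptions: every tool is assumed to be on PATH already (the two conda environments used above are not handled), the function names build_commands and run_all are hypothetical, and the arguments are copied from the workflow above without change.

```python
import shlex
import subprocess


def build_commands(inreads, reference, threads=20, prefix="out"):
    """Build the argv lists for the workflow above, in order."""
    kinetics_bam = f"{prefix}.kinetics.bam"
    aligned_bam = f"{prefix}.alignment.bam"
    return [
        # Convert HiFi kinetics into by-strand pseudo-subread records.
        ["ccs-kinetics-bystrandify", inreads, kinetics_bam],
        ["pbvalidate", kinetics_bam],
        # Argument order is kept exactly as posted in the thread.
        ["pbmm2", "align", "--sort", kinetics_bam, reference, aligned_bam],
        ["pbvalidate", aligned_bam],
        ["pbindex", aligned_bam],
        ["ipdSummary", "-j", str(threads), aligned_bam,
         "--reference", reference,
         "--gff", f"{prefix}.gff",
         "--csv", f"{prefix}.csv",
         "--bigwig", f"{prefix}.bigwig"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False.

    Stops at the first failing command because check=True raises
    subprocess.CalledProcessError on a non-zero exit code.
    """
    rendered = []
    for cmd in commands:
        line = shlex.join(cmd)
        rendered.append(line)
        print(line)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return rendered
```

With the default dry_run=True, run_all(build_commands("inreads.bam", "reference.fasta")) only prints the six commands, which makes it easy to compare them against the list above before executing anything.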

kafker commented 1 year ago

Hi ck-theory

How did you install SMRT Link v11 with conda? I only see a v10 installation: https://anaconda.org/hcc/smrtlink-tools

Thank you