PacificBiosciences / kineticsTools

Tools for detecting DNA modifications from single molecule, real-time sequencing data

ipdSummary for SEQUELII(S/P5-C2/5.0-8M) #102

jmenendez98 closed this issue 3 weeks ago

jmenendez98 commented 1 month ago

Hi all!

I am trying to run ipdSummary to look at the ipdRatio at positions across my reference genome, but I cannot get my test file to run. Can someone please help me figure out how to get it running correctly?

This is my command to launch ipdSummary:

ipdSummary \
    --reference assembly.v1.0.PAN011.diploid.fa \
    --numWorkers 32 \
    --outfile "PAN011.m64136_210506_183715.hifi_reads_ipdSummary" \
    PAN011.m64136_210506_183715.hifi_reads.pbmm2.bam

This is the error I am getting from this:

+ ipdSummary --reference assembly.v1.0.PAN027.diploid.fa --numWorkers -1 --outfile PAN011.m64136_210506_183715.hifi_reads.ipdSummary PAN011.m64136_210506_183715.hifi_reads.pbmm2.bam
2024-10-20 17:28:55,397 [WARNING] pbi file missing for /data/PAN011.m64136_210506_183715.hifi_reads.pbmm2.bam, operating with reduced speed and functionality
2024-10-20 17:28:55,417 [WARNING] Problem opening reference withIndexedFastaReader
2024-10-20 17:28:55,428 [WARNING] pbi file missing for /data/PAN011.m64136_210506_183715.hifi_reads.pbmm2.bam, operating with reduced speed and functionality
Chemistry cannot be identified---cannot perform kinetic analysis
2024-10-20 17:28:55,442 [ERROR] Chemistry cannot be identified---cannot perform kinetic analysis
Chemistry cannot be identified---cannot perform kinetic analysis
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/pbcommand/cli/core.py", line 138, in _pacbio_main_runner
    return_code = exe_main_func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/kineticsTools/ipdSummary.py", line 699, in args_runner
    return kt.start()
  File "/usr/lib/python2.7/dist-packages/kineticsTools/ipdSummary.py", line 412, in start
    return self.run()
  File "/usr/lib/python2.7/dist-packages/kineticsTools/ipdSummary.py", line 475, in run
    ret = self._mainLoop()
  File "/usr/lib/python2.7/dist-packages/kineticsTools/ipdSummary.py", line 629, in _mainLoop
    self.args.paramsPath)
  File "/usr/lib/python2.7/dist-packages/kineticsTools/internal/basic.py", line 26, in getIpdModelFilename
    raise Exception(msg)
Exception: Chemistry cannot be identified---cannot perform kinetic analysis
2024-10-20 17:28:55,442 [ERROR] Chemistry cannot be identified---cannot perform kinetic analysis
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/pbcommand/cli/core.py", line 138, in _pacbio_main_runner
    return_code = exe_main_func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/kineticsTools/ipdSummary.py", line 699, in args_runner
    return kt.start()
  File "/usr/lib/python2.7/dist-packages/kineticsTools/ipdSummary.py", line 412, in start
    return self.run()
  File "/usr/lib/python2.7/dist-packages/kineticsTools/ipdSummary.py", line 475, in run
    ret = self._mainLoop()
  File "/usr/lib/python2.7/dist-packages/kineticsTools/ipdSummary.py", line 629, in _mainLoop
    self.args.paramsPath)
  File "/usr/lib/python2.7/dist-packages/kineticsTools/internal/basic.py", line 26, in getIpdModelFilename
    raise Exception(msg)
Exception: Chemistry cannot be identified---cannot perform kinetic analysis
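
The two warnings before the chemistry error probably refer to missing companion index files, and they can be checked separately from the chemistry problem. Below is a minimal sketch, assuming the usual naming conventions (`<bam>.pbi` for the PacBio BAM index and `<fasta>.fai` for the FASTA index). The helper name `missing_index_files` is made up for illustration and is not part of kineticsTools.

```python
import os


def missing_index_files(bam_path, reference_path):
    """Return the companion index files that are absent on disk.

    Assumption: the indexes sit next to the data files and follow the
    conventional names <bam>.pbi and <fasta>.fai.
    """
    expected = [
        bam_path + ".pbi",        # PacBio BAM index
        reference_path + ".fai",  # FASTA index
    ]
    return [path for path in expected if not os.path.exists(path)]
```

Calling `missing_index_files("PAN011.m64136_210506_183715.hifi_reads.pbmm2.bam", "assembly.v1.0.PAN011.diploid.fa")` lists whichever index is absent. These files are commonly generated with `pbindex` and `samtools faidx`. Creating them may silence the warnings, but it would not be expected to fix the chemistry error itself.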

After some digging I found these ipdSummary profiles, but I don't think any of them are applicable...

biodocker@ce9a7e5a3218:/data$ ls /usr/lib/python2.7/dist-packages/kineticsTools/resources/
C2.h5  P4-C2.h5  P5-C3.h5  P6-C4.h5  SP2-C2.h5  XL-C2.h5  XL-XL.h5  unknown.h5
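
To make the mismatch concrete, here is a rough sketch that compares the chemistry string from my BAM header with the bundled file names. It is only a naive name comparison and not how kineticsTools actually resolves its model. It assumes the middle `/`-separated token (`P5-C2`) names the chemistry, and that a Sequel model could be stored with an `S` prefix, as `SP2-C2.h5` suggests. The function name `candidate_models` is hypothetical.

```python
import os

# File names copied from the `ls` output above.
BUNDLED_MODELS = [
    "C2.h5", "P4-C2.h5", "P5-C3.h5", "P6-C4.h5",
    "SP2-C2.h5", "XL-C2.h5", "XL-XL.h5", "unknown.h5",
]


def candidate_models(chemistry, model_files):
    """Naively match a chemistry string such as 'S/P5-C2/5.0-8M' to model files.

    Assumption: the middle '/'-separated token names the chemistry, and the
    model file stem is either that token or the token prefixed with 'S'.
    """
    parts = chemistry.split("/")
    token = parts[1] if len(parts) > 1 else chemistry
    wanted = {token, "S" + token}
    return [f for f in model_files if os.path.splitext(f)[0] in wanted]
```

With these assumptions, `candidate_models("S/P5-C2/5.0-8M", BUNDLED_MODELS)` returns an empty list, which matches my impression that none of the bundled profiles covers this chemistry.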

Here is the header of my BAM file if that is any help:

@RG     ID:30ff792e     PL:PACBIO       DS:READTYPE=CCS;Ipd:CodecV1=ip;PulseWidth:CodecV1=pw;BINDINGKIT=101-894-200;SEQUENCINGKIT=101-826-100;BASECALLERVERSION=5.0.0;FRAMERATEHZ=100.000000   LB:MGISTL_PAN011_Fraction1_Fraction2_480cIT     PU:m64136_210506_183715 SM:MGISTL_PAN011_Fraction1_Fraction2_480cIT     PM:SEQUELII     CM:S/P5-C2/5.0-8M
@PG     ID:ccs-6.0.0    PN:ccs  VN:6.0.0        DS:Generate circular consensus sequences (ccs) from subreads.   CL:ccs /gscuser/smrtlink/smrtlink/install/smrtlink-release_10.1.0.115913/bundles/smrttools/install/smrttools-release_10.1.0.115913/private/pacbio/unanimity/binwrap/../../../../private/pacbio/unanimity/bin/ccs /gscmnt/gc13036/production/smrtlink_jobs_root/cromwell-executions/pb_ccs/cf57555d-b1fb-4dd1-8115-10804a201003/call-ccs1/shard-0/inputs/-149106723/a295d29b-8195-426e-88d1-d0db4a9ce9ce.subreadset.xml out.consensusreadset.xml --log-level INFO --chunk 1/24 --all --all-kinetics --subread-fallback --minLength 10 --maxLength 50000 --minPasses 0 --minSnr 2.5 --minPredictedAccuracy 0.0 --alarms alarms.json --task-report task-report.json --report-json ccs_processing.report.json --zmw-metrics-json ccs_zmws.json.gz -j 8
@PG     ID:pbmerge-1.6.1        PN:pbmerge      VN:1.6.1
@PG     ID:pbmm2        PN:pbmm2        VN:1.14.99 (commit v1.13.1-7-g864413e)  CL:pbmm2 align --preset CCS --sort assembly.v1.0.PAN011.diploid.fa PAN011.m64136_210506_183715.hifi_reads.bam PAN011.m64136_210506_183715.hifi_reads.pbmm2.bam
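
For anyone comparing against their own data, the chemistry-related values can be pulled out of the `@RG` line with a few lines of Python. This is a minimal sketch that assumes a standard tab-separated SAM header line. `parse_read_group` is a name made up for illustration, and the sample line is an abbreviated copy of the one above (LB, PU and SM omitted).

```python
def parse_read_group(rg_line):
    """Split a SAM @RG header line into its tags and its DS sub-fields."""
    tags = {}
    for field in rg_line.rstrip("\n").split("\t")[1:]:  # skip the '@RG' token
        key, _, value = field.partition(":")
        tags[key] = value
    description = {}
    # The DS tag holds ';'-separated KEY=VALUE pairs.
    for item in tags.get("DS", "").split(";"):
        key, sep, value = item.partition("=")
        if sep:
            description[key] = value
    return tags, description


# Abbreviated copy of the @RG line above, rebuilt with tab separators.
rg = "\t".join([
    "@RG", "ID:30ff792e", "PL:PACBIO",
    "DS:READTYPE=CCS;Ipd:CodecV1=ip;PulseWidth:CodecV1=pw;"
    "BINDINGKIT=101-894-200;SEQUENCINGKIT=101-826-100;"
    "BASECALLERVERSION=5.0.0;FRAMERATEHZ=100.000000",
    "PM:SEQUELII", "CM:S/P5-C2/5.0-8M",
])
tags, description = parse_read_group(rg)
for name in ("BINDINGKIT", "SEQUENCINGKIT", "BASECALLERVERSION"):
    print(name, description[name])
print("CM", tags["CM"])
```

The binding kit, sequencing kit and basecaller version are presumably the values the tool uses to identify the chemistry, so they are the ones worth quoting when asking about a matching profile.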

I am using the docker container: biocontainers/kineticstools:v0.6.1git20180425.27a1878-2-deb_cv1. Perhaps there is a better environment/install method to run this tool?

Please let me know if there are any resources to help find profiles for the chemistry I am using, or if anyone has had success overcoming this error!

Thanks!

jmenendez98 commented 3 weeks ago

I was able to resolve my issues by downloading all of the software from [here](https://downloads.pacbcloud.com/public/software/installers/smrtlink-release-sequel2_13.1.0.221970.zip) and following the workflow steps from Issue #95.