PacificBiosciences / pbbioconda

PacBio Secondary Analysis Tools on Bioconda. Contains list of PacBio packages available via conda.
BSD 3-Clause Clear License
247 stars, 43 forks

isoseq collapse error: Could not find length of CCS read #648

Closed Hyeyeonggg closed 6 months ago

Hyeyeonggg commented 7 months ago

Operating system

CentOS Linux 8

Package name

isoseq 4.0.0
conda 23.11.0
Result of conda update --all:
# All requested packages already installed.

Conda environment

What is the result of conda list?

# packages in environment at /genomics/tools/longread/miniconda3/envs/isoseq:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
_openmp_mutex             5.1                       1_gnu
bzip2                     1.0.8                h7b6447c_0
c-ares                    1.19.1               h5eee18b_0
ca-certificates           2023.12.12           h06a4308_0
htslib                    1.13                 h9093b5e_0    bioconda
isoseq                    4.0.0                h9ee0642_0    bioconda
krb5                      1.20.1               h568e23c_1
libcurl                   7.88.1               h91b91d3_2
libdeflate                1.17                 h5eee18b_1
libedit                   3.1.20230828         h5eee18b_0
libev                     4.33                 h7f8727e_1
libgcc-ng                 11.2.0               h1234567_1
libgomp                   11.2.0               h1234567_1
libnghttp2                1.52.0               ha637b67_1
libssh2                   1.10.0               h37d81fd_2
libstdcxx-ng              11.2.0               h1234567_1
lima                      2.9.0                h9ee0642_1    bioconda
ncurses                   6.4                  h6a678d5_0
openssl                   1.1.1w               h7f8727e_0
pbbam                     1.7.0                h058f120_1    bioconda
pbmm2                     1.13.1               h9ee0642_0    bioconda
pbpigeon                  1.0.0                hdfd78af_0    bioconda
xz                        5.2.10               h5eee18b_1
zlib                      1.2.13               h5eee18b_0

Describe the bug

My data is a segmented.bam file from MAS-Seq bulk Iso-Seq on the Revio platform (HiFi reads that have been segmented using skera).

I processed this data as follows:

lima segmented.bam IsoSeq_v2_primers_12.fasta fl.bam --isoseq --peek-guess --overwrite-biosample-names
isoseq refine fl.IsoSeqX_${barcode}_5p--IsoSeqX_3p.bam IsoSeq_v2_primers_12.fasta flnc.bam --require-polya
isoseq cluster2 flnc.bam clustered.bam
pbmm2 align --preset ISOSEQ --sort clustered.bam hg38.mmi mapped.bam
isoseq collapse --do-not-collapse-extra-5exons mapped.bam clustered.bam collapse.gff --log-file collapse.log

The error comes from this command:

isoseq collapse --do-not-collapse-extra-5exons mapped.bam clustered.bam collapse.gff --log-file collapse.log

I have processed other standard Iso-Seq data and single-cell MAS-ISOseq data without any error in the collapse step. I do not know which part of the BAM file holds the CCS read length information.

Error message

|> 20240206 23:16:19.973 -|- FATAL -|- operator() -|- 0x7f12a26dcd40|| -|- Could not find length of CCS read m84172_240116_125246_s3/63967683/ccs/9075_13738
|> 20240206 23:16:20.451 -|- FATAL -|- Run -|- 0x7f12a26dcd40|| -|- isoseq collapse ERROR: std::exception

Hyeyeonggg commented 7 months ago

I omitted clustered.bam, which is an optional input to the isoseq collapse command, and the command then completed. However, I would still like to know how to resolve the error above when clustered.bam is supplied.

Thank you.

Hyeyeonggg commented 6 months ago

Resolved: the optional second input to isoseq collapse should be flnc.bam from isoseq refine, not clustered.bam from isoseq cluster2.
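
For anyone hitting the same error, here is a minimal sketch of the corrected collapse step, assuming the same file names as in the original report. The `run` helper and the `RUN` variable are hypothetical conveniences added for illustration and are not part of isoseq. By default the script only prints the command, so it can be checked before anything is executed.

```shell
#!/bin/sh
# Sketch of the corrected collapse step from this thread.
# File names (mapped.bam, flnc.bam, collapse.gff) come from the report above.
# `run` and RUN are hypothetical helpers, not part of isoseq.

# Print a command; execute it only when RUN=1 (dry run by default).
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Pass flnc.bam (output of isoseq refine) as the optional second input,
# not clustered.bam (output of isoseq cluster2).
run isoseq collapse --do-not-collapse-extra-5exons \
    mapped.bam flnc.bam collapse.gff --log-file collapse.log
```

Running the script with `RUN=1` set in the environment executes the command instead of only printing it. The rest of the pipeline (lima, isoseq refine, isoseq cluster2, pbmm2 align) stays as posted above.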