PacificBiosciences / pbbioconda

PacBio Secondary Analysis Tools on Bioconda. Contains list of PacBio packages available via conda.
BSD 3-Clause Clear License
251 stars 44 forks source link

pbmm2 ERROR: Output BAM file cannot be used for polishing with GenomicConsensus! #462

Closed Mjaraespejo closed 2 years ago

Mjaraespejo commented 2 years ago

Operating system Ubuntu Package name pbmm2 Conda environment

# packages in environment at /drives/ssd1/manuel/miniconda3/envs/hifi_denovo_asm:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
bedtools                  2.30.0               h7d7f7ad_2    bioconda
blasr                     5.3.5                         0    bioconda
brotli                    1.0.9                h7f98852_6    conda-forge
brotli-bin                1.0.9                h7f98852_6    conda-forge
bwa                       0.7.17               h5bf99c6_8    bioconda
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2021.10.8            ha878542_0    conda-forge
certifi                   2021.10.8        py38h578d9bd_1    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
falcon-kit                1.8.1                    pypi_0    pypi
falcon-phase              1.2.0                    pypi_0    pypi
falcon-unzip              1.3.7                    pypi_0    pypi
fonttools                 4.28.5           py38h497a2fe_0    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
future                    0.18.2           py38h578d9bd_4    conda-forge
htslib                    1.10.2               hd3b49d5_1    bioconda
jpeg                      9d                   h36c2ea0_0    conda-forge
k8                        0.2.5                h9a82719_1    bioconda
kiwisolver                1.3.2            py38h1fd1430_1    conda-forge
krb5                      1.19.2               hcc1bbae_3    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
libblas                   3.9.0           12_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h7f98852_6    conda-forge
libbrotlidec              1.0.9                h7f98852_6    conda-forge
libbrotlienc              1.0.9                h7f98852_6    conda-forge
libcblas                  3.9.0           12_linux64_openblas    conda-forge
libcurl                   7.81.0               h2574ce0_0    conda-forge
libdeflate                1.6                  h516909a_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 11.2.0              h1d223b6_11    conda-forge
libgfortran-ng            11.2.0              h69a702a_11    conda-forge
libgfortran5              11.2.0              h5c6108e_11    conda-forge
libgomp                   11.2.0              h1d223b6_11    conda-forge
liblapack                 3.9.0           12_linux64_openblas    conda-forge
libnghttp2                1.43.0               h812cca2_1    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.18          pthreads_h8fe5266_0    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libssh2                   1.10.0               ha56f1ee_2    conda-forge
libstdcxx-ng              11.2.0              he4da1e4_11    conda-forge
libtiff                   4.0.10            hc3755c2_1005    conda-forge
libzlib                   1.2.11            h36c2ea0_1013    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
matplotlib-base           3.5.1            py38hf4fb855_0    conda-forge
minimap2                  2.24                 h5bf99c6_0    bioconda
mummer4                   4.0.0rc1        pl5262h1b792b2_1    bioconda
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
ncurses                   6.2                  h58526e2_4    conda-forge
networkx                  2.6.3              pyhd8ed1ab_1    conda-forge
nim-falcon                3.0.2                h18d090a_1    bioconda
numpy                     1.22.0           py38h6ae9a64_0    conda-forge
olefile                   0.46               pyh9f0ad1d_1    conda-forge
openssl                   1.1.1l               h7f98852_0    conda-forge
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pandas                    1.3.5            py38h43a58ef_0    conda-forge
pb-assembly               0.0.8                hdfd78af_1    bioconda
pb-dazzler                0.0.1                h779adbc_1    bioconda
pb-falcon                 2.2.4            py38h1bd3507_1    bioconda
pb-falcon-phase           0.1.0                h8e334b0_1    bioconda
pbgcpp                    2.0.2                h9ee0642_1    bioconda
pbmm2                     1.4.0                h56fc30b_0    bioconda
pcre                      8.45                 h9c3ff4c_0    conda-forge
perl                      5.26.2            h36c2ea0_1008    conda-forge
pillow                    6.2.1            py38h6b7be26_0    conda-forge
pip                       21.3.1             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.6              pyhd8ed1ab_0    conda-forge
pysam                     0.16.0.1         py38hbdc2ae9_1    bioconda
python                    3.8.12          hb7a2778_2_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-edlib              1.3.9            py38h8c62d01_0    bioconda
python-intervaltree       3.1.0              pyh864c0ab_0    bioconda
python-msgpack            0.6.1            py38h8c62d01_4    bioconda
python-sortedcontainers   2.4.0              pyh5e36f6f_0    bioconda
python_abi                3.8                      2_cp38    conda-forge
pytz                      2021.3             pyhd8ed1ab_0    conda-forge
racon                     1.4.20               h9a82719_1    bioconda
readline                  8.1                  h46c0cb4_0    conda-forge
samtools                  1.10                 h2e538c0_3    bioconda
scipy                     1.7.3            py38h56a6a73_0    conda-forge
setuptools                60.5.0           py38h578d9bd_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sqlite                    3.37.0               h9cd32fc_0    conda-forge
tk                        8.6.11               h27826a3_1    conda-forge
unicodedata2              14.0.0           py38h497a2fe_0    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.11            h36c2ea0_1013    conda-forge
zstd                      1.4.9                ha95c52a_0    conda-forge

Describe the bug I cannot run the first polishing step (pbmm2 alignment) after genome assembly using Falcon. I am using only HiFi reads. Error message This is my command:

pbmm2 align  --preset CCS --sort -j 8 ../2-asm-falcon/p_ctg.fasta /drives/raid/AboobakerLab/data/genomes/phaw/hifi_pbasm/m64164_211030_095407.hifi.fastq hifiAsm.bam

Error:

|> 20220112 16:14:08.793 -|- WARN -|- CheckPositionalArgs -|- 0x7efebb91e740|| -|- Input is FASTQ. Output BAM file cannot be used for polishing with GenomicConsensus!

I obtained the HiFi reads fasta file (used for assembly) and fastq (used durin pbmm2 alignment) using the samtools fasta/fastq tools. From PacBio I only received two hifi files :

To Reproduce This is fc_run_HiFi.cfg file:

[job.defaults]
njobs = 100
submit = qsub -S /bin/bash -sync y -V  \
  -q ${JOB_QUEUE}     \
  -N ${JOB_NAME}        \     
  -o "${JOB_STDOUT}" \
  -e "${JOB_STDERR}" \
  -pe smp ${NPROC}    \
  "${JOB_SCRIPT}"

JOB_QUEUE = myqueue
MB = 48000
NPROC = 6
[General]
input_type = preads
input_fofn = ccs.fasta.fofn
#pwatcher_type=blocking

# not relevant for HiFi but include anyway
# The length cutoff used for seed reads used for initial mapping
genome_size = 400000000
seed_coverage = 40
length_cutoff = -1

# not relevant for HiFi but include anyway
# overlapping options for Daligner
pa_daligner_option = -e0.8 -l1000 -k18 -h70 -w8 -s100
pa_HPCdaligner_option = -v -B128 -M24
pa_HPCTANmask_option = -k18 -h480 -w8 -e.8 -s100
pa_HPCREPmask_option = -k18 -h480 -w8 -e.8 -s100
#pa_REPmask_code=1,20;10,15;50,10
pa_DBsplit_option = -x500 -s400
# error correction consensus option
falcon_sense_option = --output-multi --min-idt 0.70 --min-cov 3 --max-n-read 100 --n-core 4

# Parameters relevant to HiFi/CCS assembly.
length_cutoff_pr = 10000
ovlp_daligner_option = -k24 -h1024 -e.99 -l1000 -s100
ovlp_HPCdaligner_option = -v -B128 -M24    
ovlp_DBsplit_option = -s200

# experimenent with "--min-idt" to collapse (98-99) or split haplotypes (up to 99.9) during contig assembly
# if you plan to unzip, collapse first using ~98, lower for very divergent haplotypes
# ignore indels looks at only substitutions in overlaps, allows higher overlap stringency to reduce repeat-induced errors
overlap_filtering_setting = --max-diff 400 --max-cov 400 --min-cov 2 --n-core 24 --min-idt 99.9 --ignore-indels

[job.step.da]
NPROC=4
MB=32000
njobs=20
[job.step.la]
NPROC=4
MB=16384
njobs=30
[job.step.cns]
NPROC=4
MB=64000
njobs=20
[job.step.pda]
NPROC=4
MB=32768
njobs=15
[job.step.pla]
NPROC=4
MB=16384
njobs=20
[job.step.asm]
NPROC=24
MB=192000
njobs=1

Expected behavior To get a bam file after alignment in order to proceed with my new assembly Racon polishing.

pb-dseifert commented 2 years ago
  1. You're using m64164_211030_095407.hifi.fastq, which is a FASTQ, and can't be used with GCpp (only BAM as an unaligned input).
  2. It looks like you're trying to polish with CCS reads, which is not supported. You can only use subreads for polishing with GCpp.
Mjaraespejo commented 2 years ago

Thanks for your reply. I am still a bit confused.

If you can not use fastq data, what these commands (from https://github.com/PacificBiosciences/pbbioconda/wiki/Assembling-HiFi-data:-FALCON-Unzip3) mean?

pbmm2 align --preset CCS --sort -j {N} {ref.fasta} {reads.fastq} {aln.bam}
samtools view -F 1796 -q 20 {aln.bam} > {aln.sam}
racon -u -t {N} {reads.fastq} {aln.sam} {ref.fasta} > {polished.ref.fasta}

In your documentation these are the commands that should be used after running falcon assembly using HiFi data. The input in this case is fastq .

Thanks.

pb-dseifert commented 2 years ago

that mentions racon and not gcpp. racon can consume aligned HiFi data, gcpp can't.

pb-dseifert commented 2 years ago

that said, Falcon is pretty much EOL at this point, and it's unlikely you'll get any support for it. IPA is our next-generation assembler.

Mjaraespejo commented 2 years ago

I was trying to run pbmm2, no gcpp, using fastq data. I did not even mention gcpp. But anyway, thanks.