PacificBiosciences / pbbioconda

PacBio Secondary Analysis Tools on Bioconda. Contains list of PacBio packages available via conda.
BSD 3-Clause Clear License

ipa fails at 18-purge_dups #666

Closed: tpshea2 closed this issue 1 month ago

tpshea2 commented 4 months ago

Operating system: RHEL 7.8

Package name: ipa (wrapper) version=1.8.0

Conda environment:

_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
amply                     0.1.6                    pypi_0    pypi
appdirs                   1.4.4                    pypi_0    pypi
attrs                     23.1.0                   pypi_0    pypi
brotli                    1.1.0                    pypi_0    pypi
brotli-python             1.1.0           py312h30efb56_1    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.22.1               hd590300_0    conda-forge
ca-certificates           2024.2.2             hbcca054_0    conda-forge
certifi                   2024.2.2           pyhd8ed1ab_0    conda-forge
charset-normalizer        3.3.2                    pypi_0    pypi
coin-or-cbc               2.10.10              h9002f0b_0    conda-forge
coin-or-cgl               0.60.7               h516709c_0    conda-forge
coin-or-clp               1.17.8               h1ee7a9c_0    conda-forge
coin-or-osi               0.108.8              ha2443b9_0    conda-forge
coin-or-utils             2.11.9               hee58242_0    conda-forge
coincbc                   2.10.10           0_metapackage    conda-forge
configargparse            1.7                      pypi_0    pypi
connection-pool           0.0.3                    pypi_0    pypi
connection_pool           0.0.3              pyhd3deb0d_0    conda-forge
datrie                    0.8.2                    pypi_0    pypi
docutils                  0.20.1                   pypi_0    pypi
dpath                     2.1.6                    pypi_0    pypi
fastjsonschema            2.19.0                   pypi_0    pypi
gitdb                     4.0.11                   pypi_0    pypi
gitpython                 3.1.40                   pypi_0    pypi
htslib                    1.18                 h81da01d_0    bioconda
humanfriendly             10.0                     pypi_0    pypi
idna                      3.6                      pypi_0    pypi
importlib-resources       6.1.1                    pypi_0    pypi
importlib_resources       6.1.1              pyhd8ed1ab_0    conda-forge
jinja2                    3.1.2                    pypi_0    pypi
jsonschema                4.20.0                   pypi_0    pypi
jsonschema-specifications 2023.11.1                pypi_0    pypi
jupyter-core              5.5.0                    pypi_0    pypi
jupyter_core              5.5.0           py312h7900ff3_0    conda-forge
k8                        0.2.5                hdcf5f25_4    bioconda
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
libblas                   3.9.0           20_linux64_openblas    conda-forge
libcblas                  3.9.0           20_linux64_openblas    conda-forge
libcurl                   8.4.0                hca28451_0    conda-forge
libdeflate                1.19                 hd590300_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h807b86a_3    conda-forge
libgfortran-ng            13.2.0               h69a702a_3    conda-forge
libgfortran5              13.2.0               ha4646dd_3    conda-forge
libgomp                   13.2.0               h807b86a_3    conda-forge
liblapack                 3.9.0           20_linux64_openblas    conda-forge
liblapacke                3.9.0           20_linux64_openblas    conda-forge
libnghttp2                1.58.0               h47da74e_0    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libopenblas               0.3.25          pthreads_h413a1c8_0    conda-forge
libsqlite                 3.44.2               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_3    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
markupsafe                2.1.3                    pypi_0    pypi
minimap2                  2.26                 he4a0461_2    bioconda
nbformat                  5.9.2                    pypi_0    pypi
ncurses                   6.4                  h59595ed_2    conda-forge
networkx                  3.2.1                    pypi_0    pypi
openssl                   3.2.1                hd590300_0    conda-forge
packaging                 23.2                     pypi_0    pypi
pb-falconc                1.15.0               haabb649_2    bioconda
pbipa                     1.8.0                h6ead514_2    bioconda
pcre                      8.45                 h9c3ff4c_0    conda-forge
pip                       23.3.1                   pypi_0    pypi
pkgutil-resolve-name      1.3.10                   pypi_0    pypi
plac                      1.4.1                    pypi_0    pypi
platformdirs              4.0.0                    pypi_0    pypi
psutil                    5.9.5                    pypi_0    pypi
pulp                      2.7.0                    pypi_0    pypi
pyparsing                 3.1.1                    pypi_0    pypi
pysocks                   1.7.1                    pypi_0    pypi
python                    3.12.0          hab00c5b_0_cpython    conda-forge
python-fastjsonschema     2.19.0             pyhd8ed1ab_0    conda-forge
python_abi                3.12                    4_cp312    conda-forge
pyyaml                    6.0.1                    pypi_0    pypi
racon                     1.5.0                h21ec9f0_2    bioconda
readline                  8.2                  h8228510_1    conda-forge
referencing               0.31.0                   pypi_0    pypi
requests                  2.31.0                   pypi_0    pypi
reretry                   0.11.8                   pypi_0    pypi
rpds-py                   0.13.1                   pypi_0    pypi
samtools                  1.18                 h50ea8bc_1    bioconda
setuptools                68.2.2                   pypi_0    pypi
smart-open                6.4.0                    pypi_0    pypi
smart_open                6.4.0              pyhd8ed1ab_0    conda-forge
smmap                     5.0.0                    pypi_0    pypi
snakemake                 7.32.4                   pypi_0    pypi
snakemake-minimal         7.32.4             pyhdfd78af_0    bioconda
stopit                    1.1.2                    pypi_0    pypi
tabulate                  0.9.0                    pypi_0    pypi
throttler                 1.2.2                    pypi_0    pypi
time                      1.8                  h516909a_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
toposort                  1.10                     pypi_0    pypi
traitlets                 5.14.0                   pypi_0    pypi
typing-extensions         4.8.0                    pypi_0    pypi
typing_extensions         4.8.0              pyha770c72_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
urllib3                   2.1.0                    pypi_0    pypi
wheel                     0.42.0                   pypi_0    pypi
wrapt                     1.16.0                   pypi_0    pypi
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
yte                       1.5.1                    pypi_0    pypi
zipp                      3.17.0                   pypi_0    pypi
zlib                      1.2.13               hd590300_5    conda-forge
zstd                      1.5.5                hfc55251_0    conda-forge

Bug: ipa fails on an assembly of 12 HiFi reads (12 small plasmid reads that did not map to the contigs of the original ipa assembly). It dumps core at step 18-purge_dups after writing out the following files (with file sizes):

   2904 Mar 21 12:44 PB.stat
    160 Mar 21 12:44 PB.base.cov
    264 Mar 21 12:44 PB.cov.wig
    153 Mar 21 12:44 log.purge_dups.ipa_purge_dups_pbcstat.memtime
      0 Mar 21 12:44 cutoffs
      0 Mar 21 12:44 calcuts.log
 409600 Mar 21 12:44 core.129065
    120 Mar 21 12:44 log.purge_dups.ipa_purge_dups_calcuts.memtime

The ipa command fails when I use the following advanced options: --advanced-opt "config_block_size = 100; config_ovl_filter_opt = --max-diff 80 --max-cov 100 --min-cov 2 --bestn 10 --min-len 500 --gapFilt --minDepth 4 --idt-stage2 98; config_ovl_min_len = 500; config_seeddb_opt = -k 28 -w 20 --space 0 --use-hpc-seeds-only; config_ovl_opt = --one-hit-per-target --min-idt 98 --min-map-len 500 --min-anchor-span 500 --traceback --mask-hp --mask-repeats --trim --trim-window-size 30 --trim-match-frac 0.75 --smart-hit-per-target --secondary-min-ovl-frac 0.05; config_layout_opt = --allow-circular;"
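When debugging a run like this, it can help to confirm exactly which settings the --advanced-opt string passes. The following is a minimal Python sketch that splits the string into its key = value entries. It assumes, as the string itself suggests, that entries are separated by ";" and each contains one "="; parse_advanced_opt is a hypothetical inspection helper, not ipa's own parser.

```python
# Hypothetical helper: splits an ipa-style "key = value; key = value;" string
# into a dict so each setting can be inspected. Not ipa's own parser.
ADVANCED_OPT = (
    "config_block_size = 100; "
    "config_ovl_filter_opt = --max-diff 80 --max-cov 100 --min-cov 2 --bestn 10 "
    "--min-len 500 --gapFilt --minDepth 4 --idt-stage2 98; "
    "config_ovl_min_len = 500; "
    "config_seeddb_opt = -k 28 -w 20 --space 0 --use-hpc-seeds-only; "
    "config_ovl_opt = --one-hit-per-target --min-idt 98 --min-map-len 500 "
    "--min-anchor-span 500 --traceback --mask-hp --mask-repeats --trim "
    "--trim-window-size 30 --trim-match-frac 0.75 --smart-hit-per-target "
    "--secondary-min-ovl-frac 0.05; "
    "config_layout_opt = --allow-circular;"
)


def parse_advanced_opt(opt: str) -> dict[str, str]:
    """Return {key: value} for each ';'-separated 'key = value' entry."""
    settings: dict[str, str] = {}
    for entry in opt.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # skip the empty piece left by the trailing ';'
        key, sep, value = entry.partition("=")
        if not sep:
            raise ValueError(f"entry without '=': {entry!r}")
        settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    for key, value in parse_advanced_opt(ADVANCED_OPT).items():
        print(f"{key}: {value}")
```

Splitting on the first "=" only (str.partition) keeps any later characters in the value intact, which matters if an option value ever contains "=".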

These are the two entries (and lengths) in the 14-separate/p_ctg.fasta file:

ctg/p/c/000000/0    2757
ctg/p/c/000001/0    2651

Error message:

Error in rule purge_dups_paf:
    jobid: 2
    input: 17-purge_dups_map_merge/merged.paf.gz, 14-separate/p_ctg.fasta, 14-separate/a_ctg.fasta, 01-generate_config/generated.config
    output: 18-purge_dups/final_purged_primary.fasta, 18-purge_dups/final_purged_haplotigs.fasta
    shell:

        wd=$(dirname 18-purge_dups/final_purged_primary.fasta)
        mkdir -p $wd
        cd $wd
        rel=..

        input_paf="$rel/17-purge_dups_map_merge/merged.paf.gz" \
        input_primary_fasta="$rel/14-separate/p_ctg.fasta" \
        input_haplotigs_fasta="$rel/14-separate/a_ctg.fasta" \
        params_num_threads="4" \
        params_log_level="INFO" \
        params_config_sh_fn="$rel/01-generate_config/generated.config" \
        output_primary_fasta=$(basename 18-purge_dups/final_purged_primary.fasta) \
        output_haplotigs_fasta=$(basename 18-purge_dups/final_purged_haplotigs.fasta) \
            time ipa2-task purge_dups_paf

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

In case it is helpful to see how the input reads align to the 14-separate/p_ctg.fasta file:

    [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  | [TAGS]
=====================================================================================
       1      928  |     1724     2651  |      928      928  |   100.00  | m84143_231019_220554_s1/121771968/ccs    ctg/p/c/000001/0
     929     2639  |        1     1712  |     1711     1712  |    99.94  | m84143_231019_220554_s1/121771968/ccs    ctg/p/c/000001/0
       1     2757  |        2     2757  |     2757     2756  |    99.96  | m84143_231019_220554_s1/126423256/ccs    ctg/p/c/000000/0
       1     1778  |      982     2757  |     1778     1776  |    99.89  | m84143_231019_220554_s1/218764900/ccs    ctg/p/c/000000/0
    1779     2757  |        1      978  |      979      978  |    99.90  | m84143_231019_220554_s1/218764900/ccs    ctg/p/c/000000/0
       1     1498  |     1498        1  |     1498     1498  |   100.00  | m84143_231019_220554_s1/235275032/ccs    ctg/p/c/000000/0
    1499     2755  |     2757     1501  |     1257     1257  |   100.00  | m84143_231019_220554_s1/235275032/ccs    ctg/p/c/000000/0
       1      941  |      940        1  |      941      940  |    99.89  | m84143_231019_220554_s1/240654034/ccs    ctg/p/c/000001/0
     942     2626  |     2651      967  |     1685     1685  |   100.00  | m84143_231019_220554_s1/240654034/ccs    ctg/p/c/000001/0
       1     1176  |     1582     2757  |     1176     1176  |   100.00  | m84143_231019_220554_s1/260118115/ccs    ctg/p/c/000000/0
    1177     2755  |        1     1579  |     1579     1579  |   100.00  | m84143_231019_220554_s1/260118115/ccs    ctg/p/c/000000/0
       1     2263  |      389     2651  |     2263     2263  |   100.00  | m84143_231019_220554_s1/39981363/ccs     ctg/p/c/000001/0
       1     1701  |      953     2651  |     1701     1699  |    99.76  | m84143_231019_220554_s1/50597833/ccs     ctg/p/c/000001/0
    1702     4353  |        1     2651  |     2652     2651  |    99.62  | m84143_231019_220554_s1/50597833/ccs     ctg/p/c/000001/0
    4354     5076  |        1      722  |      723      722  |    99.31  | m84143_231019_220554_s1/50597833/ccs     ctg/p/c/000001/0

It looks like there are two small (~2.7 kb) plasmids.
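Because purge_dups derives its cutoffs from read-depth statistics (the PB.stat and PB.base.cov files listed above), a rough depth estimate for each contig may be a useful sanity check. The Python sketch below sums the contig-side aligned length (LEN 2) from the alignment table for each contig and divides by the contig length. It assumes, as the TAGS column order suggests, that coordinate set 2 refers to the contig; reads that wrap around the circular origin contribute two rows, and both are counted. Whether depth this low has anything to do with the calcuts crash is not established here.

```python
from collections import defaultdict

# (read ID, contig, LEN 2 = aligned bases on the contig), copied from the table above.
ALIGNMENTS = [
    ("121771968", "ctg/p/c/000001/0", 928),
    ("121771968", "ctg/p/c/000001/0", 1712),
    ("126423256", "ctg/p/c/000000/0", 2756),
    ("218764900", "ctg/p/c/000000/0", 1776),
    ("218764900", "ctg/p/c/000000/0", 978),
    ("235275032", "ctg/p/c/000000/0", 1498),
    ("235275032", "ctg/p/c/000000/0", 1257),
    ("240654034", "ctg/p/c/000001/0", 940),
    ("240654034", "ctg/p/c/000001/0", 1685),
    ("260118115", "ctg/p/c/000000/0", 1176),
    ("260118115", "ctg/p/c/000000/0", 1579),
    ("39981363", "ctg/p/c/000001/0", 2263),
    ("50597833", "ctg/p/c/000001/0", 1699),
    ("50597833", "ctg/p/c/000001/0", 2651),
    ("50597833", "ctg/p/c/000001/0", 722),
]

# Contig lengths from 14-separate/p_ctg.fasta.
CONTIG_LENGTHS = {"ctg/p/c/000000/0": 2757, "ctg/p/c/000001/0": 2651}


def mean_depth(alignments, lengths):
    """Return {contig: (aligned bases / contig length, distinct read count)}."""
    aligned = defaultdict(int)
    reads = defaultdict(set)
    for read_id, contig, aligned_len in alignments:
        aligned[contig] += aligned_len
        reads[contig].add(read_id)
    return {c: (aligned[c] / lengths[c], len(reads[c])) for c in lengths}


if __name__ == "__main__":
    for contig, (depth, n_reads) in mean_depth(ALIGNMENTS, CONTIG_LENGTHS).items():
        print(f"{contig}: ~{depth:.2f}x from {n_reads} reads")
```

With the numbers above this gives roughly 4.00x for ctg/p/c/000000/0 and 4.75x for ctg/p/c/000001/0, each supported by four distinct reads.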

Thank you for any insight you might have on how to get this assembly to complete.

armintoepfer commented 1 month ago

We are in the process of deprecating IPA. Please use alternative HiFi genome assemblers, such as https://github.com/chhylp123/hifiasm