PacificBiosciences / pbbioconda

PacBio Secondary Analysis Tools on Bioconda. Contains list of PacBio packages available via conda.
BSD 3-Clause Clear License

Error in rule phasing_prepare #612

Closed jaavedm closed 10 months ago

jaavedm commented 10 months ago

Operating system: Ubuntu 22.04.3 LTS (GNU/Linux 5.15.0-84-generic x86_64)

Package name: IPA, version 1.8.0

Conda environment

# packages in environment at /home/jaavedm/anaconda3/envs/ipa-py3.11:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
amply                     0.1.6              pyhd8ed1ab_0    conda-forge
appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
attrs                     23.1.0             pyh71513ae_1    conda-forge
brotli-python             1.1.0           py311hb755f60_1    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.20.1               hd590300_0    conda-forge
ca-certificates           2023.7.22            hbcca054_0    conda-forge
certifi                   2023.7.22          pyhd8ed1ab_0    conda-forge
cffi                      1.16.0          py311hb3a22ac_0    conda-forge
charset-normalizer        3.3.0              pyhd8ed1ab_0    conda-forge
coin-or-cbc               2.10.10              h9002f0b_0    conda-forge
coin-or-cgl               0.60.7               h516709c_0    conda-forge
coin-or-clp               1.17.8               h1ee7a9c_0    conda-forge
coin-or-osi               0.108.8              ha2443b9_0    conda-forge
coin-or-utils             2.11.9               hee58242_0    conda-forge
coincbc                   2.10.10           0_metapackage    conda-forge
configargparse            1.7                pyhd8ed1ab_0    conda-forge
connection_pool           0.0.3              pyhd3deb0d_0    conda-forge
datrie                    0.8.2           py311h459d7ec_7    conda-forge
docutils                  0.20.1          py311h38be061_2    conda-forge
dpath                     2.1.6              pyha770c72_0    conda-forge
gitdb                     4.0.10             pyhd8ed1ab_0    conda-forge
gitpython                 3.1.37             pyhd8ed1ab_0    conda-forge
htslib                    1.18                 h81da01d_0    bioconda
humanfriendly             10.0               pyhd8ed1ab_6    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib_resources       6.1.0              pyhd8ed1ab_0    conda-forge
jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
jsonschema                4.19.1             pyhd8ed1ab_0    conda-forge
jsonschema-specifications 2023.7.1           pyhd8ed1ab_0    conda-forge
jupyter_core              5.4.0           py311h38be061_0    conda-forge
k8                        0.2.5                hdcf5f25_4    bioconda
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
libblas                   3.9.0           18_linux64_openblas    conda-forge
libcblas                  3.9.0           18_linux64_openblas    conda-forge
libcurl                   8.4.0                hca28451_0    conda-forge
libdeflate                1.19                 hd590300_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h807b86a_2    conda-forge
libgfortran-ng            13.2.0               h69a702a_2    conda-forge
libgfortran5              13.2.0               ha4646dd_2    conda-forge
libgomp                   13.2.0               h807b86a_2    conda-forge
liblapack                 3.9.0           18_linux64_openblas    conda-forge
liblapacke                3.9.0           18_linux64_openblas    conda-forge
libnghttp2                1.52.0               h61bc06f_0    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libopenblas               0.3.24          pthreads_h413a1c8_0    conda-forge
libsqlite                 3.43.2               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_2    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
markupsafe                2.1.3           py311h459d7ec_1    conda-forge
minimap2                  2.26                 he4a0461_2    bioconda
nbformat                  5.9.2              pyhd8ed1ab_0    conda-forge
ncurses                   6.4                  hcb278e6_0    conda-forge
networkx                  3.1                pyhd8ed1ab_0    conda-forge
openssl                   3.1.3                hd590300_0    conda-forge
packaging                 23.2               pyhd8ed1ab_0    conda-forge
pb-falconc                1.15.0               haabb649_2    bioconda
pbipa                     1.8.0                h6ead514_2    bioconda
pcre                      8.45                 h9c3ff4c_0    conda-forge
pip                       23.3               pyhd8ed1ab_0    conda-forge
pkgutil-resolve-name      1.3.10             pyhd8ed1ab_1    conda-forge
plac                      1.4.0              pyhd8ed1ab_0    conda-forge
platformdirs              3.11.0             pyhd8ed1ab_0    conda-forge
psutil                    5.9.5           py311h459d7ec_1    conda-forge
pulp                      2.7.0           py311h38be061_1    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyparsing                 3.1.1              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.11.6          hab00c5b_0_cpython    conda-forge
python-fastjsonschema     2.18.1             pyhd8ed1ab_0    conda-forge
python_abi                3.11                    4_cp311    conda-forge
pyyaml                    6.0.1           py311h459d7ec_1    conda-forge
racon                     1.5.0                h21ec9f0_2    bioconda
readline                  8.2                  h8228510_1    conda-forge
referencing               0.30.2             pyhd8ed1ab_0    conda-forge
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
reretry                   0.11.8             pyhd8ed1ab_0    conda-forge
rpds-py                   0.10.6          py311h46250e7_0    conda-forge
samtools                  1.18                 h50ea8bc_1    bioconda
setuptools                68.2.2             pyhd8ed1ab_0    conda-forge
smart_open                6.4.0              pyhd8ed1ab_0    conda-forge
smmap                     3.0.5              pyh44b312d_0    conda-forge
snakemake-minimal         7.32.4             pyhdfd78af_0    bioconda
stopit                    1.1.2                      py_0    conda-forge
tabulate                  0.9.0              pyhd8ed1ab_1    conda-forge
throttler                 1.2.2              pyhd8ed1ab_0    conda-forge
tk                        8.6.13               h2797004_0    conda-forge
toposort                  1.10               pyhd8ed1ab_0    conda-forge
traitlets                 5.11.2             pyhd8ed1ab_0    conda-forge
typing-extensions         4.8.0                hd8ed1ab_0    conda-forge
typing_extensions         4.8.0              pyha770c72_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
urllib3                   2.0.6              pyhd8ed1ab_0    conda-forge
wheel                     0.41.2             pyhd8ed1ab_0    conda-forge
wrapt                     1.15.0          py311h459d7ec_1    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
yte                       1.5.1              pyha770c72_2    conda-forge
zipp                      3.17.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge
zstd                      1.5.5                hfc55251_0    conda-forge
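
Because the report turns on which snakemake version is installed alongside IPA, it can help to pull just those two rows out of a listing like the one above. A minimal sketch, assuming the listing comes from `conda list` and is piped in on standard input; `pkg_versions` is a hypothetical helper name, not part of conda or IPA:

```shell
# pkg_versions (hypothetical helper): read `conda list` output on stdin and
# print name=version for the two packages relevant to this report.
pkg_versions() {
    awk '$1 == "pbipa" || $1 == "snakemake-minimal" { print $1 "=" $2 }'
}
```

Usage would look like `conda list -n ipa-py3.11 | pkg_versions`. For the environment listed above this prints `pbipa=1.8.0` and `snakemake-minimal=7.32.4`.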

Describe the bug

When running a small sample dataset in distributed mode, IPA fails in the phasing_prepare rule. This issue is identical to the one reported in https://github.com/PacificBiosciences/pbbioconda/issues/496. However, the workaround presented there does not work for me, because whenever I try to upgrade IPA from 1.5 to 1.8, snakemake is also upgraded in the process.

When run in local mode, IPA runs to completion without error.

Error message

Submitted job 15 with external jobid 'Your job 1491 ("ipa_small") has been submitted'.
[Sun Oct 15 16:40:48 2023]
Error in rule phasing_prepare:
    jobid: 15
    input: 02-build_db/reads.seqdb, 05-ovl_asym_merge/ovl.nonlocal.m4, 01-generate_config/generated.config
    output: 06-phasing_prepare/shards, 06-phasing_prepare/shards/pwd.txt
    shell:

        sharddir=$(dirname 06-phasing_prepare/shards/pwd.txt)
        rm -rf $sharddir
        mkdir -p $sharddir
        cd $sharddir
        rel=../..

        input_m4="$rel/05-ovl_asym_merge/ovl.nonlocal.m4"         output_shard_ids=./all_shard_ids         output_pwd=./pwd.txt         params_config_sh_fn="$rel/01-generate_config/generated.config"         params_max_nchunks="40"         params_log_level="INFO"         params_tmp_dir="./"             time ipa2-task phasing_prepare

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Your job 1491 ("ipa_small") has been submitted

Error executing rule phasing_prepare on cluster (jobid: 15, external: Your job 1491 ("ipa_small") has been submitted, jobscript: /work/jaavedm/pacbio_test/small/.snakemake/tmp.iic14pki/snakejob.phasing_prepare.15.sh). For error details see the cluster log and the log files of the involved rule(s).
Cleanup job metadata.
Cleanup failed jobs output files.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-10-15T164008.089634.snakemake.log
unlocking
removing lock
removing lock
removed all locks

When I check the SGE error log for the failed task, I find the following output:

Building DAG of jobs...
MissingInputException in rule ovl_asym_merge in file /home/jaavedm/anaconda3/envs/ipa-py3.11/etc/ipa.snakefile, line 230:
Missing input files for rule ovl_asym_merge:
    output: 05-ovl_asym_merge/ovl.merged.m4, 05-ovl_asym_merge/ovl.nonlocal.m4
    affected files:
        04-ovl_asym_run/0/ovl.sorted.m4
CWD:/work/jaavedm/pacbio_test/small
NPROC:16
NPROC_SERIAL:16
config:{'advanced_options': '', 'coverage': 0, 'genome_size': 0, 'm4filt_high_copy_sample_rate': 1.0, 'max_nchunks': 40, 'nproc': 16, 'phase_run': 1, 'polish_run': 1, 'purge_dups_calcuts': '', 'purge_dups_run': 1, 'reads_fn': 'small/input.fofn', 'tmp_dir': './'}
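
With many per-job files under qsub_log/, it can be quicker to extract only the paths that snakemake reports as missing. A minimal sketch, assuming the log follows the layout shown above (an indented "affected files:" line followed by more deeply indented paths); `missing_inputs` is a hypothetical helper name, not a snakemake or IPA command:

```shell
# missing_inputs (hypothetical helper): print the paths listed under
# "affected files:" in a snakemake MissingInputException message.
missing_inputs() {
    awk '
        # Start collecting after the "affected files:" header line.
        /^[[:space:]]*affected files:/ { in_block = 1; next }
        # Indented lines inside the block are paths; strip the indentation.
        in_block && /^[[:space:]]+[^[:space:]]/ { sub(/^[[:space:]]+/, ""); print; next }
        # Any other line ends the block.
        in_block { in_block = 0 }
    ' "$1"
}
```

It takes the path of one log file as its only argument, so it can be applied to each file in qsub_log/ in a loop.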

The file ovl.sorted.m4 does not exist. When I list the tree structure of directory 04-ovl_asym_run, I find:

(ipa-py3.11) jaavedm@badger:/work/jaavedm/pacbio_test/small/04-ovl_asym_run$ tree -R .
.
└── 0
    ├── log.ovl_asym_run.pancake.s0.b0_0_1.memtime
    └── log.ovl_asym_run.sort.memtime

1 directory, 2 files

However, the directory 05-ovl_asym_merge already contains the output files ovl.merged.m4 and ovl.nonlocal.m4:

(ipa-py3.11) jaavedm@badger:/work/jaavedm/pacbio_test/small/05-ovl_asym_merge$ ll
total 63276
drwxr-xr-x 2 jaavedm jaavedm     4096 Oct 15 16:40 ./
drwxrwxr-x 9 jaavedm jaavedm     4096 Oct 15 16:40 ../
-rw-r--r-- 1 jaavedm jaavedm      138 Oct 15 16:40 log.ovl_asym_merge.awk_nonlocals.memtime
-rw-r--r-- 1 jaavedm jaavedm      162 Oct 15 16:40 log.ovl_asym_merge.mergesort.memtime
-rw-r--r-- 1 jaavedm jaavedm       14 Oct 15 16:40 ovl.merged.fofn
-rw-r--r-- 1 jaavedm jaavedm 32528480 Oct 15 16:40 ovl.merged.m4
-rw-r--r-- 1 jaavedm jaavedm 32239524 Oct 15 16:40 ovl.nonlocal.m4
-rw-r--r-- 1 jaavedm jaavedm       35 Oct 15 16:40 sorted.fofn
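
The two observations above (the input of ovl_asym_merge is gone while its outputs exist) can be checked in one pass. A minimal sketch, assuming the relative paths named in the logs above; `check_ipa_files` is a hypothetical helper name, not part of IPA:

```shell
# check_ipa_files (hypothetical helper): report which of the files named in
# the logs exist under the given run directory.
check_ipa_files() {
    local run_dir="$1"
    local f
    for f in \
        04-ovl_asym_run/0/ovl.sorted.m4 \
        05-ovl_asym_merge/ovl.merged.m4 \
        05-ovl_asym_merge/ovl.nonlocal.m4
    do
        if [ -e "$run_dir/$f" ]; then
            echo "present $f"
        else
            echo "missing $f"
        fi
    done
}
```

It would be called with the run directory, for example `check_ipa_files /work/jaavedm/pacbio_test/small`.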

To Reproduce

  1. Download and install IPA

    conda create -n ipa-py3.11 python=3.11
    conda activate ipa-py3.11
    conda install -c bioconda pbipa

  2. Download a small sample dataset from Hifiasm

    wget https://github.com/chhylp123/hifiasm/releases/download/v0.7/chr11-2M.fa.gz

  3. Run IPA in local mode. The program runs to completion without error.

    time ipa local --nthreads 8 --njobs 4 -i chr11-2M.fa.gz

  4. Run IPA in distributed mode. IPA fails.

    time ipa dist -i chr11-2M.fa.gz --run-dir small/ --cluster-args 'qsub -v PATH -S /bin/bash -N ipa_small -cwd -j y -pe smp {params.num_threads} -e qsub_log/ -o qsub_log/ -V' --nthreads 16 --njobs 7 --tmp-dir "./" --verbose

Expected behavior

The program should behave identically whether it is run as "local" or as "dist".

armintoepfer commented 10 months ago

If the current solution isn't effective for your situation, there's nothing we can do at the moment. However, we may consider incorporating it into a future release.