Error in rule phasing_prepare #612

Closed jaavedm closed 10 months ago

jaavedm commented 10 months ago

Operating system

Ubuntu 22.04.3 LTS (GNU/Linux 5.15.0-84-generic x86_64)

Package name

IPA version 1.8.0

Describe the bug

When running a small sample dataset in distributed mode, IPA fails in the phasing_prepare rule. This issue is identical to one reported in an earlier bug report. However, the workaround presented there does not work for me, because whenever I try to upgrade IPA from 1.5 to 1.8, snakemake is also upgraded in the process.

When run in local mode, IPA runs to completion without error.
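One thing that might be worth trying for the snakemake upgrade problem is to pin both packages in a single conda request, so that the solver either keeps snakemake at the requested version or reports a conflict up front. The following is only a sketch: `pinned_install_cmd` is a hypothetical helper, and the snakemake version is left as an argument because this report does not state which version the workaround needs. Whether pbipa 1.8.0 from bioconda can be installed alongside an older snakemake is an assumption that the dry run would test.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a conda command that pins pbipa and snakemake
# together. It only prints the command; nothing is installed.
pinned_install_cmd() {
    local ipa_version="$1"        # e.g. 1.8.0 (the version in this report)
    local snakemake_version="$2"  # not given in this report; supply your own
    printf 'conda install --dry-run -c bioconda pbipa=%s snakemake=%s\n' \
        "$ipa_version" "$snakemake_version"
}

# Usage (inside the activated environment): inspect the printed command,
# run it, and drop --dry-run only if the solver accepts both pins.
#   pinned_install_cmd 1.8.0 "<snakemake version from the workaround>"
```

If the dry run reports that the two pins conflict, that would confirm that the workaround cannot be combined with IPA 1.8 in this environment.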

Error message

Submitted job 15 with external jobid 'Your job 1491 ("ipa_small") has been submitted'.
[Sun Oct 15 16:40:48 2023]
Error in rule phasing_prepare:
    jobid: 15
    input: 02-build_db/reads.seqdb, 05-ovl_asym_merge/ovl.nonlocal.m4, 01-generate_config/generated.config
    output: 06-phasing_prepare/shards, 06-phasing_prepare/shards/pwd.txt

        sharddir=$(dirname 06-phasing_prepare/shards/pwd.txt)
        rm -rf $sharddir
        mkdir -p $sharddir
        cd $sharddir

        input_m4="$rel/05-ovl_asym_merge/ovl.nonlocal.m4"         output_shard_ids=./all_shard_ids         output_pwd=./pwd.txt         params_config_sh_fn="$rel/01-generate_config/generated.config"         params_max_nchunks="40"         params_log_level="INFO"         params_tmp_dir="./"             time ipa2-task phasing_prepare

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Your job 1491 ("ipa_small") has been submitted

Error executing rule phasing_prepare on cluster (jobid: 15, external: Your job 1491 ("ipa_small") has been submitted, jobscript: /work/jaavedm/pacbio_test/small/.snakemake/tmp.iic14pki/ For error details see the cluster log and the log files of the involved rule(s).
Cleanup job metadata.
Cleanup failed jobs output files.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-10-15T164008.089634.snakemake.log
removing lock
removing lock
removed all locks

When I check the SGE error logs for this task that fails, I find the following output:

Building DAG of jobs...
MissingInputException in rule ovl_asym_merge in file /home/jaavedm/anaconda3/envs/ipa-py3.11/etc/ipa.snakefile, line 230:
Missing input files for rule ovl_asym_merge:
    output: 05-ovl_asym_merge/ovl.merged.m4, 05-ovl_asym_merge/ovl.nonlocal.m4
    affected files:
config:{'advanced_options': '', 'coverage': 0, 'genome_size': 0, 'm4filt_high_copy_sample_rate': 1.0, 'max_nchunks': 40, 'nproc': 16, 'phase_run': 1, 'polish_run': 1, 'purge_dups_calcuts': '', 'purge_dups_run': 1, 'reads_fn': 'small/input.fofn', 'tmp_dir': './'}

The file ovl.sorted.m4 does not exist. When I list the tree structure of directory 04-ovl_asym_run, I find:

(ipa-py3.11) jaavedm@badger:/work/jaavedm/pacbio_test/small/04-ovl_asym_run$ tree -R .
└── 0
    ├── log.ovl_asym_run.pancake.s0.b0_0_1.memtime
    └── log.ovl_asym_run.sort.memtime

1 directory, 2 files

However, the directory 05-ovl_asym_merge already contains the output files ovl.merged.m4 and ovl.nonlocal.m4:

(ipa-py3.11) jaavedm@badger:/work/jaavedm/pacbio_test/small/05-ovl_asym_merge$ ll
total 63276
drwxr-xr-x 2 jaavedm jaavedm     4096 Oct 15 16:40 ./
drwxrwxr-x 9 jaavedm jaavedm     4096 Oct 15 16:40 ../
-rw-r--r-- 1 jaavedm jaavedm      138 Oct 15 16:40 log.ovl_asym_merge.awk_nonlocals.memtime
-rw-r--r-- 1 jaavedm jaavedm      162 Oct 15 16:40 log.ovl_asym_merge.mergesort.memtime
-rw-r--r-- 1 jaavedm jaavedm       14 Oct 15 16:40 ovl.merged.fofn
-rw-r--r-- 1 jaavedm jaavedm 32528480 Oct 15 16:40 ovl.merged.m4
-rw-r--r-- 1 jaavedm jaavedm 32239524 Oct 15 16:40 ovl.nonlocal.m4
-rw-r--r-- 1 jaavedm jaavedm       35 Oct 15 16:40 sorted.fofn
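
The manual checks above could be repeated after each failed run with a small script. This is a minimal sketch, not part of IPA: `check_files` is a hypothetical helper that takes the run directory and a list of relative paths, and simply reports which of them exist. The paths in the usage comment are the input and output files named in the logs above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given relative paths exist
# under a run directory. Returns the number of missing paths.
check_files() {
    local run_dir="$1" missing=0 f
    shift
    for f in "$@"; do
        if [ -e "$run_dir/$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Usage, with paths taken from the snakemake logs in this report:
#   check_files small/ \
#       01-generate_config/generated.config \
#       02-build_db/reads.seqdb \
#       05-ovl_asym_merge/ovl.merged.m4 \
#       05-ovl_asym_merge/ovl.nonlocal.m4
```

Running it from both the submit host and a compute node might also show whether the two see the same files, which is an assumption worth checking on a cluster with a shared file system.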

To Reproduce

  1. Download and install IPA

    conda create -n ipa-py3.11 python=3.11
    conda activate ipa-py3.11
    conda install -c bioconda pbipa
  2. Download a small sample dataset from Hifiasm wget

  3. Run IPA in local mode. The program runs to completion without error.

    time ipa local --nthreads 8 --njobs 4 -i chr11-2M.fa.gz

  4. Run IPA in distributed mode. IPA fails.

    time ipa dist -i chr11-2M.fa.gz --run-dir small/ --cluster-args 'qsub -v PATH -S /bin/bash -N ipa_small -cwd -j y -pe smp {params.num_threads} -e qsub_log/ -o qsub_log/ -V' --nthreads 16 --njobs 7 --tmp-dir "./" --verbose
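
Because the qsub arguments in step 4 send job output to qsub_log/, the per-job SGE logs can be searched for the failing job instead of being opened one by one. The sketch below assumes the logs are plain text under a single directory; `find_failed_logs` is a hypothetical helper, and the two search patterns are strings copied from the log excerpts in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files under a log directory that mention a
# snakemake failure. Exits non-zero (grep's status) when nothing matches.
find_failed_logs() {
    local log_dir="$1"
    grep -rlE 'MissingInputException|Error in rule' "$log_dir"
}

# Usage, from the directory the jobs were submitted in:
#   find_failed_logs qsub_log/
```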

Expected behavior

The program should behave identically whether it is run as "local" or as "dist".

armintoepfer commented 10 months ago

If the current solution isn't effective for your situation, there's nothing we can do at the moment. However, we may consider incorporating it into a future release.