gconcepcion / pb-assembly

PacBio Assembly Tools Suite

Error with Quiver polishing of unzipped haplotypes on test data #1

Closed: cbergman closed this issue 6 years ago

cbergman commented 6 years ago

I have been trying to get your bioconda falcon/falcon-unzip install to work on CentOS and have had partial success with the example test case. Following the instructions in the README, I can get falcon to run to completion and falcon-unzip to generate the expected alternate haplotypes. However, I consistently get an error at the quiver step. I have tried several clean installs of the conda environment and the test data repositories on two different machines: one running CentOS 6 with a directly attached disk and one running CentOS 7 with a Lustre filesystem. The behavior is identical on both systems. No 4-quiver/cns-output directory is generated.

The results of my test run are as follows:

(FALCON) [cbergman@genome] $ ls -lrt 3-unzip/
total 480
drwxr-xr-x. 8 cbergman bergmanlab   4096 Sep  6 15:41 reads
drwxr-xr-x. 6 cbergman bergmanlab   4096 Sep  6 15:41 0-phasing
drwxr-xr-x. 3 cbergman bergmanlab   4096 Sep  6 15:41 1-hasm
drwxr-xr-x. 6 cbergman bergmanlab   4096 Sep  6 15:41 2-htigs
-rw-r--r--. 1 cbergman bergmanlab   1394 Sep  6 15:41 template.sh
-rw-r--r--. 1 cbergman bergmanlab    581 Sep  6 15:41 task.sh
-rw-r--r--. 1 cbergman bergmanlab    593 Sep  6 15:41 task.json
-rw-r--r--. 1 cbergman bergmanlab    207 Sep  6 15:41 run.sh
-rwxr-xr-x. 1 cbergman bergmanlab    254 Sep  6 15:41 run-Pa16d27079404fc.bash
-rw-r--r--. 1 cbergman bergmanlab   1671 Sep  6 15:41 user_script.sh
-rw-r--r--. 1 cbergman bergmanlab 188573 Sep  6 15:41 all_h_ctg.fa
-rw-r--r--. 1 cbergman bergmanlab      0 Sep  6 15:41 hasm_done
-rw-r--r--. 1 cbergman bergmanlab 203182 Sep  6 15:41 all_p_ctg.fa
-rw-r--r--. 1 cbergman bergmanlab   7709 Sep  6 15:41 all_p_ctg_edges
-rw-r--r--. 1 cbergman bergmanlab     79 Sep  6 15:41 all_h_ctg.paf
-rw-r--r--. 1 cbergman bergmanlab     17 Sep  6 15:41 all_h_ctg_ids
-rw-r--r--. 1 cbergman bergmanlab   8190 Sep  6 15:41 all_h_ctg_edges
-rw-r--r--. 1 cbergman bergmanlab  16329 Sep  6 15:41 all_phased_reads
-rw-r--r--. 1 cbergman bergmanlab      0 Sep  6 15:41 run.sh.done
-rw-r--r--. 1 cbergman bergmanlab   3419 Sep  6 15:41 run-Pa16d27079404fc.bash.stdout
-rw-r--r--. 1 cbergman bergmanlab   7451 Sep  6 15:41 run-Pa16d27079404fc.bash.stderr
(FALCON) [cbergman@genome] $ ls -lrt 4-quiver/
total 48
drwxr-xr-x. 2 cbergman bergmanlab 4096 Sep  6 15:41 track-reads
drwxr-xr-x. 2 cbergman bergmanlab 4096 Sep  6 15:41 select-reads
drwxr-xr-x. 4 cbergman bergmanlab 4096 Sep  6 15:41 merge-reads
drwxr-xr-x. 2 cbergman bergmanlab 4096 Sep  6 15:41 segregate-split
drwxr-xr-x. 4 cbergman bergmanlab 4096 Sep  6 15:41 segregate-chunks
drwxr-xr-x. 4 cbergman bergmanlab 4096 Sep  6 15:41 segregate-run
drwxr-xr-x. 2 cbergman bergmanlab 4096 Sep  6 15:42 segregate-gathered
drwxr-xr-x. 2 cbergman bergmanlab 4096 Sep  6 15:42 segregated-bam
drwxr-xr-x. 3 cbergman bergmanlab 4096 Sep  6 15:42 quiver-split
drwxr-xr-x. 3 cbergman bergmanlab 4096 Sep  6 15:42 cns-gather
drwxr-xr-x. 4 cbergman bergmanlab 4096 Sep  6 15:42 quiver-chunks
drwxr-xr-x. 4 cbergman bergmanlab 4096 Sep  6 15:42 quiver-run

Any ideas what might be causing this?

Thanks in advance!

gconcepcion commented 6 years ago

To help you out any further, I need to take a look at the stderr found here:

4-quiver/quiver-run/000000Fp01/run-*.bash.stderr

cbergman commented 6 years ago

Thanks for the quick reply. Here is the output of 4-quiver/quiver-run/000000Fp01/run-P2f4330e72933ab.bash.stderr. The machine has 24 cores and 64 GB of memory.

[cbergman@genome] $ more falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/run-P2f4330e72933ab.bash.stderr
executable=${PYPEFLOW_JOB_START_SCRIPT}
+ executable=/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/run-P2f4330e72933ab.bash
timeout=${PYPEFLOW_JOB_START_TIMEOUT:-60} # wait 60s by default
+ timeout=60

# Wait up to timeout seconds for the executable to become "executable",
# then exec.
#timeleft = int(timeout)
while [[ ! -x "${executable}" ]]; do
    if [[ "${timeout}" == "0" ]]; then
        echo "timed out waiting for (${executable})"
        exit 77
    fi
    echo "not executable: '${executable}', waiting ${timeout}s"
    sleep 1
    timeout=$((timeout-1))
done
+ [[ ! -x /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/run-P2f4330e72933ab.bash ]]

/bin/bash ${executable}
+ /bin/bash /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/run-P2f4330e72933ab.bash
+ '[' '!' -d /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01 ']'
+ cd /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01
+ eval '/bin/bash run.sh'
++ /bin/bash run.sh
export PATH=$PATH:/bin
+ export PATH=/home/cbergman/miniconda3/envs/FALCON/bin:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin:/usr/lib64/qt-3.3/bin:/usr/local/apache2/bin:/usr/local/apache2:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/cbergman/miniconda3/bin:/bin
+ PATH=/home/cbergman/miniconda3/envs/FALCON/bin:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin:/usr/lib64/qt-3.3/bin:/usr/local/apache2/bin:/usr/local/apache2:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/cbergman/miniconda3/bin:/bin
cd /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01
+ cd /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01
/bin/bash task.sh
+ /bin/bash task.sh
pypeflow 2.0.4
2018-09-06 15:55:24,246 - root - DEBUG - Running "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/pypeflow/do_task.py /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/task.json"
2018-09-06 15:55:24,247 - root - DEBUG - Checking existence of '/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/task.json' with timeout=30
2018-09-06 15:55:24,247 - root - DEBUG - Loading JSON from '/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/task.json'
2018-09-06 15:55:24,247 - root - DEBUG - {u'bash_template_fn': u'template.sh',
 u'inputs': {u'bash_template': u'/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/bash-template.sh',
             u'units_of_work': u'/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-chunks/000000Fp01/some-units-of-work.json'},
 u'outputs': {u'results': u'results.json'},
 u'parameters': {u'pypeflow_mb': 4000, u'pypeflow_nproc': 24}}
2018-09-06 15:55:24,247 - root - WARNING - CD: '/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01' <- '/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01'
2018-09-06 15:55:24,247 - root - DEBUG - Checking existence of u'/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-chunks/000000Fp01/some-units-of-work.json' with timeout=30
2018-09-06 15:55:24,248 - root - DEBUG - Checking existence of u'/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/bash-template.sh' with timeout=30
2018-09-06 15:55:24,248 - root - DEBUG - Checking existence of u'template.sh' with timeout=30
2018-09-06 15:55:24,248 - root - WARNING - CD: '/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01' <- '/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01'
2018-09-06 15:55:24,248 - root - INFO - $('/bin/bash user_script.sh')
hostname
+ hostname
pwd
+ pwd
date
+ date
# Substitution will be similar to snakemake "shell".
python -m falcon_kit.mains.generic_run_units_of_work --nproc=24 --units-of-work-fn=/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-chunks/000000Fp01/some-units-of-work.json --bash-template-fn=/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/bash-template.sh --results-fn=results.json
+ python -m falcon_kit.mains.generic_run_units_of_work --nproc=24 --units-of-work-fn=/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-chunks/000000Fp01/some-units-of-work.json --bash-template-fn=/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/bash-template.sh --results-fn=results.json
falcon-kit 1.2.2
pypeflow 2.0.4
INFO:root:INPUT:{u'ref_fasta': u'/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/./refs/000000Fp01/ref.fa', u'read_bam': u'/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/segregate-run/segr001/segregated/000000Fp01/000000Fp01.bam', u'ctg_type': u'/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/./refs/000000Fp01/ctg_type'}
INFO:root:OUTPUT:{u'cns_fasta': u'cns.fasta.gz', u'cns_vcf': u'cns.vcf', u'job_done': u'quiver_done', u'ctg_type_again': u'ctg_type', u'cns_fastq': u'cns.fastq.gz'}
INFO:root:PARAMS:{'pypeflow_nproc': '24', u'ctg_id': u'000000Fp01'}
INFO:root:$('rm -rf uow-00')
WARNING:root:CD: 'uow-00' <- '/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01'
INFO:root:$('/bin/bash user_script.sh')
hostname
+ hostname
pwd
+ pwd
date
+ date
set -vex
+ set -vex
trap 'touch quiver_done.exit' EXIT
+ trap 'touch quiver_done.exit' EXIT
hostname
+ hostname
date
+ date

samtools faidx /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/./refs/000000Fp01/ref.fa
+ samtools faidx /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/./refs/000000Fp01/ref.fa
pbalign --tmpDir=$(pwd)/tmp --nproc=24 --minAccuracy=0.75 --minLength=50          --minAnchorSize=12 --maxDivergence=30 --concordant --algorithm=blasr          --algorithmOptions=--useQuality --maxHits=1 --hitPolicy=random --seed=1            /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/segregate-run/segr001/segregated/000000Fp01/000000Fp01.bam /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/./refs/000000Fp01/ref.fa aln-000000Fp01.bam
++ pwd
+ pbalign --tmpDir=/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/uow-00/tmp --nproc=24 --minAccuracy=0.75 --minLength=50 --minAnchorSize=12 --maxDivergence=30 --concordant --algorithm=blasr --algorithmOptions=--useQuality --maxHits=1 --hitPolicy=random --seed=1 /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/segregate-run/segr001/segregated/000000Fp01/000000Fp01.bam /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/./refs/000000Fp01/ref.fa aln-000000Fp01.bam
BamPostService returned a non-zero exit status samtools sort --threads 24 -m 4G -o /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/uow-00/aln-000000Fp01.bam /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/uow-00/tmp/Xv4n8O.bam. CMD: '1'
ERROR: samtools sort: couldn't allocate memory for bam_mem
Output:[]
BamPostService returned a non-zero exit status samtools sort --threads 24 -m 4G -o /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/uow-00/aln-000000Fp01.bam /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/uow-00/tmp/Xv4n8O.bam. CMD: '1'
ERROR: samtools sort: couldn't allocate memory for bam_mem
Output:[]
Traceback (most recent call last):
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/pbcommand/cli/core.py", line 138, in _pacbio_main_runner
    return_code = exe_main_func(*args, **kwargs)
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/pbalign/pbalignrunner.py", line 284, in args_runner
    return PBAlignRunner(args, output_dataset_type=output_dataset_type).start()
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/pbcore/util/ToolRunner.py", line 85, in start
    return self.run()
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/pbalign/pbalignrunner.py", line 263, in run
    BamPostService(filenames=self.fileNames, nproc=self.args.nproc).run()
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/pbalign/bampostservice.py", line 136, in run
    nproc=self.nproc)
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/pbalign/bampostservice.py", line 101, in _sortbam
    Execute(self.name, cmd)
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/pbalign/utils/progutil.py", line 69, in Execute
    raise RuntimeError(errMsg)
RuntimeError: BamPostService returned a non-zero exit status samtools sort --threads 24 -m 4G -o /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/uow-00/aln-000000Fp01.bam /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/uow-00/tmp/Xv4n8O.bam. CMD: '1'
ERROR: samtools sort: couldn't allocate memory for bam_mem
Output:[]
touch quiver_done.exit
+ touch quiver_done.exit
WARNING:root:Call '/bin/bash user_script.sh' returned 512.
WARNING:root:CD: 'uow-00' -> '/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01'
Traceback (most recent call last):
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/falcon_kit/mains/generic_run_units_of_work.py", line 115, in <module>
    main()
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/falcon_kit/mains/generic_run_units_of_work.py", line 111, in main
    run(**vars(args))
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/falcon_kit/mains/generic_run_units_of_work.py", line 64, in run
    pypeflow.do_task.run_bash(script, inputs, outputs, params)
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/pypeflow/do_task.py", line 178, in run_bash
    util.system(cmd)
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/pypeflow/io.py", line 29, in syscall
    raise Exception(msg)
Exception: Call '/bin/bash user_script.sh' returned 512.
2018-09-06 15:55:36,884 - root - WARNING - Call '/bin/bash user_script.sh' returned 256.
2018-09-06 15:55:36,884 - root - WARNING - CD: '/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01' -> '/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01'
2018-09-06 15:55:36,884 - root - WARNING - CD: '/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01' -> '/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01'
2018-09-06 15:55:36,885 - root - CRITICAL - Error in /home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/pypeflow/do_task.py with args="{'json_fn': '/home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/task.json',\n 'timeout': 30,\n 'tmpdir': None}"
Traceback (most recent call last):
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/pypeflow/do_task.py", line 246, in <module>
    main()
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/pypeflow/do_task.py", line 238, in main
    run(**vars(parsed_args))
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/pypeflow/do_task.py", line 232, in run
    run_cfg_in_tmpdir(cfg, tmpdir)
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/pypeflow/do_task.py", line 208, in run_cfg_in_tmpdir
    run_bash(bash_template, myinputs, myoutputs, parameters)
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/pypeflow/do_task.py", line 178, in run_bash
    util.system(cmd)
  File "/home/cbergman/miniconda3/envs/FALCON/lib/python2.7/site-packages/pypeflow/io.py", line 29, in syscall
    raise Exception(msg)
Exception: Call '/bin/bash user_script.sh' returned 256.
+++ pwd
++ echo 'FAILURE. Running top in /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01 (If you see -terminal database is inaccessible- you are using the python bin-wrapper, so you will not get diagnostic info. No big deal. This process is crashing anyway.)'
++ rm -f top.txt
++ which python
++ which top
++ env -u LD_LIBRARY_PATH top -b -n 1
++ env -u LD_LIBRARY_PATH top -b -n 1
++ pstree -apl

real    0m13.387s
user    2m19.381s
sys 0m2.711s
+ finish
+ echo 'finish code: 1'
gconcepcion commented 6 years ago

Hm, interesting. I've heard anecdotal reports of multiple success cases already with the new bioconda docs, and I routinely run this test case on CentOS & Ubuntu VMs with 16 cores & 16 GB of RAM (~1 GB/core) with no issues.

Have you made any modifications to fc_unzip.cfg?

cbergman commented 6 years ago

Nope, fc_unzip.cfg is identical to the current version in FALCON-examples/run/greg200k-sv2/fc_unzip.cfg: https://github.com/pb-cdunn/FALCON-examples/blob/master/run/greg200k-sv2/fc_unzip.cfg

[cbergman@genome] $ more falcon/FALCON-examples/run/greg200k-sv2/fc_unzip.cfg
[General]
max_n_open_files = 1000

[Unzip]

input_fofn= input.fofn
input_bam_fofn= input_bam.fofn
#sge_phasing= -pe smp 12 -q bigmem
#sge_quiver= -pe smp 12 -q sequel-farm
#sge_track_reads= -pe smp 12 -q default
#sge_blasr_aln=  -pe smp 24 -q bigmem
#sge_hasm=  -pe smp 48 -q bigmem
#unzip_concurrent_jobs = 64
#quiver_concurrent_jobs = 64

#unzip_concurrent_jobs = 12
#quiver_concurrent_jobs = 12

[job.defaults]
NPROC=4
njobs=7
job_type = SGE
job_type = local

#use_tmpdir = /scratch
pwatcher_type = blocking
job_type = string
submit = bash -C ${CMD} >| ${STDOUT_FILE} 2>| ${STDERR_FILE}

#njobs=120
njobs=8
NPROC=4
#submit = qsub -S /bin/bash -sync y -V  \
#  -q ${JOB_QUEUE}    \
#  -N ${JOB_NAME}     \
#  -o "${JOB_STDOUT}" \
#  -e "${JOB_STDERR}" \
#  -pe smp ${NPROC}   \
#  -l h_vmem=${MB}M   \
#  "${JOB_SCRIPT}"
[job.step.unzip.track_reads]
njobs=1
NPROC=48
[job.step.unzip.blasr_aln]
njobs=2
NPROC=16
[job.step.unzip.phasing]
njobs=16
NPROC=2
[job.step.unzip.hasm]
njobs=1
NPROC=48

The error appears to be related to memory allocation in samtools sort, which is being invoked with 24 threads at 4 GB of RAM each. These values do not correspond to anything in the fc_unzip.cfg file, so I assume they are being set dynamically somewhere in falcon-unzip, since the system resources are lower than the numbers specified in fc_unzip.cfg. Is this correct, and if so, could the dynamic allocation be asking for more than the system has available?
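To make the arithmetic behind that suspicion concrete: samtools sort treats -m as a per-thread buffer limit, so the worst case is roughly threads × -m. A back-of-the-envelope sketch (not falcon-unzip code):

```shell
# Worst-case sort-buffer memory for the failing command in the log above:
# "samtools sort --threads 24 -m 4G" can buffer up to 24 x 4 GiB.
threads=24
mem_per_thread_gb=4
worst_case_gb=$((threads * mem_per_thread_gb))
echo "worst-case sort buffers: ${worst_case_gb} GiB"
```

That works out to 96 GiB, well above the 64 GB on this machine, which would be consistent with samtools failing to allocate bam_mem.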

gconcepcion commented 6 years ago

Like I said, I routinely run this locally on a smaller system than what you've described with no issues, so I don't understand why it would cause an error on your system.

What happens if you try to run the command by itself on the command line? What happens if you reduce the pbalign --nproc parameter to something like --nproc 16 or --nproc 12?

cbergman commented 6 years ago

I tried setting njobs=1 and NPROC=12 throughout fc_unzip.cfg:

[Unzip]

input_fofn= input.fofn
input_bam_fofn= input_bam.fofn
#sge_phasing= -pe smp 12 -q bigmem
#sge_quiver= -pe smp 12 -q sequel-farm
#sge_track_reads= -pe smp 12 -q default
#sge_blasr_aln= -pe smp 24 -q bigmem
#sge_hasm= -pe smp 48 -q bigmem
#unzip_concurrent_jobs = 64
#quiver_concurrent_jobs = 64

#unzip_concurrent_jobs = 12
#quiver_concurrent_jobs = 12

[job.defaults]
NPROC=12
njobs=1
job_type = SGE
job_type = local

#use_tmpdir = /scratch
pwatcher_type = blocking
job_type = string
submit = bash -C ${CMD} >| ${STDOUT_FILE} 2>| ${STDERR_FILE}

#njobs=120
njobs=1
NPROC=12
#submit = qsub -S /bin/bash -sync y -V  \
#  -q ${JOB_QUEUE}    \
#  -N ${JOB_NAME}     \
#  -o "${JOB_STDOUT}" \
#  -e "${JOB_STDERR}" \
#  -pe smp ${NPROC}   \
#  -l h_vmem=${MB}M   \
#  "${JOB_SCRIPT}"
[job.step.unzip.track_reads]
njobs=1
NPROC=12
[job.step.unzip.blasr_aln]
njobs=1
NPROC=12
[job.step.unzip.phasing]
njobs=1
NPROC=12
[job.step.unzip.hasm]
njobs=1
NPROC=12

- this led to the same error as above, with pbalign trying to use 24 processors:

(FALCON) [cbergman@genome] $ more 4-quiver/quiver-run/000000Fp01/run-P2f4330e72933ab.bash.stderr
executable=${PYPEFLOW_JOB_START_SCRIPT}

# Wait up to timeout seconds for the executable to become "executable",
# then exec.
#timeleft = int(timeout)
while [[ ! -x "${executable}" ]]; do
    if [[ "${timeout}" == "0" ]]; then
        echo "timed out waiting for (${executable})"
        exit 77
    fi
    echo "not executable: '${executable}', waiting ${timeout}s"
    sleep 1
    timeout=$((timeout-1))
done

/bin/bash ${executable}

samtools faidx /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/./refs/000000Fp01/ref.fa

real    0m8.130s
user    2m17.716s
sys     0m3.680s

gconcepcion commented 6 years ago

Ahh. This is a trivial issue. The fc_unzip.cfg packaged with the example is missing a section and will be updated.

Add this section to your fc_unzip.cfg:


[job.step.unzip.quiver]
njobs=12
NPROC=10
MB=98304
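For anyone adapting these numbers to a local machine: on my reading of these settings (an assumption, not confirmed by the docs), up to njobs quiver tasks can run concurrently, each using NPROC cores and up to MB megabytes, so peak demand multiplies:

```shell
# Hypothetical peak demand for the section above, assuming njobs
# concurrent tasks, each using NPROC cores and up to MB megabytes.
njobs=12
NPROC=10
MB=98304
echo "peak cores: $((njobs * NPROC))"
echo "peak RAM:   $((njobs * MB)) MB"
```

For a single local box you would scale njobs down so that both products stay within the machine's resources.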
cbergman commented 6 years ago

Here is the config after adding the [job.step.unzip.quiver] section:

[Unzip]

input_fofn= input.fofn
input_bam_fofn= input_bam.fofn
#sge_phasing= -pe smp 12 -q bigmem
#sge_quiver= -pe smp 12 -q sequel-farm
#sge_track_reads= -pe smp 12 -q default
#sge_blasr_aln= -pe smp 24 -q bigmem
#sge_hasm= -pe smp 48 -q bigmem
#unzip_concurrent_jobs = 64
#quiver_concurrent_jobs = 64

#unzip_concurrent_jobs = 12
#quiver_concurrent_jobs = 12

[job.defaults]
NPROC=12
njobs=1
job_type = SGE
job_type = local

#use_tmpdir = /scratch
pwatcher_type = blocking
job_type = string
submit = bash -C ${CMD} >| ${STDOUT_FILE} 2>| ${STDERR_FILE}

#njobs=120
njobs=1
NPROC=12
#submit = qsub -S /bin/bash -sync y -V  \
#  -q ${JOB_QUEUE}    \
#  -N ${JOB_NAME}     \
#  -o "${JOB_STDOUT}" \
#  -e "${JOB_STDERR}" \
#  -pe smp ${NPROC}   \
#  -l h_vmem=${MB}M   \
#  "${JOB_SCRIPT}"
[job.step.unzip.track_reads]
njobs=1
NPROC=12
[job.step.unzip.blasr_aln]
njobs=1
NPROC=12
[job.step.unzip.phasing]
njobs=1
NPROC=12
[job.step.unzip.hasm]
njobs=1
NPROC=12
[job.step.unzip.quiver]
njobs=1
NPROC=12
MB=24000

- The same error was observed as above (24 processors requested by pbalign/samtools sort), suggesting that either this section of the quiver config is not being picked up properly or something is using the number of processors reported by the system:

(FALCON) [cbergman@genome] $ more 4-quiver/quiver-run/000000Fp01/run-P2f4330e72933ab.bash.stderr
executable=${PYPEFLOW_JOB_START_SCRIPT}

# Wait up to timeout seconds for the executable to become "executable",
# then exec.
#timeleft = int(timeout)
while [[ ! -x "${executable}" ]]; do
    if [[ "${timeout}" == "0" ]]; then
        echo "timed out waiting for (${executable})"
        exit 77
    fi
    echo "not executable: '${executable}', waiting ${timeout}s"
    sleep 1
    timeout=$((timeout-1))
done

/bin/bash ${executable}

samtools faidx /home/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/./refs/000000Fp01/ref.fa

real    0m8.648s
user    2m18.476s
sys     0m3.776s

gconcepcion commented 6 years ago

After adding the [job.step.unzip.quiver] section, did you also do this:

$ rm -rf 4-quiver/quiver-* 4-quiver/cns-*

before re-launching?

cbergman commented 6 years ago

Yes, I did a rm -rf 3-unzip/ 4-quiver/ before relaunching fc_unzip.py fc_unzip.cfg. I also tried with MB=48000 (after deleting 3-unzip/ & 4-quiver/) and got the same error.

gconcepcion commented 6 years ago

Hmm - interesting. Unfortunately pbalign --nproc=24 is hardcoded, but will hopefully be configurable soon.

I still don't understand why it's failing for you; as I mentioned, I routinely run this on an even smaller box than yours.

CentOS 6.6; 16 GB RAM; 16 cores:

(pb-assembly) login66-biofx02:000000Fp01$ lsb_release -d
Description:    CentOS release 6.6 (Final)
(pb-assembly) login66-biofx02:000000Fp01$ head -n 2 /proc/meminfo 
MemTotal:       15942828 kB
MemFree:         4338916 kB
(pb-assembly) login66-biofx02:000000Fp01$ grep processor /proc/cpuinfo | wc -l
16
(pb-assembly) login66-biofx02:000000Fp01$ pwd
/home/UNIXHOME/gconcepcion/sandbox/falcon/FALCON-examples/run/greg200k/4-quiver/quiver-run/000000Fp01
(pb-assembly) login66-biofx02:000000Fp01$ pbalign --tmpDir=uow-00/tmp --nproc=24 --minAccuracy=0.75 --minLength=50 --minAnchorSize=12 --maxDivergence=30 --concordant --algorithm=blasr --algorithmOptions=--useQuality --maxHits=1 --hitPolicy=random --seed=1 /home/UNIXHOME/gconcepcion/sandbox/falcon/FALCON-examples/run/greg200k/4-quiver/segregate-run/segr001/segregated/000000Fp01/000000Fp01.bam /home/UNIXHOME/gconcepcion/sandbox/falcon/FALCON-examples/run/greg200k/4-quiver/quiver-split/./refs/000000Fp01/ref.fa aln-000000Fp01.bam
[INFO] 2018-09-10 22:35:07,334Z [pbalign.pbalignrunner _pacbio_main_runner 130] Using pbcommand v1.1.1
[INFO] 2018-09-10 22:35:07,334Z [pbalign.pbalignrunner _pacbio_main_runner 131] completed setting up logger with <function setup_log at 0x7f0441d2c9b0>
[INFO] 2018-09-10 22:35:07,334Z [pbalign.pbalignrunner _pacbio_main_runner 132] log opts {'file_name': None, 'level': 20}
[INFO] 2018-09-10 22:35:07,337Z [root run 227] pbalign version: 0.3.1
[INFO] 2018-09-10 22:35:07,409Z [root run 178] BlasrService: Align reads to references using blasr.
[INFO] 2018-09-10 22:35:07,411Z [root Execute 63] BlasrService: Call "blasr /home/UNIXHOME/gconcepcion/sandbox/falcon/FALCON-examples/run/greg200k/4-quiver/segregate-run/segr001/segregated/000000Fp01/000000Fp01.bam /home/UNIXHOME/gconcepcion/sandbox/falcon/FALCON-examples/run/greg200k/4-quiver/quiver-split/refs/000000Fp01/ref.fa --out /home/UNIXHOME/gconcepcion/sandbox/falcon/FALCON-examples/run/greg200k/4-quiver/quiver-run/000000Fp01/uow-00/tmp/tmpjtDyfE/03KOFY.bam  --bam  --bestn 1 --minMatch 12  --maxMatch 30  --nproc 24  --minSubreadLength 50 --minAlnLength 50  --minPctSimilarity 70 --minPctAccuracy 75 --hitPolicy random  --concordant  --randomSeed 1  --useQuality "
[INFO] 2018-09-10 22:35:15,744Z [root run 150] FilterService: Filter alignments using samFilter.
[INFO] 2018-09-10 22:35:15,744Z [root Execute 63] FilterService: Call "rm -f /home/UNIXHOME/gconcepcion/sandbox/falcon/FALCON-examples/run/greg200k/4-quiver/quiver-run/000000Fp01/uow-00/tmp/tmpjtDyfE/oTuowI.bam && ln -s /home/UNIXHOME/gconcepcion/sandbox/falcon/FALCON-examples/run/greg200k/4-quiver/quiver-run/000000Fp01/uow-00/tmp/tmpjtDyfE/03KOFY.bam /home/UNIXHOME/gconcepcion/sandbox/falcon/FALCON-examples/run/greg200k/4-quiver/quiver-run/000000Fp01/uow-00/tmp/tmpjtDyfE/oTuowI.bam"
[INFO] 2018-09-10 22:35:15,787Z [root run 133] BamPostService: Sort and build index for a bam file.
[INFO] 2018-09-10 22:35:15,788Z [root Execute 63] BamPostService: Call "samtools --version||true"
[INFO] 2018-09-10 22:35:15,838Z [root Execute 63] BamPostService: Call "samtools sort --threads 24 -m 4G -o /home/UNIXHOME/gconcepcion/sandbox/falcon/FALCON-examples/run/greg200k/4-quiver/quiver-run/000000Fp01/aln-000000Fp01.bam /home/UNIXHOME/gconcepcion/sandbox/falcon/FALCON-examples/run/greg200k/4-quiver/quiver-run/000000Fp01/uow-00/tmp/tmpjtDyfE/oTuowI.bam"
[INFO] 2018-09-10 22:35:16,462Z [root Execute 63] BamPostService: Call "samtools --version"
[INFO] 2018-09-10 22:35:16,521Z [root Execute 63] BamPostService: Call "samtools index /home/UNIXHOME/gconcepcion/sandbox/falcon/FALCON-examples/run/greg200k/4-quiver/quiver-run/000000Fp01/aln-000000Fp01.bam /home/UNIXHOME/gconcepcion/sandbox/falcon/FALCON-examples/run/greg200k/4-quiver/quiver-run/000000Fp01/aln-000000Fp01.bam.bai"
[INFO] 2018-09-10 22:35:16,885Z [root Execute 63] BamPostService: Call "pbindex /home/UNIXHOME/gconcepcion/sandbox/falcon/FALCON-examples/run/greg200k/4-quiver/quiver-run/000000Fp01/aln-000000Fp01.bam"
[INFO] 2018-09-10 22:35:17,469Z [root run 277] Total time: 10.13 s.
[INFO] 2018-09-10 22:35:17,469Z [pbalign.pbalignrunner _pacbio_main_runner 155] exiting with return code 0 in 10.14 sec.

The pbalign command works when I request more processors than are available (--nproc=24 on a 16-core system); however, I am able to replicate the error message when I run pbalign with --nproc=30 or higher.

In short, it looks like the best solution for you is for us to make the pbalign --nproc setting configurable; however, I still can't explain why this works on my smaller test system and fails on yours.

gconcepcion commented 6 years ago

This will be fixed in the next Bioconda release of pb-assembly. In the meantime you can fix it yourself by modifying this file: .conda/envs/pb-assembly/lib/python2.7/site-packages/falcon_unzip/tasks/unzip.py

Change line 259:

-pbalign --tmpDir=$(pwd)/tmp --nproc=24 --minAccuracy=0.75 --minLength=50\
+pbalign --tmpDir=$(pwd)/tmp --nproc=$nproc --minAccuracy=0.75 --minLength=50\

and add this line right before that: nproc={params.pypeflow_nproc}

At this point the script should use the actual value that you specify in your config, which will hopefully allow you to complete the test case.
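Taken together, the patched fragment of the bash template in unzip.py would read roughly like this (a sketch, not a verbatim copy of the file; the surrounding lines and trailing arguments are assumed, and {params.pypeflow_nproc} is filled in by pypeflow at run time):

```shell
# Sketch of the patched template fragment (surrounding lines and the
# exact {input.*}/{params.*} placeholders on the last line are assumed):
nproc={params.pypeflow_nproc}
pbalign --tmpDir=$(pwd)/tmp --nproc=$nproc --minAccuracy=0.75 --minLength=50\
    --minAnchorSize=12 --maxDivergence=30 --concordant --algorithm=blasr\
    --algorithmOptions=--useQuality --maxHits=1 --hitPolicy=random --seed=1\
    {input.read_bam} {input.ref_fasta} aln-{params.ctg_id}.bam
```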

cbergman commented 6 years ago

Success! Thanks for the workaround to get the test data working.

I think you should also modify line 261 to remove the hard-coded 24 processor value in variantCaller:

-(variantCaller --algorithm=arrow -x 5 -X 120 -q 20 -j 24 -r {input.ref_fasta} aln-{params.ctg_id}.bam\
    -o {output.cns_fasta} -o {output.cns_fastq} --minConfidence 0 -o {output.cns_vcf}) || echo WARNING quiver failed. Maybe no reads for this block.
+(variantCaller --algorithm=arrow -x 5 -X 120 -q 20 -j $nproc -r {input.ref_fasta} aln-{params.ctg_id}.bam\
    -o {output.cns_fasta} -o {output.cns_fastq} --minConfidence 0 -o {output.cns_vcf}) || echo WARNING quiver failed. Maybe no reads for this block.

I'd like to test on some real data before considering this resolved. Also, in testing on another machine I think I'm coming to an understanding of why the hardcoded 24 processors is failing (but it is exposing another potential issue). So could we keep this issue open for a bit longer?

gconcepcion commented 6 years ago

Great - glad it works.

Yes, the variantCaller line has already been fixed in the next build as well.

I'll leave the issue open for a little while, but be advised, the official Repo has moved to: https://github.com/PacificBiosciences/pb-assembly

and we will be triaging ALL pbbioconda related issues here: https://github.com/PacificBiosciences/pbbioconda/issues

cbergman commented 6 years ago

I've now gotten the patched version of unzip to work on a real dataset on a workstation with 24 cores (using only 12 cores and 1 job) that has directly attached storage. So it looks like the patch above resolves unzip when running directly on a single CentOS 6 machine.

In debugging the test data on different machines, I now realize that I was getting two slightly different unzip errors at similar stages in the pipeline on different CentOS machines. When using a CentOS 7 machine with 28 cores on a cluster with a PBS/Torque scheduler, writing to an attached Lustre filesystem, I get a different unzip error than the one we have been discussing above. This happens with or without the $nproc patch.

Using the following unzip config (the same config that works when running directly on a CentOS 6 box):

(falcon) [cbergman@sapelo2] $ more fc_unzip.cfg
[General]
max_n_open_files = 1000

[Unzip]

input_fofn= input.fofn
input_bam_fofn= input_bam.fofn
#sge_phasing= -pe smp 12 -q bigmem
#sge_quiver= -pe smp 12 -q sequel-farm
#sge_track_reads= -pe smp 12 -q default
#sge_blasr_aln=  -pe smp 24 -q bigmem
#sge_hasm=  -pe smp 48 -q bigmem
#unzip_concurrent_jobs = 64
#quiver_concurrent_jobs = 64

#unzip_concurrent_jobs = 12
#quiver_concurrent_jobs = 12

[job.defaults]
NPROC=12
njobs=1
job_type = SGE
job_type = local

#use_tmpdir = /scratch
pwatcher_type = blocking
job_type = string
submit = bash -C ${CMD} >| ${STDOUT_FILE} 2>| ${STDERR_FILE}

#njobs=120
njobs=1
NPROC=12
#submit = qsub -S /bin/bash -sync y -V  \
#  -q ${JOB_QUEUE}    \
#  -N ${JOB_NAME}     \
#  -o "${JOB_STDOUT}" \
#  -e "${JOB_STDERR}" \
#  -pe smp ${NPROC}   \
#  -l h_vmem=${MB}M   \
#  "${JOB_SCRIPT}"
[job.step.unzip.track_reads]
njobs=1
NPROC=12
[job.step.unzip.blasr_aln]
njobs=1
NPROC=12
[job.step.unzip.phasing]
njobs=1
NPROC=12
[job.step.unzip.hasm]
njobs=1
NPROC=12
[job.step.unzip.quiver]
njobs=1
NPROC=12
MB=48000
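One thing worth noting about this config: [job.defaults] defines job_type three times and njobs twice. Assuming the file is read with a Python configparser-style reader that tolerates duplicate options (an assumption about pypeflow's config handling, not something confirmed in this thread), the last occurrence wins, so the effective values would be job_type=string and njobs=1:

```python
import configparser

# strict=False tolerates duplicate options within a section; the value
# read last overwrites earlier ones, which is how the repeated
# job_type/njobs lines in the config above would resolve.
cfg = configparser.ConfigParser(strict=False)
cfg.read_string("""
[job.defaults]
job_type = SGE
job_type = local
job_type = string
njobs = 120
njobs = 1
""")
print(cfg["job.defaults"]["job_type"])  # -> string
print(cfg["job.defaults"]["njobs"])     # -> 1
```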

I consistently get an error that stops the pipeline at the quiver-run stage:

(falcon) [cbergman@sapelo2] $ ls -lrt 4-quiver/
total 48
drwxr-xr-x 2 cbergman cmblab 4096 Sep 13 08:54 track-reads
drwxr-xr-x 2 cbergman cmblab 4096 Sep 13 08:54 select-reads
drwxr-xr-x 4 cbergman cmblab 4096 Sep 13 08:54 merge-reads
drwxr-xr-x 2 cbergman cmblab 4096 Sep 13 08:54 segregate-split
drwxr-xr-x 4 cbergman cmblab 4096 Sep 13 08:54 segregate-chunks
drwxr-xr-x 4 cbergman cmblab 4096 Sep 13 08:54 segregate-run
drwxr-xr-x 2 cbergman cmblab 4096 Sep 13 08:55 segregate-gathered
drwxr-xr-x 2 cbergman cmblab 4096 Sep 13 08:55 segregated-bam
drwxr-xr-x 3 cbergman cmblab 4096 Sep 13 08:55 quiver-split
drwxr-xr-x 3 cbergman cmblab 4096 Sep 13 08:55 cns-gather
drwxr-xr-x 4 cbergman cmblab 4096 Sep 13 08:55 quiver-chunks
drwxr-xr-x 3 cbergman cmblab 4096 Sep 13 08:55 quiver-run
(falcon) [cbergman@sapelo2] $ more 4-quiver/quiver-run/000000Fp01/run-Pf6358769b261f1.bash.stderr
executable=${PYPEFLOW_JOB_START_SCRIPT}
+ executable=/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/run-Pf6358769b261f1.bash
timeout=${PYPEFLOW_JOB_START_TIMEOUT:-60} # wait 60s by default
+ timeout=60

# Wait up to timeout seconds for the executable to become "executable",
# then exec.
#timeleft = int(timeout)
while [[ ! -x "${executable}" ]]; do
    if [[ "${timeout}" == "0" ]]; then
        echo "timed out waiting for (${executable})"
        exit 77
    fi
    echo "not executable: '${executable}', waiting ${timeout}s"
    sleep 1
    timeout=$((timeout-1))
done
+ [[ ! -x /lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/run-Pf6358769b261f1.bash ]]

/bin/bash ${executable}
+ /bin/bash /lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/run-Pf6358769b261f1.bash
+ '[' '!' -d /lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01 ']'
+ cd /lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01
+ eval '/bin/bash run.sh'
++ /bin/bash run.sh
export PATH=$PATH:/bin
+ export PATH=/home/cbergman/miniconda3/envs/falcon/bin:/home/cbergman/.aspera/connect/bin/:/home/cbergman/bin/bioawk/:/home/cbergman/bin/smrtlink/smrtcmd
s/bin/:/home/cbergman/bin/hmmer-3.1b2-linux-intel-x86_64/binaries/:/home/cbergman/bin/gt-1.5.9-Linux_x86_64-64bit-complete/bin/:/home/cbergman/bin/dx-tool
kit-v0.170.1/bin/:/home/cbergman/bin/gnuplot/:/home/cbergman/.aspera/connect/bin/:/home/cbergman/bin/bioawk/:/home/cbergman/bin/smrtlink/smrtcmds/bin/:/ho
me/cbergman/bin/hmmer-3.1b2-linux-intel-x86_64/binaries/:/home/cbergman/bin/gt-1.5.9-Linux_x86_64-64bit-complete/bin/:/home/cbergman/bin/dx-toolkit-v0.170
.1/bin/:/home/cbergman/bin/gnuplot/:/usr/local/bin:/usr/local/apps/ogrt/0.5.0-8/client/bin:/opt/apps/torque/6.1.1.1/bin:/opt/apps/torque/6.1.1.1/sbin:/usr
/lib64/qt-3.3/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/puppetlabs/bin:/opt/singularity/bin:/usr/tools/bin:/home/cbergman/miniconda3/bin:/bin
+ PATH=/home/cbergman/miniconda3/envs/falcon/bin:/home/cbergman/.aspera/connect/bin/:/home/cbergman/bin/bioawk/:/home/cbergman/bin/smrtlink/smrtcmds/bin/:
/home/cbergman/bin/hmmer-3.1b2-linux-intel-x86_64/binaries/:/home/cbergman/bin/gt-1.5.9-Linux_x86_64-64bit-complete/bin/:/home/cbergman/bin/dx-toolkit-v0.
170.1/bin/:/home/cbergman/bin/gnuplot/:/home/cbergman/.aspera/connect/bin/:/home/cbergman/bin/bioawk/:/home/cbergman/bin/smrtlink/smrtcmds/bin/:/home/cber
gman/bin/hmmer-3.1b2-linux-intel-x86_64/binaries/:/home/cbergman/bin/gt-1.5.9-Linux_x86_64-64bit-complete/bin/:/home/cbergman/bin/dx-toolkit-v0.170.1/bin/
:/home/cbergman/bin/gnuplot/:/usr/local/bin:/usr/local/apps/ogrt/0.5.0-8/client/bin:/opt/apps/torque/6.1.1.1/bin:/opt/apps/torque/6.1.1.1/sbin:/usr/lib64/
qt-3.3/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/puppetlabs/bin:/opt/singularity/bin:/usr/tools/bin:/home/cbergman/miniconda3/bin:/bin
cd /lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01
+ cd /lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01
/bin/bash task.sh
+ /bin/bash task.sh
pypeflow 2.0.4
2018-09-13 08:55:22,781 - root - DEBUG - Running "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/pypeflow/do_task.py /lustre1/cbergman/
falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/task.json"
2018-09-13 08:55:22,782 - root - DEBUG - Checking existence of '/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/t
ask.json' with timeout=30
2018-09-13 08:55:22,783 - root - DEBUG - Loading JSON from '/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/task.
json'
2018-09-13 08:55:22,783 - root - DEBUG - {u'bash_template_fn': u'template.sh',
 u'inputs': {u'bash_template': u'/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/bash-template.sh',
             u'units_of_work': u'/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-chunks/000000Fp01/some-units-of-work.json'},
 u'outputs': {u'results': u'results.json'},
 u'parameters': {u'pypeflow_mb': u'48000', u'pypeflow_nproc': u'12'}}
2018-09-13 08:55:22,783 - root - WARNING - CD: '/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01' <- '/lustre1/cbe
rgman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01'
2018-09-13 08:55:22,783 - root - DEBUG - Checking existence of u'/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-chunks/000000Fp
01/some-units-of-work.json' with timeout=30
2018-09-13 08:55:22,813 - root - DEBUG - Checking existence of u'/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/bash-temp
late.sh' with timeout=30
2018-09-13 08:55:22,830 - root - DEBUG - Checking existence of u'template.sh' with timeout=30
2018-09-13 08:55:22,831 - root - WARNING - CD: '/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01' <- '/lustre1/cbe
rgman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01'
2018-09-13 08:55:22,874 - root - INFO - $('/bin/bash user_script.sh')
hostname
+ hostname
pwd
+ pwd
date
+ date
# Substitution will be similar to snakemake "shell".
python -m falcon_kit.mains.generic_run_units_of_work --nproc=12 --units-of-work-fn=/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiv
er-chunks/000000Fp01/some-units-of-work.json --bash-template-fn=/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/bash-templ
ate.sh --results-fn=results.json
+ python -m falcon_kit.mains.generic_run_units_of_work --nproc=12 --units-of-work-fn=/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/qu
iver-chunks/000000Fp01/some-units-of-work.json --bash-template-fn=/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/bash-tem
plate.sh --results-fn=results.json
falcon-kit 1.2.2
pypeflow 2.0.4
INFO:root:INPUT:{u'ref_fasta': u'/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/./refs/000000Fp01/ref.fa', u'read_bam': u
'/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/segregate-run/segr000/segregated/000000Fp01/000000Fp01.bam', u'ctg_type': u'/lustre1/c
bergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/./refs/000000Fp01/ctg_type'}
INFO:root:OUTPUT:{u'cns_fasta': u'cns.fasta.gz', u'cns_vcf': u'cns.vcf', u'job_done': u'quiver_done', u'ctg_type_again': u'ctg_type', u'cns_fastq': u'cns.
fastq.gz'}
INFO:root:PARAMS:{'pypeflow_nproc': '12', u'ctg_id': u'000000Fp01'}
INFO:root:$('rm -rf uow-00')
WARNING:root:CD: 'uow-00' <- '/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01'
INFO:root:$('/bin/bash user_script.sh')
hostname
+ hostname
pwd
+ pwd
date
+ date
set -vex
+ set -vex
trap 'touch quiver_done.exit' EXIT
+ trap 'touch quiver_done.exit' EXIT
hostname
+ hostname
date
+ date

samtools faidx /lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/./refs/000000Fp01/ref.fa
+ samtools faidx /lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/./refs/000000Fp01/ref.fa
nproc=12
+ nproc=12
pbalign --tmpDir=$(pwd)/tmp --nproc=$nproc --minAccuracy=0.75 --minLength=50          --minAnchorSize=12 --maxDivergence=30 --concordant --algorithm=blasr
          --algorithmOptions=--useQuality --maxHits=1 --hitPolicy=random --seed=1            /lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-q
uiver/segregate-run/segr000/segregated/000000Fp01/000000Fp01.bam /lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-split/./refs/00
0000Fp01/ref.fa aln-000000Fp01.bam
++ pwd
+ pbalign --tmpDir=/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/uow-00/tmp --nproc=12 --minAccuracy=0.75 --min
Length=50 --minAnchorSize=12 --maxDivergence=30 --concordant --algorithm=blasr --algorithmOptions=--useQuality --maxHits=1 --hitPolicy=random --seed=1 /lu
stre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/segregate-run/segr000/segregated/000000Fp01/000000Fp01.bam /lustre1/cbergman/falcon/FALCON
-examples/run/greg200k-sv2/4-quiver/quiver-split/./refs/000000Fp01/ref.fa aln-000000Fp01.bam
Traceback (most recent call last):
  File "/home/cbergman/miniconda3/envs/falcon/bin/pbalign", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3105, in <module>
    @_call_aside
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3089, in _call_aside
    f(*args, **kwargs)
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3118, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/pkg_resources/__init__.py", line 578, in _build_master
    ws.require(__requires__)
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/pkg_resources/__init__.py", line 895, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/pkg_resources/__init__.py", line 781, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'avro' distribution was not found and is required by pbcommand
touch quiver_done.exit
+ touch quiver_done.exit
WARNING:root:Call '/bin/bash user_script.sh' returned 256.
WARNING:root:CD: 'uow-00' -> '/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01'
Traceback (most recent call last):
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/falcon_kit/mains/generic_run_units_of_work.py", line 115, in <module>
    main()
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/falcon_kit/mains/generic_run_units_of_work.py", line 111, in main
    run(**vars(args))
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/falcon_kit/mains/generic_run_units_of_work.py", line 64, in run
    pypeflow.do_task.run_bash(script, inputs, outputs, params)
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/pypeflow/do_task.py", line 178, in run_bash
    util.system(cmd)
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/pypeflow/io.py", line 29, in syscall
    raise Exception(msg)
Exception: Call '/bin/bash user_script.sh' returned 256.
2018-09-13 08:55:24,738 - root - WARNING - Call '/bin/bash user_script.sh' returned 256.
2018-09-13 08:55:24,738 - root - WARNING - CD: '/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01' -> '/lustre1/cbe
rgman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01'
2018-09-13 08:55:24,739 - root - WARNING - CD: '/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01' -> '/lustre1/cbe
rgman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01'
2018-09-13 08:55:24,739 - root - CRITICAL - Error in /home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/pypeflow/do_task.py with args="{'js
on_fn': '/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01/task.json',\n 'timeout': 30,\n 'tmpdir': None}"
Traceback (most recent call last):
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/pypeflow/do_task.py", line 246, in <module>
    main()
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/pypeflow/do_task.py", line 238, in main
    run(**vars(parsed_args))
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/pypeflow/do_task.py", line 232, in run
    run_cfg_in_tmpdir(cfg, tmpdir)
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/pypeflow/do_task.py", line 208, in run_cfg_in_tmpdir
    run_bash(bash_template, myinputs, myoutputs, parameters)
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/pypeflow/do_task.py", line 178, in run_bash
    util.system(cmd)
  File "/home/cbergman/miniconda3/envs/falcon/lib/python2.7/site-packages/pypeflow/io.py", line 29, in syscall
    raise Exception(msg)
Exception: Call '/bin/bash user_script.sh' returned 256.
+++ pwd
++ echo 'FAILURE. Running top in /lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2/4-quiver/quiver-run/000000Fp01 (If you see -terminal database i
s inaccessible- you are using the python bin-wrapper, so you will not get diagnostic info. No big deal. This process is crashing anyway.)'
++ rm -f top.txt
++ which python
++ which top
++ env -u LD_LIBRARY_PATH top -b -n 1
++ env -u LD_LIBRARY_PATH top -b -n 1
++ pstree -apl

real    0m2.565s
user    0m0.387s
sys 0m0.196s
+ finish
+ echo 'finish code: 1'

The error appears to be caused by a missing dependency: the avro package, which pbcommand requires:

pkg_resources.DistributionNotFound: The 'avro' distribution was not found and is required by pbcommand

Any ideas why this avro dependency error is happening?
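The dependency check can be reproduced outside pbalign with only the standard library (this sketch targets Python 3's importlib; the falcon environment is Python 2.7, where the pkg_resources call in the traceback is the closer analogue):

```python
import importlib.util

def module_missing(name):
    """Return True when Python cannot locate the named module,
    mirroring the DistributionNotFound condition pbalign hit."""
    return importlib.util.find_spec(name) is None

# 'avro' is the dependency that went missing in this environment;
# 'json' is stdlib and should always be found.
print("json missing:", module_missing("json"))
print("avro missing:", module_missing("avro"))
```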

gconcepcion commented 6 years ago

Hmmm, interesting. The same bioconda environment works for me on a stock CentOS 6.6 & CentOS 7.2.1511 box so I doubt it's anything different about the OS that is causing the problem.

Can you run this command on both your working (CentOS6.6) box and your non-working (CentOS7) box:


(pb-assembly) $ python -c 'import avro; print avro.__file__'
/home/UNIXHOME/gconcepcion/.conda/envs/pb-assembly/lib/python2.7/site-packages/avro-1.8.0-py2.7.egg/avro/__init__.py
cbergman commented 6 years ago
gconcepcion commented 6 years ago

Well, we know it's missing from that box, what does the analogous command say on the working box?

You might try updating your pb-assembly environment: $ conda update pb-assembly

There have been several commits in the past 24 hours.

cbergman commented 6 years ago

Package Plan

environment location: /home/cbergman/miniconda3/envs/falcon

added / updated specs:

The following packages will be downloaded:

package                    |            build
---------------------------|-----------------
mummer4-4.0.0beta2         |  pl526hfc679d8_3         1.3 MB  bioconda
bedtools-2.27.1            |       he941832_2         713 KB  bioconda
pb-assembly-0.0.0          |           py27_7           4 KB  bioconda
bwa-0.7.17                 |       ha92aebf_3         508 KB  bioconda
pb-falcon-0.2.0            |           py27_0         469 KB  bioconda
------------------------------------------------------------
                                       Total:         3.0 MB

The following NEW packages will be INSTALLED:

bedtools:    2.27.1-he941832_2          bioconda
bwa:         0.7.17-ha92aebf_3          bioconda
mummer4:     4.0.0beta2-pl526hfc679d8_3 bioconda

The following packages will be UPDATED:

pb-assembly: 0.0.0-py27_3               bioconda --> 0.0.0-py27_7 bioconda
pb-falcon:   0.0.2-py27_0               bioconda --> 0.2.0-py27_0 bioconda

Proceed ([y]/n)? y

Downloading and Extracting Packages
mummer4-4.0.0beta2 | 1.3 MB | ########## | 100%
bedtools-2.27.1    | 713 KB | ########## | 100%
pb-assembly-0.0.0  | 4 KB   | ########## | 100%
bwa-0.7.17         | 508 KB | ########## | 100%
pb-falcon-0.2.0    | 469 KB | ########## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

(falcon) [cbergman@sapelo2] $ python -c 'import avro; print avro.__file__'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named avro

- reinstalled the pb-assembly package on the box that isn't working in a new environment

[cbergman@sapelo2] $ conda create -n falcon2
Solving environment: done

Package Plan

environment location: /home/cbergman/miniconda3/envs/falcon2

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate falcon2
#
# To deactivate an active environment, use
#
#     $ conda deactivate
#

[cbergman@sapelo2] $ source activate falcon2
(falcon2) [cbergman@sapelo2] $ conda install pb-assembly
Solving environment: done

Package Plan

environment location: /home/cbergman/miniconda3/envs/falcon2

added / updated specs:

The following packages will be downloaded:

package                    |            build
---------------------------|-----------------
mkl-2019.0                 |              117       204.4 MB
intel-openmp-2019.0        |              117         721 KB
------------------------------------------------------------
                                       Total:       205.1 MB

The following NEW packages will be INSTALLED:

asn1crypto:              0.24.0-py27_0
avro:                    1.8.2-h8f0a498_0           conda-forge
bcftools:                1.9-h4da6232_0             bioconda
bedtools:                2.27.1-he941832_2          bioconda
blas:                    1.0-mkl
blasr:                   5.3.2-hac9d22c_3           bioconda
blasr_libcpp:            5.3.1-hac9d22c_2           bioconda
bwa:                     0.7.17-ha92aebf_3          bioconda
bzip2:                   1.0.6-h14c3975_5
ca-certificates:         2018.03.07-0
certifi:                 2018.8.24-py27_1
cffi:                    1.11.5-py27he75722e_1
chardet:                 3.0.4-py27_1
cryptography:            2.3.1-py27hc365091_0
curl:                    7.61.0-h84994c4_0
cython:                  0.28.5-py27hf484d3e_0
decorator:               4.3.0-py27_0
enum34:                  1.1.6-py27_1
future:                  0.16.0-py27_2
genomicconsensus:        2.3.2-py27_1               bioconda
h5py:                    2.8.0-py27h989c5e5_3
hdf5:                    1.10.2-hba1933b_1
htslib:                  1.7-0                      bioconda
idna:                    2.7-py27_0
intel-openmp:            2019.0-117
ipaddress:               1.0.22-py27_0
iso8601:                 0.1.12-py27_1
jansson:                 2.11-0                     conda-forge
libcurl:                 7.61.0-h1ad7b7a_0
libdeflate:              1.0-h470a237_0             bioconda
libedit:                 3.1.20170329-h6b74fdf_2
libffi:                  3.2.1-hd88cf55_4
libgcc:                  7.2.0-h69d50b8_2
libgcc-ng:               8.2.0-hdf63c60_1
libgfortran-ng:          7.3.0-hdf63c60_0
libssh2:                 1.8.0-h9cfc8f7_4
libstdcxx-ng:            8.2.0-hdf63c60_1
linecache2:              1.0.0-py27_0
minimap2:                2.12-ha92aebf_0            bioconda
mkl:                     2019.0-117
mkl_fft:                 1.0.4-py27h4414c95_1
mkl_random:              1.0.1-py27h4414c95_1
mummer4:                 4.0.0beta2-pl526hfc679d8_3 bioconda
ncurses:                 6.1-hf484d3e_0
networkx:                2.1-py27_0
nim-falcon:              0.0.0-0                    bioconda
numpy:                   1.15.1-py27h1d66e8a_0
numpy-base:              1.15.1-py27h81de0dd_0
openssl:                 1.0.2p-h14c3975_0
pb-assembly:             0.0.0-py27_7               bioconda
pb-dazzler:              0.0.0-h470a237_0           bioconda
pb-falcon:               0.2.0-py27_0               bioconda
pbalign:                 0.3.1-py27_0               bioconda
pbbam:                   0.18.0-h1310cd9_1          bioconda
pbcommand:               1.1.1-py27h24bf2e0_1       bioconda
pbcore:                  1.5.1-py27_1               bioconda
perl:                    5.26.2-h14c3975_0
pip:                     10.0.1-py27_0
pycparser:               2.18-py27_1
pyopenssl:               18.0.0-py27_0
pysam:                   0.14.1-py27hae42fb6_1      bioconda
pysocks:                 1.6.8-py27_0
python:                  2.7.15-h1571d57_0
python-consensuscore:    1.1.1-py27h02d93b8_1       bioconda
python-consensuscore2:   3.1.0-py27_1               bioconda
python-edlib:            1.2.3-py27h470a237_1       bioconda
python-intervaltree:     2.1.0-py_0                 bioconda
python-msgpack:          0.5.6-py27h470a237_0       bioconda
python-sortedcontainers: 2.0.4-py_0                 bioconda
pytz:                    2018.5-py27_0
readline:                7.0-h7b6447c_5
requests:                2.19.1-py27_0
samtools:                1.9-h8ee4bcc_1             bioconda
setuptools:              40.2.0-py27_0
six:                     1.11.0-py27_1
snappy:                  1.1.7-hbae5bb6_3
sqlite:                  3.24.0-h84994c4_0
tk:                      8.6.8-hbc83047_0
traceback2:              1.4.0-py27_0
unittest2:               1.1.0-py27_0
urllib3:                 1.23-py27_0
wheel:                   0.31.1-py27_0
xz:                      5.2.4-h14c3975_4
zlib:                    1.2.11-ha838bed_2

Proceed ([y]/n)? y

Downloading and Extracting Packages
mkl-2019.0          | 204.4 MB | ########## | 100%
intel-openmp-2019.0 | 721 KB   | ########## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction:
##############################################################################

PacBio(R) tools distributed via Bioconda are: pre-release versions, not
necessarily ISO compliant, intended for Research Use Only and not for use
in diagnostic procedures, intended only for command-line users, and
possibly newer than the currently available SMRT(R) Analysis builds. While
efforts have been made to ensure that releases on Bioconda live up to the
quality that PacBio strives for, we make no warranty regarding any
Bioconda release.

As PacBio tools distributed via Bioconda are not covered by any service
level agreement or the like, please do not contact a PacBio Field
Applications Scientist or PacBio Customer Service for assistance with any
Bioconda release. We instead provide an issue tracker for you to report
issues to us at:

https://github.com/PacificBiosciences/pbbioconda

We make no warranty that any such issue will be addressed,
to any extent or within any time frame.

BSD 3-Clause Clear License

Please see https://github.com/PacificBiosciences/pbbioconda for
information on License, Copyright and Disclaimer

##############################################################################


done
(falcon2) [cbergman@sapelo2] $ pwd
/lustre1/cbergman/falcon/FALCON-examples/run/greg200k-sv2
(falcon2) [cbergman@sapelo2] $ python -c 'import avro; print avro.__file__'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named avro

- python still can't import avro
- noticed that avro appeared to install correctly, and that the avro versions differ between the working and non-working boxes:

not working

ls -lrt ~/miniconda3/pkgs/avro-1.8.2-h8f0a498_0

working

ls -lrt ~/miniconda3/pkgs/avro-1.8.0-py27_0

- downgraded avro to 1.8.0 on the box that isn't working and it now can be found by python in the pb-assembly environment

(falcon2) [cbergman@sapelo2] $ conda install avro=1.8.0
Solving environment: done

Package Plan

environment location: /home/cbergman/miniconda3/envs/falcon2

added / updated specs:

The following packages will be downloaded:

package                    |            build
---------------------------|-----------------
avro-1.8.0                 |           py27_0          74 KB  bioconda

The following packages will be DOWNGRADED:

avro: 1.8.2-h8f0a498_0 conda-forge --> 1.8.0-py27_0 bioconda

Proceed ([y]/n)? y

Downloading and Extracting Packages
avro-1.8.0 | 74 KB | ########## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(falcon2) [cbergman@sapelo2] $ python -c 'import avro; print avro.__file__'
/home/cbergman/miniconda3/envs/falcon2/lib/python2.7/site-packages/avro-1.8.0-py2.7.egg/avro/__init__.py
(falcon2) [cbergman@sapelo2] $


- after downgrading to avro-1.8.0, unzip completes on the test data on the box that previously wasn't working.
- conclusion: unzip currently appears to work only with avro-1.8.0, not with avro-1.8.2. Can you confirm, and possibly update the conda recipe to pin avro-1.8.0?
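The suggested fix can be expressed as an explicit version pin in the bioconda recipe's run requirements. The fragment below is only an illustrative sketch of what such a pin might look like; the surrounding recipe structure is assumed, not taken from the actual pb-assembly recipe:

```yaml
# Illustrative meta.yaml fragment -- not the actual pb-assembly recipe.
# The point is the exact "=1.8.0" pin on avro, per this thread's finding
# that falcon-unzip works with avro 1.8.0 but fails with 1.8.2.
requirements:
  run:
    - python
    - avro =1.8.0   # 1.8.2 breaks `import avro` in the unzip environment
```

With an exact pin like this, the conda solver can no longer silently pull in a newer avro from a higher-priority channel such as conda-forge.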
gconcepcion commented 6 years ago

Interesting - I was previously running my CentOS6 build on both CentOS6 and CentOS7 with no problems. I decided to try installing a copy of pb-assembly directly on a CentOS7.2.1511 box, and it works fine for me and installs avro==1.8.0.

I'm not sure why the recipe is installing version 1.8.2 for you.

(pb-assembly_centos7) mp1803-sge:~$ lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.2.1511 (Core) 
Release:        7.2.1511
Codename:       Core
(pb-assembly_centos7) mp1803-sge:~$ pip freeze | grep avro
avro==1.8.0
(pb-assembly_centos7) mp1803-sge:~$ python -c 'import avro; print avro.__file__'
/home/UNIXHOME/gconcepcion/.conda/envs/pb-assembly_centos7/lib/python2.7/site-packages/avro-1.8.0-py2.7.egg/avro/__init__.py

Is your CentOS7 box stock, or have you made changes to it? This might be a legitimate bug, in which case you should post in the official pbbioconda/issues repo here

cbergman commented 6 years ago

The CentOS 7 system is our HPC cluster, so I can only assume that it is not a vanilla install. But I also don't think this avro version issue is related to the system per se. Rather, it is likely a conda issue, with conda behaving unreliably on different systems. I'd suggest requiring the pb-assembly recipe to explicitly install avro==1.8.0, rather than installing a non-specific version of avro and relying on conda/system interactions to hopefully install the right one. I'm happy for you or @pb-cdunn to file this on pbbioconda and take it forward, but since my install is working on my systems, I'm good for now and going to close this issue. Thanks for your help troubleshooting.

gconcepcion commented 6 years ago

This is great feedback, I really appreciate the time you took to be thorough. We will add some explicit rules to ensure avro==1.8.0 is the version being installed.

Thanks!

pb-cdunn commented 5 years ago

https://circleci.com/gh/bioconda/bioconda-recipes/40724?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

conda_build.exceptions.DependencyNeedsBuildingError: Unsatisfiable dependencies for platform linux-64: {'avro-python2==1.8.0'}

I don't know whether we can use 1.8.0 anymore. I see 1.8.2 in my environment. Any idea how to get around this?
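This question was left open in the thread. One defensive option while the pin is unsatisfiable is a fail-fast check at startup, so users hit a clear message instead of a cryptic import failure deep in the pipeline. The sketch below assumes, per this thread, that 1.8.2 is the specific broken release; the helper names are hypothetical, not part of any PacBio tool:

```python
# Minimal fail-fast guard for the avro version issue reported in this thread.
# KNOWN_BAD and the module name "avro" come from the thread; everything else
# (function names, structure) is illustrative.

KNOWN_BAD = {(1, 8, 2)}  # avro releases reported to break falcon-unzip


def parse_version(version_string):
    """Turn a dotted version string like '1.8.0' into a tuple of ints.

    Only the first three numeric components are considered, so build
    suffixes like '1.8.2-h8f0a498_0' would need stripping first.
    """
    return tuple(int(part) for part in version_string.split(".")[:3])


def is_known_bad(version_string):
    """Return True if this avro version is one reported to break unzip."""
    return parse_version(version_string) in KNOWN_BAD


def check_avro_or_die():
    """Abort early with a clear message if a known-bad avro is installed."""
    import avro  # the module falcon-unzip fails to import in this thread
    version = getattr(avro, "__version__", None)
    if version is not None and is_known_bad(version):
        raise SystemExit(
            "avro %s is known to break falcon-unzip; "
            "please install avro 1.8.0 instead." % version
        )
```

The guard runs once at startup, costs nothing, and turns a mid-pipeline failure into an actionable one-line error.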