PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

Exception: Some tasks are recently_done but not satisfied: {Node(0-rawreads/repa/rep-combine)} #708

Closed shannonekj closed 3 years ago

shannonekj commented 4 years ago

Hi all,

I am running FALCON on a cluster with slurm and have run into a few errors when trying to submit the fc_run.sh script and let the assembly go. I will briefly describe the first, as it may be relevant to why I am getting the errors now... but obviously I am uncertain. I believe it is the same error as in #707.

The first exception that caused my job to fail was that the mypwatcher/wrappers/*.bash scripts were not executable, so slurm failed to enqueue the jobs. My workaround was to manually make all the .bash files executable with chmod a+x mypwatcher/wrappers/*.bash. I kept doing this every time a job failed, and things seemingly proceeded okay for the first 10-15 resubmissions (the output files all had text and the run progressed!). Then I got tired of having to chmod every file, so I looked for a way to automatically make any file in the mypwatcher/wrappers/ directory executable and ran chmod -R 775 mypwatcher/wrappers/ (I'm not sure whether this affects anything I'm currently seeing, because the wrappers directory has the same permissions [drwxrwsr-x] as the other directories in mypwatcher/, but it is a command I ran and may be relevant). A few minutes after I ran that command I received a different error in my all.log and err files:
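For reference, the manual workaround I kept applying looked roughly like this (the directory layout is from my run; the run-demo.bash filename below is just a stand-in, since the real wrapper names contain a job id):

```shell
# Stand-in for the real run directory: mypwatcher/wrappers/ is where the
# per-job wrapper scripts get written.
mkdir -p mypwatcher/wrappers
touch mypwatcher/wrappers/run-demo.bash

# The fix I applied by hand after each failure: grant execute permission
# on every wrapper script.
chmod a+x mypwatcher/wrappers/*.bash

# Confirm the execute bit is now set.
test -x mypwatcher/wrappers/run-demo.bash && echo "executable"
```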

NOTE: I found a typo in my cfg file (I changed pwatch_type = blocking to pwatcher_type = blocking) and have attached the updated cfg, err, and all.log below, as my scripts are still failing. I left the originals up for reference purposes; I hope that is okay.

all.log

fc_run.j20743460.err

+ cd /group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/03-assemblies/sandbox
+ fc_run.py fc_run.cfg
falcon-kit 1.8.1 (pip thinks "falcon-kit 1.8.1")
pypeflow 2.3.0
[INFO]Setup logging from file "None".
[INFO]$ lfs setstripe -c 12 /group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/03-assemblies/sandbox >
[INFO]Apparently '/group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/03-assemblies/sandbox' is not in lustre filesystem, which is fine.
[INFO]fc_run started with configuration fc_run.cfg
[WARNING]You have several old-style options. These should be provided in the `[job.defaults]` or `[job.step.*]` sections, and possibly renamed. See https://github.com/PacificBiosciences/FALCON/wiki/Configuration
 ['cns_concurrent_jobs', 'default_concurrent_jobs']
[WARNING]Unexpected keys in input config: {'falcon_greedy', 'cns_concurrent_jobs', 'ovlp_concurrent_jobs', 'pa_concurrent_jobs', 'default_concurrent_jobs'}
[INFO]cfg=
{
  "General": {
    "LA4Falcon_preload": false,
    "avoid_text_file_busy": true,
    "bestn": 12,
    "cns_concurrent_jobs": "288",
    "dazcon": false,
    "default_concurrent_jobs": "288",
    "falcon_greedy": "False",
    "falcon_sense_greedy": false,
    "falcon_sense_option": "--output-multi --min-idt 0.70 --min-cov 4 --max-n-read 200 --n-core 24",
    "falcon_sense_skip_contained": false,
    "fc_ovlp_to_graph_option": " --min-len 10000",
    "genome_size": "900000000",
    "input_fofn": "subreads.fa.fofn",
    "input_type": "raw",
    "length_cutoff": "-1",
    "length_cutoff_pr": "10000",
    "overlap_filtering_setting": "--max-diff 120 --max-cov 120 --min-cov 2 --n-core 12",
    "ovlp_DBdust_option": "",
    "ovlp_DBsplit_option": "-s400",
    "ovlp_HPCdaligner_option": "-v -B128 -M24 -k24 -h600 -e.95 -l1800 -s100",
    "ovlp_concurrent_jobs": "288",
    "ovlp_daligner_option": "-k24 -h600 -e.95 -l1800 -s100",
    "pa_DBdust_option": "",
    "pa_DBsplit_option": "-x500 -s400",
    "pa_HPCREPmask_option": "-k18 -h480 -w8 -e.8 -s100",
    "pa_HPCTANmask_option": "-k18 -h480 -w8 -e.8 -s100",
    "pa_HPCdaligner_option": "-v -B128 -e0.75 -M24 -l1200 -k14 -h256 -w8 -s100 -t16",
    "pa_REPmask_code": "0,300/0,300/0,300",
    "pa_concurrent_jobs": "288",
    "pa_daligner_option": "-e0.75 -l1200 -k14 -h256 -w8 -s100",
    "pa_dazcon_option": "-j 4 -x -l 500",
    "pa_fasta_filter_option": "streamed-internal-median",
    "pa_subsample_coverage": 0,
    "pa_subsample_random_seed": 12345,
    "pa_subsample_strategy": "random",
    "seed_coverage": "40",
    "skip_checks": false,
    "target": "assembly"
  },
  "job.defaults": {
    "JOB_QUEUE": "default",
    "MB": "40000",
    "NPROC": "12",
    "job_type": "slurm",
    "njobs": "100",
    "pwatch_type": "blocking",
    "pwatcher_type": "fs_based",
    "submit": "srun --wait=0 -p high \\\n-J ${JOB_NAME} \\\n-o ${JOB_STDOUT} \\\n-e ${JOB_STDERR} \\\n--mem-per-cpu=${MB}M \\\n--cpus-per-task=${NPROC} \\\n--time=4-0 \\\n--ntasks 1 \\\n--exclusive \\\n${JOB_SCRIPT}\n\"${CMD}\"",
    "use_tmpdir": false
  },
  "job.step.asm": {
    "NPROC": "24"
  },
  "job.step.cns": {
    "MB": "64000",
    "NPROC": "8",
    "njobs": "200"
  },
  "job.step.da": {
    "MB": "32000",
    "NPROC": "4",
    "njobs": "300"
  },
  "job.step.dust": {},
  "job.step.la": {
    "MB": "64000",
    "NPROC": "8",
    "njobs": "200"
  },
  "job.step.pda": {
    "MB": "64000",
    "NPROC": "8",
    "njobs": "200"
  },
  "job.step.pla": {
    "MB": "32000",
    "NPROC": "4",
    "njobs": "300"
  }
}
[INFO]In simple_pwatcher_bridge, pwatcher_impl=<module 'pwatcher.fs_based' from '/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/pwatcher/fs_based.py'>
[INFO]job_type='slurm', (default)job_defaults={'job_type': 'slurm', 'pwatch_type': 'blocking', 'JOB_QUEUE': 'default', 'MB': '40000', 'NPROC': '12', 'njobs': '100', 'submit': 'srun --wait=0 -p high \\\n-J ${JOB_NAME} \\\n-o ${JOB_STDOUT} \\\n-e ${JOB_STDERR} \\\n--mem-per-cpu=${MB}M \\\n--cpus-per-task=${NPROC} \\\n--time=4-0 \\\n--ntasks 1 \\\n--exclusive \\\n${JOB_SCRIPT}\n"${CMD}"', 'pwatcher_type': 'fs_based', 'use_tmpdir': False}, use_tmpdir=False, squash=False, job_name_style=0
[INFO]Setting max_jobs to 100; was None
[INFO]Num unsatisfied: 0, graph: 2
[INFO]Setting max_jobs to 300; was 100
[INFO]Num unsatisfied: 0, graph: 81
[INFO]Setting max_jobs to 100; was 300
[INFO]Parsed pa_REPmask_code (repa,repb,repc): [(0, 300), (0, 300), (0, 300)]
[INFO]Num unsatisfied: 0, graph: 83
[INFO]Setting max_jobs to 300; was 100
[INFO]Num unsatisfied: 0, graph: 86
[INFO]Setting max_jobs to 100; was 300
[INFO]Num unsatisfied: 0, graph: 88
[INFO]Setting max_jobs to 200; was 100
[INFO]Num unsatisfied: 0, graph: 399
[INFO]Setting max_jobs to 100; was 200
[INFO]Num unsatisfied: 0, graph: 401
[INFO]Setting max_jobs to 300; was 100
[INFO]Num unsatisfied: 0, graph: 712
[INFO]Setting max_jobs to 100; was 300
[INFO]Num unsatisfied: 2, graph: 714
[INFO]About to submit: Node(0-rawreads/repa/rep-combine)
[INFO] starting job Job(jobid='Pf571d3d49d8d4a', cmd='/bin/bash run.sh', rundir='/group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/03-assemblies/sandbox/0-rawreads/repa/rep-combine', options={'job_type': 'local', 'pwatch_type': 'blocking', 'JOB_QUEUE': 'default', 'MB': 4000, 'NPROC': 1, 'njobs': '100', 'submit': 'srun --wait=0 -p high \\\n-J ${JOB_NAME} \\\n-o ${JOB_STDOUT} \\\n-e ${JOB_STDERR} \\\n--mem-per-cpu=${MB}M \\\n--cpus-per-task=${NPROC} \\\n--time=4-0 \\\n--ntasks 1 \\\n--exclusive \\\n${JOB_SCRIPT}\n"${CMD}"', 'pwatcher_type': 'fs_based', 'use_tmpdir': False}) w/ job_type=LOCAL
[INFO]dir: '/group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/03-assemblies/sandbox/mypwatcher/jobs/Pf571d3d49d8d4a'
CALL:
 '/bin/bash /group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/03-assemblies/sandbox/mypwatcher/wrappers/run-Pf571d3d49d8d4a.bash 1>|stdout 2>|stderr & '
[INFO]pid=29292 pgid=29225 sub-pid=29339
[INFO]Submitted backgroundjob=MetaJobLocal(MetaJob(job=Job(jobid='Pf571d3d49d8d4a', cmd='/bin/bash run.sh', rundir='/group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/03-assemblies/sandbox/0-rawreads/repa/rep-combine', options={'job_type': 'local', 'pwatch_type': 'blocking', 'JOB_QUEUE': 'default', 'MB': 4000, 'NPROC': 1, 'njobs': '100', 'submit': 'srun --wait=0 -p high \\\n-J ${JOB_NAME} \\\n-o ${JOB_STDOUT} \\\n-e ${JOB_STDERR} \\\n--mem-per-cpu=${MB}M \\\n--cpus-per-task=${NPROC} \\\n--time=4-0 \\\n--ntasks 1 \\\n--exclusive \\\n${JOB_SCRIPT}\n"${CMD}"', 'pwatcher_type': 'fs_based', 'use_tmpdir': False}), lang_exe='/bin/bash'))
[ERROR]Task Node(0-rawreads/repa/rep-combine) failed with exit-code=256
[ERROR]Some tasks are recently_done but not satisfied: {Node(0-rawreads/repa/rep-combine)}
[ERROR]ready: set()
    submitted: set()
[ERROR]Failed to kill job for heartbeat 'heartbeat-Pf571d3d49d8d4a' (which might mean it was already gone): FileNotFoundError(2, 'No such file or directory')
Traceback (most recent call last):
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/pypeflow/simple_pwatcher_bridge.py", line 278, in refreshTargets
    self._refreshTargets(updateFreq, exitOnFailure)
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/pypeflow/simple_pwatcher_bridge.py", line 362, in _refreshTargets
    raise Exception(msg)
Exception: Some tasks are recently_done but not satisfied: {Node(0-rawreads/repa/rep-combine)}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/pwatcher/fs_based.py", line 611, in delete_heartbeat
    bjob.kill(state, heartbeat)
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/pwatcher/fs_based.py", line 273, in kill
    with open(heartbeat_fn) as ifs:
FileNotFoundError: [Errno 2] No such file or directory: '/group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/03-assemblies/sandbox/mypwatcher/heartbeats/heartbeat-Pf571d3d49d8d4a'
Traceback (most recent call last):
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/bin/fc_run.py", line 11, in <module>
    load_entry_point('falcon-kit==1.8.1', 'console_scripts', 'fc_run.py')()
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/falcon_kit/mains/run1.py", line 706, in main
    main1(argv[0], args.config, args.logger)
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/falcon_kit/mains/run1.py", line 73, in main1
    input_fofn_fn=input_fofn_fn,
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/falcon_kit/mains/run1.py", line 269, in run
    letter, group_size, coverage_limit)
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/falcon_kit/mains/run1.py", line 627, in add_rep_tasks
    daligner_split_script=pype_tasks.TASK_DB_REP_DALIGNER_SPLIT_SCRIPT,
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/falcon_kit/mains/run1.py", line 524, in add_daligner_and_merge_tasks
    dist=Dist(NPROC=4, MB=4000, job_dict=daligner_job_config),
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/falcon_kit/pype.py", line 106, in gen_parallel_tasks
    wf.refreshTargets()
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/pypeflow/simple_pwatcher_bridge.py", line 278, in refreshTargets
    self._refreshTargets(updateFreq, exitOnFailure)
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/pypeflow/simple_pwatcher_bridge.py", line 362, in _refreshTargets
    raise Exception(msg)
Exception: Some tasks are recently_done but not satisfied: {Node(0-rawreads/repa/rep-combine)}

fc_run.cfg

[General]
input_type = raw
input_fofn = subreads.fa.fofn
#use_tmpdir = scratch

# length cutoff used for seed reads used for initial mapping (default length was 5000, -1 means determine from genome size and seed coverage)
genome_size = 900000000
seed_coverage = 40
length_cutoff = -1
# length cutoff used for seed reads used for pre-assembly
length_cutoff_pr = 10000

falcon_greedy = False
falcon_sense_greedy=False

# concurrency setting
default_concurrent_jobs = 288
pa_concurrent_jobs = 288
cns_concurrent_jobs = 288
ovlp_concurrent_jobs = 288

# overlapping options for Daligner
pa_HPCdaligner_option =  -v -B128 -e0.75 -M24 -l1200 -k14 -h256 -w8 -s100 -t16
ovlp_HPCdaligner_option = -v -B128 -M24 -k24 -h600 -e.95 -l1800 -s100
pa_daligner_option = -e0.75 -l1200 -k14 -h256 -w8 -s100
ovlp_daligner_option = -k24 -h600 -e.95 -l1800 -s100
pa_HPCTANmask_option = -k18 -h480 -w8 -e.8 -s100
pa_HPCREPmask_option = -k18 -h480 -w8 -e.8 -s100
#pa_REPmask_code=1,20;10,15;50,10
pa_DBsplit_option = -x500 -s400
ovlp_DBsplit_option = -s400

# error correction consensus option
falcon_sense_option = --output-multi --min-idt 0.70 --min-cov 4 --max-n-read 200 --n-core 24

# overlap filtering options
overlap_filtering_setting = --max-diff 120 --max-cov 120 --min-cov 2 --n-core 12

[job.defaults]
job_type = slurm
pwatch_type = blocking
JOB_QUEUE = default
MB = 40000
NPROC = 12
njobs = 100
submit = srun --wait=0 -p high \
    -J ${JOB_NAME} \
    -o ${JOB_STDOUT} \
    -e ${JOB_STDERR} \
    --mem-per-cpu=${MB}M \
    --cpus-per-task=${NPROC} \
    --time=4-0 \
    --ntasks 1 \
    --exclusive \
    ${JOB_SCRIPT}
    "${CMD}"

[job.step.da]
NPROC=4
MB=32000
njobs=300
[job.step.la]
NPROC=8
MB=64000
njobs=200
[job.step.cns]
NPROC=8
MB=64000
njobs=200
[job.step.pda]
NPROC=8
MB=64000
njobs=200
[job.step.pla]
NPROC=4
MB=32000
njobs=300
[job.step.asm]
NPROC=24

I see in the err file that despite my having pwatch_type = blocking in my cfg file, it still uses fs_based (see line 61 of the err file). Could this account for any of my issues? If so, how might I correct it?

Thank you so much for your assistance!

Shannon

shannonekj commented 4 years ago

Sorry for such a quick update, but it occurred to me that my cfg file may have had a typo on the line that selects the blocking watcher. I have updated the fc_run.cfg file to the following:

fc_run.cfg

[General]
input_type = raw
input_fofn = subreads.fa.fofn
#use_tmpdir = scratch

# length cutoff used for seed reads used for initial mapping (default length was 5000, -1 means determine from genome size and seed coverage)
genome_size = 900000000
seed_coverage = 40
length_cutoff = -1
# length cutoff used for seed reads used for pre-assembly
length_cutoff_pr = 10000

falcon_greedy = False
falcon_sense_greedy=False

# concurrency setting
default_concurrent_jobs = 288
pa_concurrent_jobs = 288
cns_concurrent_jobs = 288
ovlp_concurrent_jobs = 288

# overlapping options for Daligner
pa_HPCdaligner_option =  -v -B128 -e0.75 -M24 -l1200 -k14 -h256 -w8 -s100 -t16
ovlp_HPCdaligner_option = -v -B128 -M24 -k24 -h600 -e.95 -l1800 -s100
pa_daligner_option = -e0.75 -l1200 -k14 -h256 -w8 -s100
ovlp_daligner_option = -k24 -h600 -e.95 -l1800 -s100
pa_HPCTANmask_option = -k18 -h480 -w8 -e.8 -s100
pa_HPCREPmask_option = -k18 -h480 -w8 -e.8 -s100
#pa_REPmask_code=1,20;10,15;50,10
pa_DBsplit_option = -x500 -s400
ovlp_DBsplit_option = -s400

# error correction consensus option
falcon_sense_option = --output-multi --min-idt 0.70 --min-cov 4 --max-n-read 200 --n-core 24

# overlap filtering options
overlap_filtering_setting = --max-diff 120 --max-cov 120 --min-cov 2 --n-core 12

# slurm options (says sge but not for rEaLz)
#sge_option_da = -pe smp 5 -q bigmem
#sge_option_la = -pe smp 20 -q bigmem
#sge_option_pda = -pe smp 6 -q bigmem
#sge_option_pla = -pe smp 16 -q bigmem
#sge_option_fc = -pe smp 24 -q bigmem
#sge_option_cns = -pe smp 8 -q bigmem

[job.defaults]
job_type = slurm
pwatcher_type = blocking
JOB_QUEUE = default
MB = 40000
NPROC = 12
njobs = 100
submit = srun --wait=0 -p high \
    -J ${JOB_NAME} \
    -o ${JOB_STDOUT} \
    -e ${JOB_STDERR} \
    --mem-per-cpu=${MB}M \
    --cpus-per-task=${NPROC} \
    --time=4-0 \
    --ntasks 1 \
    --exclusive \
    ${JOB_SCRIPT}
    "${CMD}"

[job.step.da]
NPROC=4
MB=32000
njobs=300
[job.step.la]
NPROC=8
MB=64000
njobs=200
[job.step.cns]
NPROC=8
MB=64000
njobs=200
[job.step.pda]
NPROC=8
MB=64000
njobs=200
[job.step.pla]
NPROC=4
MB=32000
njobs=300
[job.step.asm]
NPROC=24

And now I get the following err file:

+ cd /group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/03-assemblies/sandbox
+ fc_run.py fc_run.cfg
falcon-kit 1.8.1 (pip thinks "falcon-kit 1.8.1")
pypeflow 2.3.0
[INFO]Setup logging from file "None".
[INFO]$ lfs setstripe -c 12 /group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/03-assemblies/sandbox >
[INFO]Apparently '/group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/03-assemblies/sandbox' is not in lustre filesystem, which is fine.
[INFO]fc_run started with configuration fc_run.cfg
[WARNING]You have several old-style options. These should be provided in the `[job.defaults]` or `[job.step.*]` sections, and possibly renamed. See https://github.com/PacificBiosciences/FALCON/wiki/Configuration
 ['cns_concurrent_jobs', 'default_concurrent_jobs']
[WARNING]Unexpected keys in input config: {'pa_concurrent_jobs', 'default_concurrent_jobs', 'cns_concurrent_jobs', 'ovlp_concurrent_jobs', 'falcon_greedy'}
[INFO]cfg=
{
  "General": {
    "LA4Falcon_preload": false,
    "avoid_text_file_busy": true,
    "bestn": 12,
    "cns_concurrent_jobs": "288",
    "dazcon": false,
    "default_concurrent_jobs": "288",
    "falcon_greedy": "False",
    "falcon_sense_greedy": false,
    "falcon_sense_option": "--output-multi --min-idt 0.70 --min-cov 4 --max-n-read 200 --n-core 24",
    "falcon_sense_skip_contained": false,
    "fc_ovlp_to_graph_option": " --min-len 10000",
    "genome_size": "900000000",
    "input_fofn": "subreads.fa.fofn",
    "input_type": "raw",
    "length_cutoff": "-1",
    "length_cutoff_pr": "10000",
    "overlap_filtering_setting": "--max-diff 120 --max-cov 120 --min-cov 2 --n-core 12",
    "ovlp_DBdust_option": "",
    "ovlp_DBsplit_option": "-s400",
    "ovlp_HPCdaligner_option": "-v -B128 -M24 -k24 -h600 -e.95 -l1800 -s100",
    "ovlp_concurrent_jobs": "288",
    "ovlp_daligner_option": "-k24 -h600 -e.95 -l1800 -s100",
    "pa_DBdust_option": "",
    "pa_DBsplit_option": "-x500 -s400",
    "pa_HPCREPmask_option": "-k18 -h480 -w8 -e.8 -s100",
    "pa_HPCTANmask_option": "-k18 -h480 -w8 -e.8 -s100",
    "pa_HPCdaligner_option": "-v -B128 -e0.75 -M24 -l1200 -k14 -h256 -w8 -s100 -t16",
    "pa_REPmask_code": "0,300/0,300/0,300",
    "pa_concurrent_jobs": "288",
    "pa_daligner_option": "-e0.75 -l1200 -k14 -h256 -w8 -s100",
    "pa_dazcon_option": "-j 4 -x -l 500",
    "pa_fasta_filter_option": "streamed-internal-median",
    "pa_subsample_coverage": 0,
    "pa_subsample_random_seed": 12345,
    "pa_subsample_strategy": "random",
    "seed_coverage": "40",
    "skip_checks": false,
    "target": "assembly"
  },
  "job.defaults": {
    "JOB_QUEUE": "default",
    "MB": "40000",
    "NPROC": "12",
    "job_type": "slurm",
    "njobs": "100",
    "pwatcher_type": "blocking",
    "submit": "srun --wait=0 -p high \\\n-J ${JOB_NAME} \\\n-o ${JOB_STDOUT} \\\n-e ${JOB_STDERR} \\\n--mem-per-cpu=${MB}M \\\n--cpus-per-task=${NPROC} \\\n--time=4-0 \\\n--ntasks 1 \\\n--exclusive \\\n${JOB_SCRIPT}\n\"${CMD}\"",
    "use_tmpdir": false
  },
  "job.step.asm": {
    "NPROC": "24"
  },
  "job.step.cns": {
    "MB": "64000",
    "NPROC": "8",
    "njobs": "200"
  },
  "job.step.da": {
    "MB": "32000",
    "NPROC": "4",
    "njobs": "300"
  },
  "job.step.dust": {},
  "job.step.la": {
    "MB": "64000",
    "NPROC": "8",
    "njobs": "200"
  },
  "job.step.pda": {
    "MB": "64000",
    "NPROC": "8",
    "njobs": "200"
  },
  "job.step.pla": {
    "MB": "32000",
    "NPROC": "4",
    "njobs": "300"
  }
}
[INFO]In simple_pwatcher_bridge, pwatcher_impl=<module 'pwatcher.blocking' from '/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/pwatcher/blocking.py'>
[INFO]job_type='slurm', (default)job_defaults={'job_type': 'slurm', 'pwatcher_type': 'blocking', 'JOB_QUEUE': 'default', 'MB': '40000', 'NPROC': '12', 'njobs': '100', 'submit': 'srun --wait=0 -p high \\\n-J ${JOB_NAME} \\\n-o ${JOB_STDOUT} \\\n-e ${JOB_STDERR} \\\n--mem-per-cpu=${MB}M \\\n--cpus-per-task=${NPROC} \\\n--time=4-0 \\\n--ntasks 1 \\\n--exclusive \\\n${JOB_SCRIPT}\n"${CMD}"', 'use_tmpdir': False}, use_tmpdir=False, squash=False, job_name_style=0
[INFO]Setting max_jobs to 100; was None
[INFO]Num unsatisfied: 0, graph: 2
[INFO]Setting max_jobs to 300; was 100
[INFO]Num unsatisfied: 0, graph: 81
[INFO]Setting max_jobs to 100; was 300
[INFO]Parsed pa_REPmask_code (repa,repb,repc): [(0, 300), (0, 300), (0, 300)]
[INFO]Num unsatisfied: 0, graph: 83
[INFO]Setting max_jobs to 300; was 100
[INFO]Num unsatisfied: 0, graph: 86
[INFO]Setting max_jobs to 100; was 300
[INFO]Num unsatisfied: 0, graph: 88
[INFO]Setting max_jobs to 200; was 100
[INFO]Num unsatisfied: 0, graph: 399
[INFO]Setting max_jobs to 100; was 200
[INFO]Num unsatisfied: 0, graph: 401
[INFO]Setting max_jobs to 300; was 100
[INFO]Num unsatisfied: 0, graph: 712
[INFO]Setting max_jobs to 100; was 300
[INFO]Num unsatisfied: 2, graph: 714
[INFO]About to submit: Node(0-rawreads/repa/rep-combine)
[INFO]Popen: '/bin/bash -C /home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/pwatcher/mains/job_start.sh >| /group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/03-assemblies/sandbox/0-rawreads/repa/rep-combine/run-Pf571d3d49d8d4a.bash.stdout 2>| /group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/03-assemblies/sandbox/0-rawreads/repa/rep-combine/run-Pf571d3d49d8d4a.bash.stderr'
[INFO](slept for another 0.0s -- another 1 loop iterations)
[INFO](slept for another 0.30000000000000004s -- another 2 loop iterations)
[INFO](slept for another 1.2000000000000002s -- another 3 loop iterations)
[ERROR]Task Node(0-rawreads/repa/rep-combine) failed with exit-code=1
[ERROR]Some tasks are recently_done but not satisfied: {Node(0-rawreads/repa/rep-combine)}
[ERROR]ready: set()
    submitted: set()
[ERROR]Noop. We cannot kill blocked threads. Hopefully, everything will die on SIGTERM.
Traceback (most recent call last):
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/bin/fc_run.py", line 11, in <module>
    load_entry_point('falcon-kit==1.8.1', 'console_scripts', 'fc_run.py')()
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/falcon_kit/mains/run1.py", line 706, in main
    main1(argv[0], args.config, args.logger)
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/falcon_kit/mains/run1.py", line 73, in main1
    input_fofn_fn=input_fofn_fn,
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/falcon_kit/mains/run1.py", line 269, in run
    letter, group_size, coverage_limit)
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/falcon_kit/mains/run1.py", line 627, in add_rep_tasks
    daligner_split_script=pype_tasks.TASK_DB_REP_DALIGNER_SPLIT_SCRIPT,
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/falcon_kit/mains/run1.py", line 524, in add_daligner_and_merge_tasks
    dist=Dist(NPROC=4, MB=4000, job_dict=daligner_job_config),
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/falcon_kit/pype.py", line 106, in gen_parallel_tasks
    wf.refreshTargets()
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/pypeflow/simple_pwatcher_bridge.py", line 278, in refreshTargets
    self._refreshTargets(updateFreq, exitOnFailure)
  File "/home/sejoslin/miniconda3/envs/asm_pacbio/lib/python3.7/site-packages/pypeflow/simple_pwatcher_bridge.py", line 362, in _refreshTargets
    raise Exception(msg)
Exception: Some tasks are recently_done but not satisfied: {Node(0-rawreads/repa/rep-combine)}

And here is my all.log: all.log

pb-cdunn commented 3 years ago

You'd have to look at stderr under 0-rawreads/repa/rep-combine.
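For example, something along these lines (paths taken from the log above; the actual stderr filename depends on the job id, so run-demo.bash.stderr here is only a stand-in created for illustration):

```shell
# Stand-in layout: a real run writes run-<jobid>.bash.stderr inside the
# failing task's directory.
d=0-rawreads/repa/rep-combine
mkdir -p "$d"
echo "demo stderr line" > "$d/run-demo.bash.stderr"

# Inspect the tail of each stderr file for the failed task to find the
# underlying error.
tail -n 20 "$d"/*.stderr
```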