Open dgordon562 opened 8 years ago
I've repeated 4x and gotten various #s of job* but always < 300.
However, pypeflow.log shows:

```
head -52997 pypeflow.log | grep "ready: task" | wc
   5022   45198  497178
```

which is exactly the right number (5022 ready tasks). So I need to dig further, and the information above about which code creates the job directories is crucial.
A job directory is not created until the corresponding pypeflow task is selected for distribution. I'm not sure why you have exactly 100, but I suspect that is your setting for pa_concurrent_jobs.
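If that is the cause, the cap would come from the fc_run config. A hypothetical fragment (values illustrative, everything else assumed default):

```ini
# fc_run.cfg (fragment)
[General]
# Limits how many pre-assembly (daligner) jobs are in flight at once.
# If job_* directories are created lazily, only about this many would
# exist at any given moment.
pa_concurrent_jobs = 100
```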
Are you saying that falcon has completed successfully, but only 100 PA jobs were run?
I never let falcon continue when I see there are only 100 job_ directories.
I don't know what "selected for distribution" means... Are you saying that the job directory is not created until the rj* job is qsub'd? If so, then this is all a false alarm... But I don't remember that being the case....
Just to be clear: my understanding was that if runjobs.sh said there were 5000 daligner jobs, then all 5000 job* directories were created before even the 1st daligner job was qsub'd. Is that the case?
No, I don't think so. The code which sets up the job is part of the pypeflow task. E.g. https://github.com/PacificBiosciences/FALCON/blob/master/falcon_kit/mains/run.py#L267
```python
def task_run_daligner(self):
    ...
    support.run_daligner(**args)
    run_script_and_wait_and_rm_exit(...)
```
https://github.com/PacificBiosciences/FALCON/blob/master/falcon_kit/run_support.py#L436
```python
def run_daligner(daligner_script, db_prefix, config, job_done, script_fn):
    if config['use_tmpdir']:
        # Really, we want to copy the symlinked db to tmpdir.
        # The output is fine in NFS.
        # Tricky. TODO.
        logger.warning('use_tmpdir currently ignored')
    bash.write_script_and_wrapper(daligner_script, script_fn, job_done)
```
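A minimal sketch (not FALCON's actual `write_script_and_wrapper`, just a hypothetical helper) of the lazy behavior being described here: the job directory only comes into existence when the task finally writes its script into it, so undispatched tasks have no `job_*` directory yet.

```python
import os

def write_job_script(script, script_fn):
    # Hypothetical helper: create the job directory on demand,
    # immediately before the per-job script is written into it.
    job_dir = os.path.dirname(script_fn) or '.'
    os.makedirs(job_dir, exist_ok=True)
    with open(script_fn, 'w') as f:
        f.write(script)
```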
The content of the script is created in code that runs before the pypeflow task is started, in https://github.com/PacificBiosciences/FALCON/blob/master/falcon_kit/mains/run.py#L359
```python
def create_daligner_tasks(run_jobs_fn, wd, db_prefix, rdb_build_done, config, pread_aln=False):
    ...
    for job_uid, script in bash.scripts_daligner(run_jobs_fn, db_prefix, rdb_build_done, pread_aln):
```
https://github.com/PacificBiosciences/FALCON/blob/master/falcon_kit/bash.py#L166
```python
def scripts_daligner(run_jobs_fn, db_prefix, rdb_build_done, pread_aln=False):
    ...
    bash = """
db_dir={db_dir}
ln -sf ${{db_dir}}/.{db_prefix}.bps .
ln -sf ${{db_dir}}/.{db_prefix}.idx .
ln -sf ${{db_dir}}/{db_prefix}.db .
ln -sf ${{db_dir}}/.{db_prefix}.dust.anno .
ln -sf ${{db_dir}}/.{db_prefix}.dust.data .
{daligner_cmd}
#rm -f *.C?.las
#rm -f *.N?.las
"""
```
Thanks! Very helpful! More for the documentation...
Hi, Chris,
Behold:
however:
I thought perhaps there weren't enough inodes, so I cleaned up and tried again. Same result...
To debug this, I want to log whenever fc_run.py creates a directory. Where in the code does this happen?
In create_daligner_tasks, there are calls to each of:
Which of these (if any) actually creates the job_ directory?
Thanks! David