PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

only 100 job directories created for 5022 job assembly #325

Open dgordon562 opened 8 years ago

dgordon562 commented 8 years ago

Hi, Chris,

Behold:

> grep "^daligner" run_jobs.sh | wc
   5022  623205 8255675
{dgordon}e217:/net/eichler/vol26/15000/whole_genome_assembly/nobackups/chimp/assemblies/falcon3/0-rawreads

however:

> ls -d job* | wc
    100     100     900

I thought perhaps there weren't enough inodes, so I cleaned up and tried again. Same result...

To debug this, I want to log whenever fc_run.py creates a job directory. Where in the code does that happen?

In create_daligner_tasks, there are calls to each of:

makePypeLocalFile
make_daligner_task = PypeTask
make_daligner_task(task_run_daligner)

Which of these (if any) actually creates the job_ directory?
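
In the meantime, one way to log every directory creation is to monkey-patch os.makedirs near the top of fc_run.py. A rough sketch, assuming the directories are ultimately made via os.makedirs (they may go through os.mkdir instead):

import logging
import os
import traceback

_orig_makedirs = os.makedirs

def _logged_makedirs(path, *args, **kwargs):
    # Log the path plus a short stack so the creating call site is visible.
    logging.warning('makedirs(%r) called from:\n%s',
                    path, ''.join(traceback.format_stack(limit=5)))
    return _orig_makedirs(path, *args, **kwargs)

os.makedirs = _logged_makedirs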

Thanks! David

dgordon562 commented 8 years ago

I've repeated this 4x and gotten varying numbers of job* directories, but always fewer than 300.

However, pypeflow.log shows:

> head -52997 pypeflow.log | grep "ready: task" | wc
   5022   45198  497178

which is exactly the right number. So I need to dig further, and the information I asked about above (which call creates the job directories) is crucial.

pb-cdunn commented 8 years ago

A job directory is not created until the corresponding pypeflow-task is selected for distribution. I'm not sure why you have exactly 100, but I suspect that's your setting for pa_concurrent_jobs.
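
For reference, that cap comes from the [General] section of fc_run.cfg. A minimal sketch (the values here are illustrative, not a recommendation):

[General]
# max daligner (pre-assembly) jobs submitted at once
pa_concurrent_jobs = 100
# max overlap jobs for the pread stage
ovlp_concurrent_jobs = 100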

Are you saying that falcon has completed successfully, but only 100 PA jobs were run?

dgordon562 commented 8 years ago

I never let falcon continue when I see there are only 100 job_ directories.

I don't know what "selected for distribution" means... Are you saying that the job directory is not created until the rj* job is qsub'd? If so, then this is all a false alarm... But I don't remember that being the case....

dgordon562 commented 8 years ago

Just to be clear: my understanding was that if run_jobs.sh said there were 5000 daligner jobs, then all 5000 job* directories were created before even the first daligner job was qsub'd. Is that the case?

pb-cdunn commented 8 years ago

No, I don't think so. The code which sets up the job is part of the pypeflow task. E.g. https://github.com/PacificBiosciences/FALCON/blob/master/falcon_kit/mains/run.py#L267

def task_run_daligner(self):
    ...
    support.run_daligner(**args)
    run_script_and_wait_and_rm_exit(...)

https://github.com/PacificBiosciences/FALCON/blob/master/falcon_kit/run_support.py#L436

def run_daligner(daligner_script, db_prefix, config, job_done, script_fn):
    if config['use_tmpdir']:
        # Really, we want to copy the symlinked db to tmpdir.
        # The output is fine in NFS.
        # Tricky. TODO.
        logger.warning('use_tmpdir currently ignored')
    bash.write_script_and_wrapper(daligner_script, script_fn, job_done)

The content of the script is created in code that runs before the pypeflow task is started, in https://github.com/PacificBiosciences/FALCON/blob/master/falcon_kit/mains/run.py#L359

def create_daligner_tasks(run_jobs_fn, wd, db_prefix, rdb_build_done, config, pread_aln=False):
    ...
    for job_uid, script in bash.scripts_daligner(run_jobs_fn, db_prefix, rdb_build_done, pread_aln):

https://github.com/PacificBiosciences/FALCON/blob/master/falcon_kit/bash.py#L166

def scripts_daligner(run_jobs_fn, db_prefix, rdb_build_done, pread_aln=False):
    ...
        bash = """
db_dir={db_dir}
ln -sf ${{db_dir}}/.{db_prefix}.bps .
ln -sf ${{db_dir}}/.{db_prefix}.idx .
ln -sf ${{db_dir}}/{db_prefix}.db .
ln -sf ${{db_dir}}/.{db_prefix}.dust.anno .
ln -sf ${{db_dir}}/.{db_prefix}.dust.data .
{daligner_cmd}
#rm -f *.C?.las
#rm -f *.N?.las
"""

dgordon562 commented 8 years ago

Thanks! Very helpful! More for the documentation...