dgordon562 opened this issue 7 years ago
> `fc_run.py cfg >&fc_run.out`

`>&` has a meaning in Bash. I assume you meant:

`fc_run.py cfg > fc_run.out`
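For anyone unsure of the distinction: in Bash, `cmd >& file` redirects *both* stdout and stderr to the file, while `cmd > file` redirects stdout only. A self-contained sketch (the `emit` helper and filenames are made up for illustration):

```shell
# emit: a hypothetical helper that writes one line to stdout and one to stderr.
emit() { echo out; echo err >&2; }

# '>' redirects stdout only; stderr is discarded separately here just to keep quiet.
emit > stdout_only.txt 2>/dev/null

# '>&' redirects BOTH stdout and stderr to the file (csh-style, accepted by bash).
emit >& both.txt
```

So `fc_run.py cfg >&fc_run.out` captures the log messages (which go to stderr) as well as stdout, whereas `> fc_run.out` alone would miss them.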
> `grep ... 'Num still unsatisfied'`
No, you cannot rely on that, since it relates to system vagaries.
Do you have a better idea?
thanks, Chris.
Does Falcon still look for the done flags? If so, which .py file has the code that looks for the done flags? If not, how does falcon detect when a qsub'd job has completed?
Thanks! David
While you're at it, could you point me to the file where the qsub is actually done: the equivalent of the old `_qsub_script` in `run.py`?
> Does Falcon still look for the done flags?

There are several process-watcher backends now. They all work differently.

- `pwatcher_type=blocking`: in that case, submitted jobs are done when the blocking calls return. Very simple, but re-acquiring a running job would never be possible. (That feature has never been implemented anyway.) If you're interested, I can show you how to use it.
- `fs_based` still works, but filesystems are inherently finicky. That relies on some done files (in the `mypwatcher` dir by default).
- `network_based` is similar to `fs_based`, but it relies on network socket communication to learn what is done. It also sends logs over the socket.

The dependency graph may also use some "done" files, but those are completely separate from job submissions. I plan (very soon) to make all tasks create a done file for the dependency graph. (I first had to switch to a newer, simpler workflow engine, and I had to change all of Jason's scripts to use it, including in FALCON-unzip. That's 15-20 scripts, so it took some time.)
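For reference, the watcher is selected in the fc_run config. A minimal sketch of the relevant fragment; only `pwatcher_type=blocking` comes from the discussion above, while the `[General]` section name and `job_type` key follow typical fc_run cfg files, with a placeholder value you should adjust to your scheduler:

```ini
[General]
; assumption: typical fc_run cfg layout; adjust to your setup
pwatcher_type = blocking
job_type = sge
```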
So things are getting much simpler, but progress takes time.
> While you're at it, could you point me to the file where the qsub is actually done? the equivalent of the old `_qsub_script` in `run.py`
You should see the qsub lines in the log if your code is completely up-to-date. We do not currently dump a bash script which contains the qsub line. But we do usually dump bash scripts on the remote hosts (e.g. `0-rawreads/prepare_rdb.sh`). The file `task.json` is the key. It is loaded by a python module/program called `pypeflow/do_task.py`. With `pwatcher_type=blocking`, I think that is run by `task.sh`, which is run by `run.sh`, which is run by `run-P....sh`, which is run by qsub.
Hi, Chris,
How about this idea:

1. Run falcon like this: `fc_run.py cfg >&fc_run.out`
2. Grep for these lines: `[INFO]Num still unsatisfied: 23`
3. To get the total number of daligner jobs, count the daligner lines in `0-rawreads/run_jobs.sh`.
4. The difference gives the number of daligner jobs completed.
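The counting idea above can be sketched in a couple of shell commands. This uses toy stand-in files so the sketch runs anywhere; in a real run you would point the greps at `0-rawreads/run_jobs.sh` and your actual `fc_run.out`, and the `^daligner` anchor may need loosening depending on how run_jobs.sh is formatted:

```shell
# Toy stand-in files for illustration only; substitute the real
# 0-rawreads/run_jobs.sh and fc_run.out from your assembly directory.
printf 'daligner -a\ndaligner -b\ndaligner -c\n' > run_jobs.sh
printf '[INFO]Num still unsatisfied: 2\n[INFO]Num still unsatisfied: 1\n' > fc_run.out

# Total daligner jobs = number of daligner lines in run_jobs.sh.
total=$(grep -c '^daligner' run_jobs.sh)

# Remaining = the most recent 'Num still unsatisfied' count in the log.
remaining=$(grep 'Num still unsatisfied' fc_run.out | tail -n 1 | awk '{print $NF}')

echo "daligner jobs completed: $((total - remaining))"   # -> daligner jobs completed: 2
```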
Will this work even if falcon has been restarted several times and fc_run.out is just the most recent copy of stdout/stderr of fc_run.py?
Do you have a better idea (that doesn't involve reading lots of files)?
Thanks! David