PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

congrats to Chris for fast restarts! #316

Open dgordon562 opened 8 years ago

dgordon562 commented 8 years ago

Hi, Chris,

I have restarted Falcon several times, and what used to take 1 to 2 hours now takes less than a minute. I'm using a much smaller genome, so I'm not sure: is this the result of your great work?

Best wishes, David

pb-cdunn commented 8 years ago

Thanks for the kind words, but we can do even better. Waiting in the wings is a change which will abstract the job distribution into a separate module that wraps all jobs in a "heartbeat" so we can tell whether they are still running. Eventually, we'll be able to use that to re-acquire still-running jobs if you stop and re-start.
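To illustrate the heartbeat idea, here is a minimal Python sketch. It is not the FALCON code: `run_with_heartbeat`, `is_alive`, and the convention of touching a heartbeat file are all assumptions made for this example.

```python
import os
import subprocess
import threading
import time


def run_with_heartbeat(cmd, heartbeat_path, interval=1.0):
    """Run cmd while a background thread refreshes heartbeat_path's mtime.

    Hypothetical helper: the file-touching convention is an assumption,
    not FALCON's actual mechanism.
    """
    stop = threading.Event()

    def beat():
        while True:
            # Create the file if needed, then set its mtime to "now".
            with open(heartbeat_path, "a"):
                os.utime(heartbeat_path, None)
            # Sleep for `interval`, but wake immediately once the job ends.
            if stop.wait(interval):
                break

    beat_thread = threading.Thread(target=beat, daemon=True)
    beat_thread.start()
    try:
        return subprocess.call(cmd)
    finally:
        stop.set()
        beat_thread.join()


def is_alive(heartbeat_path, max_age):
    """Guess whether a job is still running from the age of its heartbeat."""
    try:
        age = time.time() - os.path.getmtime(heartbeat_path)
    except OSError:
        return False  # no heartbeat file: never started, or cleaned up
    return age <= max_age
```

A restarted workflow could call something like `is_alive(path, max_age=3 * interval)` for each known job and re-attach to the ones that still look alive instead of resubmitting them.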

dgordon562 commented 8 years ago

A few changes I've made:

1) I eliminated the .done.exit code in run.py so it just waits on done. That way, if a job crashes while other jobs are still running, I can restart the crashed job outside of Falcon (as I had to do last night with rp_00007.sh). The alternative would be to restart fc_run.py, which would kill and restart all of the jobs that were running, not a desirable outcome.

2) I log the polling for the *done file (once every 20 polls) so that, when no job is running but fc_run.py is still waiting, I can look in the log file and instantly see what it is waiting on.

3) I added a couple of items to the log. One is every qsub command, so I can run qsub outside of Falcon without having to figure out the command. The other is a message whenever the polling for "done" finds the file. This lets me grep the log for a particular file type, so I can easily tell how many jobs have completed (and thus how many are yet to be done).
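The polling behaviour described in 2) and the second half of 3) can be sketched roughly as follows. This is an illustration only, not the patched run.py: the function name `wait_for_done`, its parameters, and the logger name are hypothetical.

```python
import logging
import os
import time

log = logging.getLogger("done_poller")  # hypothetical logger name


def wait_for_done(done_files, poll_seconds=1.0, log_every=20, max_polls=None):
    """Poll until every path in done_files exists; return the paths still missing.

    Logs each done file as soon as it is found, and logs the full list of
    files still awaited once every `log_every` polls.
    """
    pending = set(done_files)
    polls = 0
    while pending:
        for path in sorted(pending):  # sorted() copies, so discarding is safe
            if os.path.exists(path):
                log.info("found done file: %s", path)
                pending.discard(path)
        if not pending:
            break
        polls += 1
        if polls % log_every == 0:
            log.info("still waiting on %d file(s): %s",
                     len(pending), ", ".join(sorted(pending)))
        if max_polls is not None and polls >= max_polls:
            break  # give up; the caller can inspect what is still missing
        time.sleep(poll_seconds)
    return pending
```

With messages in that shape, a grep of the log for "found done file" followed by a line count gives the number of completed jobs.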

I hope your "even better" changes will still allow me to do each of these.

pb-cdunn commented 8 years ago

Actually, the first version of the code is already on the master branch. Just run fc_run1 instead of fc_run. I'd welcome feedback. E.g. we might need to add symlinks or other help to match jobs with pwatcher files. The main point is to abstract the job-distribution and tracking away from the task definition and dependency graph.
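As a rough picture of that separation, the sketch below keeps the dependency graph in one function and hides job distribution behind a small `submit`/`query` interface. Everything here is hypothetical (`InlineDistributor`, `run_graph`, the state strings) and is not the pwatcher API; it assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class InlineDistributor:
    """Hypothetical back end that runs each job in-process, immediately.

    A cluster back end could offer the same two methods but submit to a
    scheduler and answer query() from heartbeat or done files.
    """

    def __init__(self):
        self.states = {}

    def submit(self, job_id, func):
        try:
            func()
            self.states[job_id] = "done"
        except Exception:
            self.states[job_id] = "failed"

    def query(self, job_id):
        return self.states.get(job_id, "unknown")


def run_graph(tasks, deps, distributor):
    """Run tasks in dependency order through the given distributor.

    tasks: name -> callable; deps: name -> set of prerequisite names.
    Returns the order in which tasks completed.
    """
    graph = {name: deps.get(name, ()) for name in tasks}
    order = []
    for name in TopologicalSorter(graph).static_order():
        distributor.submit(name, tasks[name])
        if distributor.query(name) != "done":
            raise RuntimeError("task failed: " + name)
        order.append(name)
    return order
```

The graph code never needs to know how a job is launched or tracked, so the distributor can be swapped without touching the task definitions.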