adamnovak opened 4 years ago
Related issues:
- `--debugWorker` in favor of `toil debug-job` only?
- `--debugWorker` flag causes the job to restart infinitely #2739
> Maybe we can drop `--debugWorker` in favor of `toil debug-job` only?
Does `toil debug-job` work with `toil-cwl-runner`?
It should. `toil-cwl-runner` produces Toil jobs in the job store, and I don't think its leader does any special support work for those jobs while they run. You'd still need to get the Toil job store ID of the job you want to debug, maybe from a failure message, rather than being able to just run all the jobs in-process and fish for failures, though.
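As a rough illustration of that workflow, the sketch below wraps the command in Python. It assumes the positional `toil debug-job JOB_STORE JOB_ID` form (check `toil debug-job --help` for your Toil version). The helper names are hypothetical, and the job store and job ID values are placeholders: with `toil-cwl-runner`, the job store would be whatever was passed to `--jobStore`, and the job ID would still have to come from somewhere like a failure message.

```python
import subprocess
from typing import List


def debug_job_command(job_store: str, job_id: str) -> List[str]:
    """Build a `toil debug-job` invocation for one job in an existing job store.

    Hypothetical helper; assumes the positional form
    `toil debug-job JOB_STORE JOB_ID`.
    """
    return ["toil", "debug-job", job_store, job_id]


def rerun_failed_job(job_store: str, job_id: str) -> int:
    """Rerun one job on the local machine (requires Toil installed).

    check=False so that a job that still fails returns its exit code
    instead of raising CalledProcessError.
    """
    completed = subprocess.run(debug_job_command(job_store, job_id), check=False)
    return completed.returncode
```

For example, `rerun_failed_job("./my-jobstore", "<job ID from the failure message>")` would try that single job again locally, where it can be run under a debugger.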
> Maybe we can drop `--debugWorker` in favor of `toil debug-job` only?
The `--debugWorker` flag allows debugging in PyCharm and with pdb. It's something I use often, particularly with `toil-cwl-runner` on whole workflows. I'd argue that it's pretty important to keep.
If I don't use it, I can't set breakpoints in the CWL library files. Adding CWL support for new versions would be very difficult without it.
Right now, to debug Toil jobs that don't work, you are limited to:
- `toil-vg`'s dumping of files sent to failing child processes to its outstore).

We would like this to be easier; diekhans wants to be able to easily reproduce and fix a segfaulting command run inside a Toil job inside a 5-day-long Cactus workflow.
There are several levels of goodness we could implement here:
- `toil debug-job`, which is able to download and locally run a flaky job given the job store and the job's ID. None of the Toil devs actually know much about it. There should be a debugging or troubleshooting section in the docs, maybe under "Developing Workflows", that covers it. Maybe Toil's end-of-failing-run message could even suggest using it.
- `toil debug-job` appears to run the normal worker, meaning it puts its temporary files in the normal work directory and deletes them when it is done. If we want to rerun a failing subprocess, we might want to make it (at least by default) put the work directory in or under the current directory, and leave it behind for user inspection when the job fails.
- `toil-vg`'s dumping of subprocess input files, and glue no-container and Singularity support onto Toil's Docker-calling system (and/or hook the `subprocess` module?) so we know when external processes are called. Then, when one fails, we could upload all its inputs to the file store and save an incident report that describes what we tried to run, what the inputs were, and that it didn't work. We'd then have a bit of machinery to dump the incident reports (maybe all together at once, as well as in the logs?), or to rehydrate one by setting up the input files again, so that the user can debug just that external command on their local machine instead of the whole Toil job.
- `toil debug-job`, if it were able to work for jobs that need services (by starting the services), or even just able to report how to manually start the necessary services, could help with Cactus debugging, because some Cactus jobs we want to debug actually use the service system.

┆Issue is synchronized with this Jira Epic ┆Epic: Improve debugging experience ┆Issue Number: TOIL-552
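The proposal to keep a rerun's work directory under the current directory, and to leave it behind only when the job fails, could look roughly like the context manager below. This is a minimal sketch, not Toil code: `debuggable_workdir` is a hypothetical name, and it only manages the directory. Wiring it into `toil debug-job` is not shown.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def debuggable_workdir(base: str = ".", keep_on_failure: bool = True) -> Iterator[str]:
    """Create a scratch directory under `base` for a (hypothetical) job rerun.

    On success the directory is deleted, like the normal worker's cleanup.
    If the body raises and keep_on_failure is True, the directory is left
    behind so the user can inspect intermediate files.
    """
    workdir = tempfile.mkdtemp(prefix="toil-debug-", dir=base)
    try:
        yield workdir
    except BaseException:
        if keep_on_failure:
            print(f"Job failed; work directory kept for inspection: {os.path.abspath(workdir)}")
        else:
            shutil.rmtree(workdir, ignore_errors=True)
        raise  # never swallow the job's failure
    else:
        shutil.rmtree(workdir, ignore_errors=True)
```

Defaulting `base` to `"."` matches the idea of putting the work directory under wherever the user ran the debug command, so leftover files are easy to find.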
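The incident-report idea can be sketched in plain Python as below. Everything here is hypothetical: Toil has no `run_with_incident_report` or `rehydrate` function, inputs are copied to a local `report_dir` instead of being uploaded to the file store, and the command is assumed to read its inputs as plain file names relative to its working directory, so they can be restored in a fresh directory. On POSIX, `subprocess` reports a child killed by a signal as a negative return code, so a segfault shows up as `-11` (`SIGSEGV`), and the report records the signal name.

```python
import json
import os
import shutil
import signal
import subprocess
from typing import List, Optional, Sequence


def run_with_incident_report(
    cmd: Sequence[str],
    workdir: str,
    input_names: Sequence[str],
    report_dir: str,
) -> subprocess.CompletedProcess:
    """Run cmd in workdir; if it fails, save its inputs and a JSON incident report.

    Hypothetical helper, not part of Toil. input_names are plain file names
    inside workdir; they are copied into report_dir on failure.
    """
    result = subprocess.run(
        list(cmd), cwd=workdir, capture_output=True, text=True, errors="replace"
    )
    if result.returncode == 0:
        return result

    os.makedirs(report_dir, exist_ok=True)
    for name in input_names:
        shutil.copy2(os.path.join(workdir, name), os.path.join(report_dir, name))

    # On POSIX, a return code of -N means the child was killed by signal N,
    # e.g. -11 (SIGSEGV) for a segfault.
    signal_name: Optional[str] = None
    if result.returncode < 0:
        try:
            signal_name = signal.Signals(-result.returncode).name
        except ValueError:
            signal_name = f"signal {-result.returncode}"

    report = {
        "command": list(cmd),
        "inputs": list(input_names),
        "returncode": result.returncode,
        "signal": signal_name,
        "stderr_tail": result.stderr[-2000:],  # keep only the end of stderr
    }
    with open(os.path.join(report_dir, "incident.json"), "w") as fh:
        json.dump(report, fh, indent=2)
    return result


def rehydrate(report_dir: str, new_workdir: str) -> List[str]:
    """Restore a report's inputs into new_workdir and return the command to rerun there."""
    with open(os.path.join(report_dir, "incident.json")) as fh:
        report = json.load(fh)
    os.makedirs(new_workdir, exist_ok=True)
    for name in report["inputs"]:
        shutil.copy2(os.path.join(report_dir, name), os.path.join(new_workdir, name))
    return report["command"]
```

Running `subprocess.run(rehydrate(report_dir, "debug-here"), cwd="debug-here")` would then replay just the failing external command locally, instead of the whole Toil job.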