DataBiosphere / dsub

Open-source command-line tool to run batch computing tasks and workflows on backend services such as Google Cloud.
Apache License 2.0

Option "--keep-alive" not available in v0.3.8 #199

Open pbilling opened 4 years ago

pbilling commented 4 years ago

It looks like the --keep-alive option was removed in v0.3.8. I find it useful for debugging; is there another way to enable this behavior, or will the option return in a future version?

Thanks, Paul

wnojopra commented 4 years ago

Hi @pbilling,

Yes, --keep-alive was available prior to v0.3.8, but it only worked with the deprecated google provider (v1 API). This feature was not implemented in the v2 versions of the Pipelines API.

What failures are you getting that require you to debug on the VM? Typically, the available logging is enough to figure out and debug issues. If your logging is working, you "should" not need to keep your VM up and SSH into it.

pbilling commented 4 years ago

Hi @wnojopra,

Got it, thanks for the clarification.

I was running into an application failure due to a wrongly formatted input. The error was clear from the logs, but I wanted to log in to the VM and try reformatting and retrying the command interactively, since I figured it would take some trial and error. I ended up just firing up another VM, relocalizing all the inputs, and debugging it there.

mbookman commented 4 years ago

Thanks @pbilling for the detail on your use case. Doing a quick re-test on a VM (as opposed to not having sufficient logging) makes a lot of sense.

A couple of additional options worth noting here:

If you are submitting a shell script, then you should be able to set an exit trap in that script like:

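function exit_handler() {
  # Capture the script's exit status before any other command overwrites it
  local rc="$?"

  echo "The exit code is ${rc}"
  if [[ "${rc}" -ne 0 ]]; then
    # On failure, keep the VM up long enough to SSH in and investigate
    echo "Sleep for an hour..."
    sleep $((60 * 60))
  fi

  # Preserve the original exit status
  exit "${rc}"
}
readonly -f exit_handler

# Run the handler whenever the script exits, successfully or not
trap 'exit_handler' EXIT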

I think that would give you the functionality you were looking for here.
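As a rough sketch of how one might try the exit trap locally before submitting a job, assuming bash: the wrapper below installs the same kind of handler in a subshell and makes the sleep duration configurable. The names run_with_keep_alive and KEEP_ALIVE_SECONDS are hypothetical and are not part of dsub.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not a dsub feature): run a command with an EXIT trap
# that pauses on failure. KEEP_ALIVE_SECONDS is an assumed variable name;
# it defaults to one hour, matching the handler in the comment above.
run_with_keep_alive() {
  (
    exit_handler() {
      # Capture the exit status before any other command overwrites it
      local rc="$?"

      echo "The exit code is ${rc}"
      if [[ "${rc}" -ne 0 ]]; then
        echo "Sleeping for ${KEEP_ALIVE_SECONDS:-3600} seconds..."
        sleep "${KEEP_ALIVE_SECONDS:-3600}"
      fi

      # Preserve the original exit status
      exit "${rc}"
    }
    trap 'exit_handler' EXIT

    # Run the wrapped command, then exit explicitly with its status so the
    # trap sees that status in "$?"
    "$@"
    exit "$?"
  )
}
```

For example, `KEEP_ALIVE_SECONDS=5 run_with_keep_alive ./my_task.sh` (where my_task.sh is a placeholder) would hold for five seconds only if the script fails. Running the handler in a subshell keeps the trap from affecting the calling shell.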

Also - have you tried using the local provider? It is intended for being able to iterate more rapidly.

-Matt

pbilling commented 4 years ago

Great, thanks for the suggestions @mbookman. I'll definitely look into these options for next time.