DataBiosphere / dsub

Open-source command-line tool to run batch computing tasks and workflows on backend services such as Google Cloud.
Apache License 2.0

Add a verbose mode option flag to dsub #169

Closed indraniel closed 5 years ago

indraniel commented 5 years ago

I noticed that the providers have a verbose instance variable, which may be set from the parsed command-line args:

$ git grep "getattr(args, 'verbose'"
dsub/providers/provider_base.py:        getattr(args, 'verbose', False),
dsub/providers/provider_base.py:        getattr(args, 'verbose', False), getattr(args, 'dry_run', False),

but there was no way to toggle that option via the CLI. This change adds a --verbose option to the command line.
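
The flag itself could be wired up along these lines; a minimal argparse sketch, assuming a standalone parser rather than dsub's actual argument plumbing (the parser and help text here are illustrative):

```python
import argparse

# Hypothetical stand-in for dsub's argument parser, not the real one.
parser = argparse.ArgumentParser(prog='dsub')
parser.add_argument(
    '--verbose',
    action='store_true',
    help='Emit additional diagnostic output.')

args = parser.parse_args(['--verbose'])

# Providers read the flag defensively, mirroring the getattr() pattern above,
# so an args namespace without the attribute still defaults to False.
verbose = getattr(args, 'verbose', False)
print(verbose)  # True when --verbose is passed
```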

mbookman commented 5 years ago

Hi @indraniel !

Can you describe what additional information you are actually looking for? The verbose field in the providers today provides minimal additional capability, which is part of why it has never been exposed as a user flag. It's always a difficult decision as to what is appropriate regular logging versus verbose or debug. We haven't come up with a coherent framework for that in dsub.

Thanks!

indraniel commented 5 years ago

Hi @mbookman ! Many Thanks for responding!

My ulterior motive is to display the actual Google Genomics operation name upon submitting a Google Genomics Pipelines job.

In other words, to enable the if self._verbose clause inside dsub.providers.google_v2:

  def _submit_pipeline(self, request):
    google_base_api = google_base.Api(verbose=True)
    operation = google_base_api.execute(
        self._service.pipelines().run(body=request))
    if self._verbose:
      print('Launched operation {}'.format(operation['name']))

There are times when searching for Google Genomics operations via the dsub job name yields HTTP 503 errors in our particular Google Cloud project; however, when directly inspecting the operation via:

gcloud alpha genomics operations describe <name>

we can see the internal details of the job without repeatedly calling the dstat command. This is useful for our debugging.

If there's a better suggested way to log the google genomics operations name via dsub, I'll take that approach.

mbookman commented 5 years ago

Thanks for that detail.

We have observed that from time to time we need to point people to running:

dstat ... --full | grep internal-id

and then grabbing the details from

gcloud alpha genomics operations describe [id]

and are interested in whether there is a specific category of issues that we could solve by improving the dstat ... --full output or the details from dsub ... --wait.

If it is simply helpful to get the Pipelines API operation id emitted upon launch, I don't see any reason not to do that. We would also be interested to hear whether there are specific cases you can point to where better output from dsub ... --wait or dstat ... --full would get you to resolving task issues more quickly.

indraniel commented 5 years ago

Thanks for the response!

If it is simply helpful to get the Pipelines API operation id emitted upon launch, I don't see any reason not to do that.

I think doing the above would be very helpful in making it easier to correlate the Google Genomics API operation ID with the dsub job name, especially for first-time users.

Would it be okay to simply remove the if self._verbose check from _submit_pipeline and make it like so:

  def _submit_pipeline(self, request):
    google_base_api = google_base.Api(verbose=True)
    operation = google_base_api.execute(
        self._service.pipelines().run(body=request))
    print('Launched operation {}'.format(operation['name']))

?

jean-philippe-martin commented 5 years ago

Yes, but we have to be careful about this suggestion:

If it is simply helpful to get the Pipelines API operation id emitted upon launch, I don't see any reason not to do that.

Because currently on launch we emit the job ID on stdout and code relies on it, as in this example from the documentation on job control:

JOBID_A=$(dsub ...)
JOBID_B=$(dsub ...)
dsub ... --after "${JOBID_A}" "${JOBID_B}"

indraniel commented 5 years ago

Ok to print it out to standard error then?

from __future__ import print_function
import sys

def _submit_pipeline(self, request):
  google_base_api = google_base.Api(verbose=True)
  operation = google_base_api.execute(
      self._service.pipelines().run(body=request))
  print('Launched operation {}'.format(operation['name']), file=sys.stderr)

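
Printing the message to stderr preserves the job-ID-capture pattern above. As a minimal sketch (not dsub itself; the job id and operation name below are made up for illustration), pipes and command substitution capture stdout only, so a launch note on stderr never pollutes the captured value:

```python
import subprocess
import sys

# Child process stands in for dsub: job id on stdout, launch note on stderr.
snippet = (
    "import sys\n"
    "print('job-id-123')\n"
    "print('Launched operation projects/example/operations/42', "
    "file=sys.stderr)\n"
)

# Capture both streams, the way $(...) in the shell captures stdout only.
result = subprocess.run(
    [sys.executable, '-c', snippet], capture_output=True, text=True)

print(result.stdout.strip())  # the captured "job id" stays clean
```

Running it prints only job-id-123 on stdout; the operation name remains visible to a user watching the terminal, via stderr.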
jean-philippe-martin commented 5 years ago

Yes @indraniel, absolutely.

mbookman commented 5 years ago

We will add printing this to stderr in the next release we push out. Thanks!

indraniel commented 5 years ago

Thanks @mbookman and @jean-philippe-martin !

Should I formally make the pull request about this (either make a new pull request, or update this one), or should I simply close out this request and do nothing?

mbookman commented 5 years ago

Hi @indraniel .

We will close this when we push out the next release, which includes emitting the operation id.

Thanks!

indraniel commented 5 years ago

Hi @mbookman !

Thanks for setting this up! Do you know when you all will push out the next release?

wnojopra commented 5 years ago

Hi @indraniel ,

0.3.4 was released yesterday which includes emitting the operation id.

Thanks, Willy