DataBiosphere / dsub

Open-source command-line tool to run batch computing tasks and workflows on backend services such as Google Cloud.
Apache License 2.0

How to suppress verbose stdout when launching a large task count #231

Open sansarsecondary opened 3 years ago

sansarsecondary commented 3 years ago

Hi,

How do I disable verbose logging such as this:

...
Provider internal-id (operation): projects/xxx/locations/us-central1/operations/xxx
Provider internal-id (operation): projects/xxx/locations/us-central1/operations/xxx
Provider internal-id (operation): projects/xxx/locations/us-central1/operations/xxx
Provider internal-id (operation): projects/xx/locations/us-central1/operations/xxx
...

Thank you

mbookman commented 3 years ago

Hi @sansarsecondary !

This particular message doesn't have any kind of guard around it to suppress it:

https://github.com/DataBiosphere/dsub/blob/a01408d3769d93c3ae5c5f8ea1cdd0484dc15bd0/dsub/providers/google_v2_base.py#L922

  def _submit_pipeline(self, request):
    google_base_api = google_base.Api()
    operation = google_base_api.execute(self._pipelines_run_api(request))
    print('Provider internal-id (operation): {}'.format(operation['name']))

    return GoogleOperation(self._provider_name, operation).get_field('task-id')

This message was added to help make the underlying provider details (the Pipeline API) less opaque. It provides feedback when new operations are being created and provides some underlying detail should the associated operation need debugging.

Over time, we have looked to add more to the stderr output users see in order to better understand how dsub is working.

If you don't need it, you can comment it out in your copy. You could also filter the stderr output as described here:

https://stackoverflow.com/questions/3618078/pipe-only-stderr-through-a-filter
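The filtering suggestion can be sketched as a small stream filter. This is just an illustration, not part of dsub: `drop_internal_ids` is a made-up name, and the line prefix it matches is taken from the output shown above.

```python
import sys


def drop_internal_ids(lines):
    """Yield every line except the noisy per-operation messages.

    Hypothetical helper for illustration; matches the literal prefix
    that dsub prints for each submitted operation.
    """
    for line in lines:
        if not line.startswith('Provider internal-id'):
            yield line


if __name__ == '__main__':
    # Run as a pipe filter: dsub ... 2> >(python3 filter.py >&2)
    # (process-substitution trick from the linked Stack Overflow answer,
    # so only stderr passes through the filter).
    sys.stdout.writelines(drop_internal_ids(sys.stdin))
```

Whether the messages arrive on stdout or stderr may depend on the dsub version (the `print` call above writes to stdout), so you may need `dsub ... | python3 filter.py` instead.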

If there's a feedback principle you'd be able to articulate, we could also add some command-line flags for what's "verbose" and what isn't.

Thanks!

sansarsecondary commented 3 years ago

Hi @mbookman, thanks for your reply. Your idea of filtering will actually work for my use case. That said, this is what my task output looks like:

Job properties:
  job-id: xxx
  job-name: xxx
  user-id: xx
...
Provider internal-id (operation): projects/xxx/locations/us-central1/operations/xxx
Provider internal-id (operation): projects/xxx/locations/us-central1/operations/xxx
Provider internal-id (operation): projects/xxx/locations/us-central1/operations/xxx
Provider internal-id (operation): projects/xx/locations/us-central1/operations/xxx
...
Launched job-id: xxx
400 task(s)
To check the status, run:
  dstat --provider google-cls-v2 --project xxx --location us-central1 --jobs 'xxx' --users 'xxx' --status '*'
To cancel the job, run:
  ddel --provider google-cls-v2 --project xxx --location us-central1 --jobs 'xxx' --users 'xxx'
Waiting for job to complete...
Monitoring for failed tasks to retry...
*** This dsub process must continue running to retry failed tasks.

Under most circumstances the provider internal id is not required. May I request that the Provider internal-id (operation) print be guarded by a --verbose flag? I appreciate that dsub is trying to be transparent about how the interaction with the CLS API works.
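The kind of guard being requested could look roughly like this. A minimal sketch only: `PipelineSubmitter` and its `verbose` attribute are hypothetical stand-ins, not dsub's actual classes or flags.

```python
class PipelineSubmitter:
    """Illustrative stand-in for the provider object that prints the message."""

    def __init__(self, verbose=False):
        # In dsub this might be wired to a --verbose command-line flag.
        self.verbose = verbose

    def _report_operation(self, operation_name):
        # Only emit the per-operation detail when verbose output is requested.
        if self.verbose:
            print('Provider internal-id (operation): {}'.format(operation_name))
```

With 400 tasks, the default (non-verbose) path would then print nothing per operation, while `--verbose` would restore the current behavior.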