broadinstitute / wdl-runner

Easily run WDL workflows on GCP
BSD 3-Clause "New" or "Revised" License

monitoring image/script throws java.lang.NullPointerException #11

Open skatragadda-nygc opened 4 years ago

skatragadda-nygc commented 4 years ago

The monitoring script/image throws the following error:

cromwell.backend.google.pipelines.v2beta.api.request.GetRequestHandler.interpretOperationStatus(GetRequestHandler.scala:122)
    at cromwell.backend.google.pipelines.v2beta.api.request.GetRequestHandler.interpretOperationStatus$(GetRequestHandler.scala:48)
    at cromwell.backend.google.pipelines.v2beta.api.request.RequestHandler.interpretOperationStatus(RequestHandler.scala:23)
    at cromwell.backend.google.pipelines.v2beta.api.request.GetRequestHandler.$anonfun$handleRequest$2(GetRequestHandler.scala:33)
    at scala.util.Success.$anonfun$map$1(Try.scala:255)
    at scala.util.Success.map(Try.scala:213)
    at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
    ... 14 more
Caused by: java.lang.NullPointerException
    at cromwell.backend.google.pipelines.v2beta.api.request.GetRequestHandler$$anonfun$$nestedInanonfun$getEventList$3$1.applyOrElse(GetRequestHandler.scala:139)
    at cromwell.backend.google.pipelines.v2beta.api.request.GetRequestHandler$$anonfun$$nestedInanonfun$getEventList$3$1.applyOrElse(GetRequestHandler.scala:139)
    at scala.collection.immutable.List.collect(List.scala:308)
    at cromwell.backend.google.pipelines.v2beta.api.request.GetRequestHandler.$anonfun$getEventList$3(GetRequestHandler.scala:139)
    at scala.collection.immutable.List.flatMap(List.scala:338)
    at cromwell.backend.google.pipelines.v2beta.api.request.GetRequestHandler.getEventList(GetRequestHandler.scala:138)
    at cromwell.backend.google.pipelines.v2beta.api.request.GetRequestHandler.interpretOperationStatus(GetRequestHandler.scala:72)
    ... 20 more

We are using the latest Cromwell and the PAPI v2 config in the wdl_runner image.

Monitoring is configured in the options.json file using either monitoring_script or monitoring_image. Please let me know if there is a way to fix this.
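For reference, a minimal sketch of producing such an options.json from Python. The two keys, monitoring_script and monitoring_image, are the ones named in this issue; the gs:// path and the make_options.py filename are hypothetical placeholders. The sketch only builds the JSON and does not check that Cromwell accepts the values.

```python
import json
from typing import Optional


def build_workflow_options(
    monitoring_script: Optional[str] = None,
    monitoring_image: Optional[str] = None,
) -> dict:
    """Return Cromwell workflow options containing whichever monitoring keys are set."""
    candidates = {
        "monitoring_script": monitoring_script,  # e.g. a gs:// path to a shell script
        "monitoring_image": monitoring_image,    # e.g. a Docker image reference
    }
    # Drop unset keys so the options file only contains what was requested.
    return {key: value for key, value in candidates.items() if value is not None}


if __name__ == "__main__":
    # Hypothetical script location; replace with your own bucket path.
    options = build_workflow_options(monitoring_script="gs://my-bucket/monitoring.sh")
    # Redirect stdout to a file, e.g. `python make_options.py > options.json`.
    print(json.dumps(options, indent=2))
```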

cc: @cjllanwarne @kv076 @dinvlad

skatragadda-nygc commented 4 years ago

I have tried the following monitoring tools:

https://github.com/broadinstitute/cromwell-monitor
https://github.com/broadinstitute/cromwell-task-monitor-bq

as well as the script from this forum thread: https://gatkforums.broadinstitute.org/wdl/discussion/12459/gathering-job-resource-usage-stats-from-backend

All of them partially work, but they keep throwing the error above, and the workflow eventually fails.

kv076 commented 4 years ago

Sri, I would try using PAPI v2 Alpha rather than v2 Beta. We're still working on making v2 Beta support more mainstream.
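Switching between Alpha and Beta is a backend-configuration change. The sketch below assumes the usual Cromwell PAPI v2 backend layout, where the actor-factory class selects the API flavor: the v2beta package matches the one in the stack trace above, and v2alpha1 is Cromwell's Alpha backend package. The project ID and bucket root are hypothetical, and a working config needs further keys (authentication, filesystems, and so on) that are deliberately omitted.

```python
# Actor-factory classes that select the Pipelines API v2 flavor in Cromwell.
PAPI_V2_ACTOR_FACTORIES = {
    "alpha": "cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory",
    "beta": "cromwell.backend.google.pipelines.v2beta.PipelinesApiLifecycleActorFactory",
}


def backend_stanza(version: str, project: str, root: str) -> str:
    """Render a minimal HOCON backend block for the requested PAPI v2 flavor.

    Only the fields needed to show the Alpha/Beta switch are emitted; a real
    config also needs auth, filesystems and other settings.
    """
    if version not in PAPI_V2_ACTOR_FACTORIES:
        raise ValueError(f"unknown PAPI v2 version: {version!r}")
    factory = PAPI_V2_ACTOR_FACTORIES[version]
    lines = [
        "backend {",
        "  default = PAPIv2",
        "  providers {",
        "    PAPIv2 {",
        f'      actor-factory = "{factory}"',
        "      config {",
        f'        project = "{project}"',
        f'        root = "{root}"',
        "      }",
        "    }",
        "  }",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical project and execution bucket.
    print(backend_stanza("alpha", "my-gcp-project", "gs://my-bucket/cromwell-execution"))
```

Keeping the two class names in one table means the only difference between the Alpha and Beta configs is the single actor-factory line, which makes it easy to compare both backends on the same workflow.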

skatragadda-nygc commented 4 years ago

Thanks for the quick response. Localization improved drastically after moving to Beta, reducing overall workflow execution time, so I am not sure we want to go back. Is there a timeline for a fix or a new release?

skatragadda-nygc commented 4 years ago

Monitoring did work with Alpha in our testing. The error seems to originate in the code in this file: https://github.com/broadinstitute/cromwell/blob/21d860fcd1fcb5b811b431db608b6d518dd9f6c3/supportedBackends/google/pipelines/v2beta/src/main/scala/cromwell/backend/google/pipelines/v2beta/api/request/GetRequestHandler.scala
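For illustration only, a hypothetical Python analogue of the failure pattern the stack trace suggests: GetRequestHandler.getEventList runs a List.flatMap followed by a List.collect, and the partial function passed to collect throws a NullPointerException at GetRequestHandler.scala:139. This is not Cromwell's code. The Event type, its description field, and the "Worker" filter are made up to show how a single event with a missing field can break the whole status poll, and how a null guard avoids that.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    # Hypothetical stand-in for an operation event; `description` may be
    # missing (None), which is the case the unsafe version does not handle.
    description: Optional[str]
    timestamp: str


def collect_unsafe(per_action: List[List[Event]]) -> List[str]:
    # Flatten events across actions, then keep matching ones. `description`
    # is dereferenced without a check, so a None value raises AttributeError
    # (the Python counterpart of a JVM NullPointerException).
    return [
        e.description
        for events in per_action
        for e in events
        if e.description.startswith("Worker")
    ]


def collect_safe(per_action: List[List[Event]]) -> List[str]:
    # Same traversal, but events without a description are skipped instead
    # of aborting the whole collection.
    return [
        e.description
        for events in per_action
        for e in events
        if e.description is not None and e.description.startswith("Worker")
    ]


if __name__ == "__main__":
    actions = [
        [Event("Worker assigned", "t0")],
        [Event(None, "t1"), Event("Worker released", "t2")],
    ]
    print(collect_safe(actions))
```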