Open skatragadda-nygc opened 4 years ago
I have used the following monitoring tools: https://github.com/broadinstitute/cromwell-monitor and https://github.com/broadinstitute/cromwell-task-monitor-bq
I have also tried the script from here: https://gatkforums.broadinstitute.org/wdl/discussion/12459/gathering-job-resource-usage-stats-from-backend
All of them work partially, but they keep throwing the above error message, and the workflow fails after a while.
Sri, I would try using PAPI V2 Alpha rather than V2 Beta. We're still working on making V2 Beta support more mainstream.
Thanks for the quick response. Localization has improved drastically since we moved to Beta, reducing the overall workflow execution time. I am not sure we want to go back. Is there a timeline for a fix or a new release?
Monitoring did work with Alpha in our testing. The error seems to be caused by the code in this file: https://github.com/broadinstitute/cromwell/blob/21d860fcd1fcb5b811b431db608b6d518dd9f6c3/supportedBackends/google/pipelines/v2beta/src/main/scala/cromwell/backend/google/pipelines/v2beta/api/request/GetRequestHandler.scala
Monitoring script/image throws
cromwell.backend.google.pipelines.v2beta.api.request.GetRequestHandler.interpretOperationStatus(GetRequestHandler.scala:122)
    at cromwell.backend.google.pipelines.v2beta.api.request.GetRequestHandler.interpretOperationStatus$(GetRequestHandler.scala:48)
    at cromwell.backend.google.pipelines.v2beta.api.request.RequestHandler.interpretOperationStatus(RequestHandler.scala:23)
    at cromwell.backend.google.pipelines.v2beta.api.request.GetRequestHandler.$anonfun$handleRequest$2(GetRequestHandler.scala:33)
    at scala.util.Success.$anonfun$map$1(Try.scala:255)
    at scala.util.Success.map(Try.scala:213)
    at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
    ... 14 more
Caused by: java.lang.NullPointerException
    at cromwell.backend.google.pipelines.v2beta.api.request.GetRequestHandler$$anonfun$$nestedInanonfun$getEventList$3$1.applyOrElse(GetRequestHandler.scala:139)
    at cromwell.backend.google.pipelines.v2beta.api.request.GetRequestHandler$$anonfun$$nestedInanonfun$getEventList$3$1.applyOrElse(GetRequestHandler.scala:139)
    at scala.collection.immutable.List.collect(List.scala:308)
    at cromwell.backend.google.pipelines.v2beta.api.request.GetRequestHandler.$anonfun$getEventList$3(GetRequestHandler.scala:139)
    at scala.collection.immutable.List.flatMap(List.scala:338)
    at cromwell.backend.google.pipelines.v2beta.api.request.GetRequestHandler.getEventList(GetRequestHandler.scala:138)
    at cromwell.backend.google.pipelines.v2beta.api.request.GetRequestHandler.interpretOperationStatus(GetRequestHandler.scala:72)
    ... 20 more
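For what it's worth, the trace shows the NullPointerException being thrown from a partial function passed to `List.collect` inside a `flatMap` in `getEventList`. I have not confirmed which field is null in the real code, so the following is only a minimal sketch of that general failure pattern and one way to guard against it. `Event`, `details`, and `actionId` here are hypothetical stand-ins, not Cromwell's or the Google client's actual types.

```scala
// Hypothetical stand-in for a Java API model class whose fields may be null.
// This is NOT the real Cromwell or Google client type.
final case class Event(description: String, details: java.util.Map[String, Object])

object NullSafeCollect {
  // Unsafe: the pattern guard dereferences `details` directly, so an event
  // with null `details` throws NullPointerException from inside `collect`.
  def unsafeActionIds(events: List[Event]): List[String] =
    events.collect {
      case e if e.details.containsKey("actionId") =>
        e.details.get("actionId").toString
    }

  // Safer: wrap every possibly-null reference in Option before using it.
  // Events with null `details` or a missing key are simply skipped.
  def safeActionIds(events: List[Event]): List[String] =
    events.flatMap { e =>
      for {
        details <- Option(e.details)
        id      <- Option(details.get("actionId"))
      } yield id.toString
    }
}
```

With a list containing one event whose `details` is null, `unsafeActionIds` fails in the same way as the trace above, while `safeActionIds` skips that event and returns the IDs from the others.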
We are using the latest Cromwell and the PAPI v2 config in the wdl_runner image.
Monitoring options are set in the options.json file using monitoring_script or monitoring_image. Please let me know if there is a way to fix this.
cc: @cjllanwarne @kv076 @dinvlad