Open kshakir opened 6 years ago
We experience this issue in a production environment, too. It's a big problem because in our environment Cromwell is part of an automated system that collects new data, runs analysis workflows, and accessions the results to a public archive. Part of the provenance metadata that goes along with workflow runs is the Docker image ID that was used during the run. When the value for that key is occasionally missing, it breaks the code that passes that important provenance information on to the next level of metadata.
Does your Cromwell routinely restart in the manner described in the ticket description? If you're using it in production, that seems less likely.
Similar to #3998 (`backendStatus`), but for the metadata key `dockerImageUsed`.

This call metadata key is written by the engine when a job succeeds. The key may be missing when Cromwell is restarted during Centaur tests: after an automated restart, the test's calls end up as call cache hits, and the key isn't written for those.
As a call cache hit technically doesn't have a dockerImage, it should be decided like in #3998 if the key `dockerImageUsed` should be written for cache hits.

https://github.com/broadinstitute/cromwell/blob/9bee537c5f6a9ff4e8597f75b6844c0eaee721cc/engine/src/main/scala/cromwell/engine/workflow/lifecycle/execution/job/EngineJobExecutionActor.scala#L279-L281
Example log of a failure during WIP of #3658: dockerImageUsed_missing.txt
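
Until that decision is made, a consumer of the metadata could at least detect the gap instead of failing on it. The following is a minimal sketch, not Cromwell code: it assumes the workflow metadata JSON has a `calls` object mapping each call name to a list of attempts, and that a cache hit is flagged under `callCaching.hit`. The function name `calls_missing_docker_image` and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def calls_missing_docker_image(metadata: Dict[str, Any]) -> List[Tuple[str, int, bool]]:
    """List call attempts that have no dockerImageUsed key.

    Returns (call name, attempt position, was_cache_hit) tuples.
    Assumes the layout described above; adjust if your metadata differs.
    """
    missing = []
    for call_name, attempts in metadata.get("calls", {}).items():
        for position, attempt in enumerate(attempts):
            if "dockerImageUsed" not in attempt:
                # Record whether this attempt was a call cache hit, since
                # that is the case where the key is reported as unwritten.
                cache_hit = bool(attempt.get("callCaching", {}).get("hit", False))
                missing.append((call_name, position, cache_hit))
    return missing


# Hypothetical metadata: one normally executed call, one call cache hit.
sample_metadata = {
    "calls": {
        "wf.align": [
            {"dockerImageUsed": "ubuntu@sha256:abc", "callCaching": {"hit": False}}
        ],
        "wf.sort": [
            {"callCaching": {"hit": True}}
        ],
    }
}

missing_calls = calls_missing_docker_image(sample_metadata)
```

With a report like this, the provenance step can decide per call whether to skip, fall back to another source, or flag the run, rather than breaking on the absent key.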