broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
972 stars 354 forks source link

Logging does not match cromwell's stdout #4310

Open stewartadam opened 5 years ago

stewartadam commented 5 years ago

Using cromwell v36, not all log output appears to be captured even if log level set to DEBUG.

For example, using the HaplotypeCallerGvcf_GATK4 reference on GitHub, if an non-existent path is provided for input_bam in the inputs file, this line will cause its workflow to fail:

Int disk_size = ceil(((size(input_bam, "GB") + 30) / hc_scatter) + ref_size) + 20

The resulting 'file not found' error is printed (50x, one per shard) on cromwell's stdout, but not captured anywhere in the logs written to disk. The workflow execution directory in cromwell-executions/HaplotypeCallerGvcf_GATK4/{guid} is just an empty folder and its workflow log (if preserved) indicates the workflow was started but no errors:

...
2018-10-24 13:29:31,409 INFO  - WorkflowExecutionActor-28a80dbf-a2ae-4fc8-96e6-0279c32a0e13 [UUID(28a80dbf)]: Starting HaplotypeCallerGvcf_GATK4.HaplotypeCaller (50 shards)
ruchim commented 5 years ago

AC: Confirm that this is reproducible behavior in the latest version of Cromwell, and given thats the case, confirm that job/workflow failure messages make it over to the workflow logs (not just the server logs)

stewartadam commented 5 years ago

I ran into this again today, unless the server stdout is captured the errors don't make their way to workflow logs so it's particularly annoying to debug failed workflows.

As a user, I would expect that either all logs (both pertaining to the workflow, or the server's execution of the workflow) are placed in the workflow log, or there is a separately configurable server log file for that stream of information.

caross73 commented 4 years ago

I'd like to bump this, we are running into this issue with cromwell-41 (and I am about to check cromwell-46) that when we have a workflow failure, the failure message appears in the server logs but is never copied to the workflow log.

Eg., Workflow Log (empty):

cat workflow.5a34cc05-9f9a-40a0-8691-2b0eb49cdbc3.log

Server Log:

grep -A3 5a34cc05-9f9a-40a0-8691-2b0eb49cdbc3 cromwell-2019-09-17.7566.log 2019-09-25 15:59:21,689 cromwell-system-akka.dispatchers.engine-dispatcher-26816 ERROR - WorkflowManagerActor Workflow 5a34cc05-9f9a-40a0-8691-2b0eb49cdbc3 failed (during MaterializingWorkflowDescriptorState): cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor$$anon$1: Workflow input processing failed: Invalid value for File input 'GermlineMasterWF.trimseq.TRIMSEQ_paired.Adapters': empty value Invalid value for File input 'GermlineMasterWF.trimseq.TRIMSEQ_single.Adapters': empty value at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.cromwell$engine$workflow$lifecycle$materialization$MaterializeWorkflowDescriptorActor$$workflowInitializationFailed(MaterializeWorkflowDescriptorActor.scala:215)