DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0
900 stars 240 forks source link

Beginning of worker logs missing from logfile #2159

Closed ionox0 closed 6 years ago

ionox0 commented 6 years ago

Hello Toil team,

This is a follow up from https://github.com/common-workflow-language/cwltool/issues/703

I'm still seeing this issue after upgrading cwltool to 1.0.20180326152342

The issue is that the logs from this abra tool do not always get logged into the cwltoil.log file. Specifically I'm missing the beginning of the logs.

I linked to the above^ issue to point out the fact that the 0.92 version of the abra tool which we are currently using logs to standard error not stdout, which may be relevant.

This is the section of the logs that is missing (which I generated by using echo as the baseCommand instead of the abra jar): https://gist.github.com/ionox0/d79acf464f7a2497ccfd270aecd8c5eb

Versions:

cwltool version: 1.0.20180326152342 Toil version: 3.14.0

Script used to run toil-cwl-runner

#!/bin/bash

toil-cwl-runner \
    --outdir ${output_directory} \
    --writeLogs ${output_directory}/log \
    --logFile ${output_directory}/log/cwltoil.log \
    --batchSystem ${batch_system} \
    --preserve-environment PATH PYTHONPATH \
    --defaultDisk 10G \
    --defaultMem 10G \
    --no-container \
    --linkImports \
    --disableCaching \
    --realTimeLogging \
    --workDir ${output_directory}/tmp \
    --jobStore file://${jobstore_path} \
    --cleanWorkDir onSuccess \
    --logDebug \
    --stats \
    ${workflow} \
    ${inputs_file}

inputs.yaml

threads: 4
input_bams: 
- {class: File, path: '../test-data/standard_markDuplicates/Sample_ZS-msi-4506-pl-T01_IGO_05500_EF_41_standard_RG_MD.bam'}
- {class: File, path: '../test-data/standard_markDuplicates/Sample_ZS-msi-4506-pl-T01_IGO_05500_EF_42_standard_RG_MD.bam'}
input_bais: 
- {class: File, path: '../test-data/standard_markDuplicates/Sample_ZS-msi-4506-pl-T01_IGO_05500_EF_41_standard_RG_MD.bai'}
- {class: File, path: '../test-data/standard_markDuplicates/Sample_ZS-msi-4506-pl-T01_IGO_05500_EF_42_standard_RG_MD.bai'}

reference_fasta: './data/hg-fasta/VERSIONS/hg19/Homo_sapiens_assembly19.fasta'
targets: {class: File, path: '../test-data/start/pan-cancer-panel.sorted.bed'}

kmer: 43,53,63,83,93
mad: 1000
working_directory: ./data/abra_ian_temp_test_03-27

out: 
- Sample_ZS-msi-4506-pl-T01_IGO_05500_EF_41_standard_RG_MD.bam
- Sample_ZS-msi-4506-pl-T01_IGO_05500_EF_42_standard_RG_MD.bam

abra.cwl

#!/usr/bin/env cwl-runner

cwlVersion: v1.0

class: CommandLineTool

baseCommand:
- /bin/java

arguments:
- -Xmx20g
- -Djava.io.tmpdir=/scratch
- -jar
- ./data/abra-0.92-SNAPSHOT-jar-with-dependencies.jar

requirements:
  InlineJavascriptRequirement: {}
  ResourceRequirement:
    ramMin: 62000
    coresMin: 8

doc: |
  None

inputs:

  input_bams:
    type: File[]
    inputBinding:
      prefix: --in
      itemSeparator: ','
    secondaryFiles:
    - ^.bai

  working_directory:
    type: string
    inputBinding:
      prefix: --working

  reference_fasta:
    type: string
    inputBinding:
      prefix: --ref

  targets:
    type: File
    inputBinding:
      prefix: --targets

  threads:
    type: int
    inputBinding:
      prefix: --threads

  kmer:
    type: string
    inputBinding:
      prefix: --kmer

  mad:
    type: int
    inputBinding:
      prefix: --mad

  out:
    type: string[]
    inputBinding:
      itemSeparator: ','
      prefix: --out

outputs:

  bams:
    type: File[]
    outputBinding:
      glob: '*_IR.bam'
mr-c commented 6 years ago

@ionox0 If stderr output is important then I'd make that a separate output file

outputs:
  log: stderr
  bams:
    type: File[]
    outputBinding:
      glob: '*_IR.bam'
mr-c commented 6 years ago

Additionally you have many hard coded paths, which is fine to get started, but will need to be removed to make this CWL description portable :-)

mr-c commented 6 years ago

Toil v3.15.0 was just released; I wonder if that helps?

ionox0 commented 6 years ago

@mr-c Yes you're right, that method does give me a new log file with the error logs from the tool, however this file also skips over the initial logging statements that I was expecting, and has the same logs as are logged in the console. I'll try upgrading to the new version of Toil, and get back with any changes, thanks again for the help!

log output file:

luna.cbio.private 2018-03-28 11:24:42,593 Thread-323 DEBUG toil.statsAndLogging: Received Toil worker log. Disable debug level logging to hide this output
2018-03-28 11:24:42,593 - toil.statsAndLogging - DEBUG - Received Toil worker log. Disable debug level logging to hide this output
luna.cbio.private 2018-03-28 11:24:42,593 Thread-323 DEBUG toil.statsAndLogging: 'Innovation-Pipeline/cwl_tools/abra/abra__debub.cwl' /opt/common/CentOS_6/java/jdk1.8.0_25/bin/java B/H/jobZGC5L8    ativeAssembler    Region: 14_105240223_105240343  Length: 120 ReadCount:  0   Elapsed 5   Assembled   false   43
20
...

logged to console:

luna.cbio.private 2018-03-28 11:24:42,593 Thread-323 DEBUG toil.statsAndLogging: Received Toil worker log. Disable debug level logging to hide this output
2018-03-28 11:24:42,593 - toil.statsAndLogging - DEBUG - Received Toil worker log. Disable debug level logging to hide this output
luna.cbio.private 2018-03-28 11:24:42,593 Thread-323 DEBUG toil.statsAndLogging: 'Innovation-Pipeline/cwl_tools/abra/abra__debub.cwl' /opt/common/CentOS_6/java/jdk1.8.0_25/bin/java B/H/jobZGC5L8    ativeAssembler    Region: 14_105240223_105240343  Length: 120 ReadCount:  0   Elapsed 5   Assembled   false   43
2018-03-28 11:24:42,593 - toil.statsAndLogging - DEBUG - 'Innovation-Pipeline/cwl_tools/abra/abra__debub.cwl' /bin/java B/H/jobZGC5L8    ativeAssembler Region: 14_105240223_105240343  Length: 120 ReadCount:  0   Elapsed 5   Assembled   false   43

^ Note how it begins right in the middle of the word "ativeAssembler Region:", which is somehow getting cut off and should say "Elapsed_msecs_in_NativeAssembler Region:"

Here are the lines that I would expect to see, as seen with GATK's FindCoveredIntervals tool:

Start of worker log

Received Toil worker log. Disable debug level logging to hide this output
Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD    INFO:toil.worker:---TOIL WORKER OUTPUT LOG---
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD    DEBUG:toil.worker:Next available file descriptor: 5
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD    INFO:toil:Running Toil version 3.14.0-b91dbf9bf6116879952f0a70f9a2fbbcae7e51b6.
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' 

Debug configuration settings

/bin/java Z/J/job271AFD    DEBUG:toil:Configuration: {'rescueJobsFrequency': 3600, 'logLevel': 'DEBUG', 'minNodes': None, 'maxMemory': 9223372036854775807, 'jobStore': 'file:/ifs/work/bergerm1/Innovation/sandbox/ian/DY_03-26_55/tmp/jobstore-ff864d84-3137-11e8-a6bf-645106ef9e4c', 'disableHotDeployment': False, 'linkImports': True, 'nodeOptions': None, 'nodeTypes': [], 'servicePollingInterval': 60, 'workDir': 'outputFolder/tmp', 'stats': True, 'disableCaching': True, 'maxPreemptableServiceJobs': 9223372036854775807, 'environment': {}, 'parasolMaxBatches': 10000, 'cleanWorkDir': 'onSuccess', 'maxCores': 9223372036854775807, 'sseKey': None, 'maxDisk': 9223372036854775807, 'scaleInterval': 30, 'deadlockWait': 60, 'defaultPreemptable': False, 'clusterStats': None, 'defaultCores': 1, 'cseKey': None, 'betaInertia': 1.2, 'metrics': False, 'maxNodes': [10], 'scale': 1, 'writeLogs': 'outputFolder/log', 'badWorker': 0.0, 'defaultDisk': 10737418240, 'mesosMasterAddress': '140.163.153.177:5050', 'restart': False, 'useAsync': True, 'preemptableCompensation': 0.0, 'parasolCommand': 'parasol', 'workflowID': 'c793845f-0454-4e8c-9478-a1e174148e3e', 'alphaPacking': 0.8, 'maxServiceJobs': 9223372036854775807, 'readGlobalFileMutableByDefault': False, 'badWorkerFailInterval': 0.01, 'statePollingWait': 1, 'debugWorker': False, 'maxLogFileSize': 64000, 'defaultMemory': 10737418240, 'workflowAttemptNumber': 0, 'maxJobDuration': 9223372036854775807, 'clean': 'never', 'provisioner': None, 'batchSystem': 'lsf', 'retryCount': 1, 'writeLogsGzip': None, 'nodeStorage': 50}
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD    DEBUG:toil.worker:Parsed job wrapper
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD    DEBUG:toil.worker:Got a command to run: _toil w/W/jobdOspz7/g/tmpTgo3Hs.tmp virtualenv/lib/python2.7/site-packages toil.cwl.cwltoil True
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD    DEBUG:toil.job:Loading user module ModuleDescriptor(dirPath='virtualenv/lib/python2.7/site-packages', name='toil.cwl.cwltoil', fromVirtualEnv=True).
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD    DEBUG:rdflib:RDFLib Version: 4.2.2
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' 

Arguments & full command passed to the cwl

/bin/java Z/J/job271AFD    [job FindCoveredIntervals.cwl] outputFolder/out_tmpdirRg6Xzd$ /bin/sh \
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD        -c \
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD        '/bin/java' '-Xmx20g' '-Djava.io.tmpdir=/scratch' '-jar' 'GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar' '-T' 'FindCoveredIntervals' '--input_file' 'outputFolder/DY_03-26_55/tmpTGy4T2/stg5230694e-4c6a-4fc6-8943-b48408b6bbd3/MSK-L-009-bc_IGO_05500_DY_6_S3_aln_RG_MD.bam' '--reference_sequence' '/data/pubdata/hg-fasta/VERSIONS/hg19/Homo_sapiens_assembly19.fasta' '--input_file' 'outputFolder/tmpTGy4T2/stgbb0fa913-67ab-4736-83a5-a33f224912ca/MSK-L-009-cf_IGO_05500_DY_23_S22_aln_RG_MD.bam' '--coverage_threshold' '20' '--minBaseQuality' '20' '--minMappingQuality' '20' '--out' 'MSK-L-009.fci.list' --read_filter FailsVendorQualityCheck -rf BadMate -rf UnmappedRead -rf BadCigar
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' bin/java Z/J/job271AFD    INFO:cwltool:[job FindCoveredIntervals.cwl] outputFolder/out_tmpdirRg6Xzd$ /bin/sh \
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD        -c \
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD        '/bin/java' '-Xmx20g' '-Djava.io.tmpdir=/scratch' '-jar' '/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar' '-T' 'FindCoveredIntervals' '--input_file' 'outputFolder/tmpTGy4T2/stg5230694e-4c6a-4fc6-8943-b48408b6bbd3/MSK-L-009-bc_IGO_05500_DY_6_S3_aln_RG_MD.bam' '--reference_sequence' 'data/hg-fasta/VERSIONS/hg19/Homo_sapiens_assembly19.fasta' '--input_file' 'data/tmpTGy4T2/stgbb0fa913-67ab-4736-83a5-a33f224912ca/MSK-L-009-cf_IGO_05500_DY_23_S22_aln_RG_MD.bam' '--coverage_threshold' '20' '--minBaseQuality' '20' '--minMappingQuality' '20' '--out' 'MSK-L-009.fci.list' --read_filter FailsVendorQualityCheck -rf BadMate -rf UnmappedRead -rf BadCigar
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' 

Output from the tool itself

/opt/common/CentOS_6/java/jdk1.8.0_25/bin/java Z/J/job271AFD    INFO  03:52:33,052 HelpFormatter - --------------------------------------------------------------------------------
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD    INFO  03:52:33,059 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.3-0-g37228af, Compiled 2014/10/24 01:07:22
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD    INFO  03:52:33,060 HelpFormatter - Copyright (c) 2010 The Broad Institute
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD    INFO  03:52:33,064 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD    INFO  03:52:33,070 HelpFormatter - Program Args: -T FindCoveredIntervals --input_file outputFolder/tmpTGy4T2/stg5230694e-4c6a-4fc6-8943-b48408b6bbd3/MSK-L-009-bc_IGO_05500_DY_6_S3_aln_RG_MD.bam --reference_sequence data/hg-fasta/VERSIONS/hg19/Homo_sapiens_assembly19.fasta --input_file outputFolder/tmpTGy4T2/stgbb0fa913-67ab-4736-83a5-a33f224912ca/MSK-L-009-cf_IGO_05500_DY_23_S22_aln_RG_MD.bam --coverage_threshold 20 --minBaseQuality 20 --minMappingQuality 20 --out MSK-L-009.fci.list --read_filter FailsVendorQualityCheck -rf BadMate -rf UnmappedRead -rf BadCigar
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD    INFO  03:52:33,076 HelpFormatter - Executing as johnsoni@s20.cbio.private on Linux 2.6.32-696.3.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_25-b17.
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD    INFO  03:52:33,076 HelpFormatter - Date/Time: 2018/03/27 03:52:33
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD    INFO  03:52:33,077 HelpFormatter - --------------------------------------------------------------------------------
'Innovation-Pipeline/cwl_tools/gatk/FindCoveredIntervals.cwl' /bin/java Z/J/job271AFD    INFO  03:52:33,078 HelpFormatter - --------------------------------------------------------------------------------
...
ionox0 commented 6 years ago

I have not been able to reproduce this issue, but it has not occurred since upgrading to Toil 3.15.0

It may still be caused by a threading issue, but can close this until I see it again

ionox0 commented 6 years ago

I realize now that this was being caused by the --maxLogFileSize default of 64000 (logs were being truncate to last 128,000) bytes, for some reason 2X the size of this param. Thank you