common-workflow-language / cwltool

Common Workflow Language reference implementation
https://cwltool.readthedocs.io/
Apache License 2.0
335 stars 230 forks source link

stdout and stderr don't get used in the main function #597

Open ThomasHickman opened 6 years ago

ThomasHickman commented 6 years ago

If stdout or stderr get passed as values to cwltool.main.main, the expected behaviour doesn't happen e.g. in the following, in python 3:

import cwltool.main
import io
stream = io.StringIO()
cwltool.main.main(["test.cwl", "test.yml"], stdout=stream, stderr=stream)

It is expected for nothing to be outputted to the console, however, the following is outputted:

INFO:cwltool: 1.0.20171107133715
INFO:cwltool:Resolved '...' to '...'
DEBUG:cwltool:[job test.cwl_5] initializing from ...
DEBUG:cwltool:[job test.cwl_5] {
    "parameter": "test"
}
DEBUG:cwltool:[job test.cwl_5] path mappings is {}
DEBUG:cwltool:[job test.cwl_5] command line bindings is [
    {
        "position": [
            -1000000,
            0
        ],
        "datum": "echo"
    },
    {
        "position": [
            1,
            "parameter"
        ],
        "datum": "test"
    }
]
INFO:cwltool:[job test.cwl_5] /private/var/folders/5p/p6z8hhy174q_ctt3pn7mx5_c000h1t/T/tmpm2vomy4j$ echo \
    test
test
INFO:cwltool:[job test.cwl_5] completed success
DEBUG:cwltool:[job test.cwl_5] {}
DEBUG:cwltool:[job test.cwl_5] Removing input staging directory /private/var/folders/5p/p6z8hhy174q_ctt3pn7mx5_c000h1t/T/tmpn9kjnhce
DEBUG:cwltool:[job test.cwl_5] Removing temporary directory /private/var/folders/5p/p6z8hhy174q_ctt3pn7mx5_c000h1t/T/tmp4xzktq97
DEBUG:cwltool:Removing intermediate output directory /private/var/folders/5p/p6z8hhy174q_ctt3pn7mx5_c000h1t/T/tmpm2vomy4j
INFO:cwltool:Final process status is success
0

for the workflow:

cwlVersion: v1.0
inputs:
  - id: parameter
    type: string
    inputBinding:
      position: 1
outputs: []
class: CommandLineTool
baseCommand: echo

and inputs:

parameter: test

From what I've briefly looked at, the handler for logging.getLogger("cwltool") and https://github.com/common-workflow-language/cwltool/blob/f691bf1aa66f2836694716c9845731962147079b/cwltool/job.py#L500 needs to be writing to the parameterised stdout, though there may be other occasions where stdout or stderr is always written to.

Your Environment

tetron commented 6 years ago

Try setting logger_handler in main() instead.

ThomasHickman commented 6 years ago

I think I ran those commands in the python REPL, so the root logger was outputting those logs to the console. However I've run this as a python 3 script:

import cwltool.main
import io
import logging

stream_stdout = io.StringIO()
stream_stderr = io.StringIO()
stream_logging_handler = io.StringIO()

cwltool.main.main(["test.cwl", "test.yml"], stdout=stream_stdout, stderr=stream_stderr, logger_handler=logging.StreamHandler(stream_logging_handler))

print("----")

print(stream_stdout.getvalue())
print(stream_stderr.getvalue())
print(stream_logging_handler.getvalue())

stream_stdout.close()
stream_stderr.close()
stream_logging_handler.close()

and this is outputted onto the the console:

test
----
{}

test.py 1.0.20171227212058
Resolved 'test.cwl' to 'file:///Users/th10/checkouts/gatk_cwl_generator/my/test.cwl'
[job test.cwl] /private/var/folders/5p/p6z8hhy174q_ctt3pn7mx5_c000h1t/T/tmpsguk5edl$ echo \
    test
[job test.cwl] completed success
Final process status is success

indicating that test (the output of the program) is still not being captured. This still happens when I remove the print statements at the end

tetron commented 6 years ago

Ah, I see. Yes that is arguably a bug.

A simple fix would be to propagate the stderr down to Popen redirection.

A more complete solution would be to provide a hook that can be called for each job and return a stream that is attached to Popen, so you can log each job's output separately.