common-workflow-language / cwltool

Common Workflow Language reference implementation
https://cwltool.readthedocs.io/
Apache License 2.0
335 stars 231 forks source link

[Windows] Discussing issues in Cwltool windows compatibility #416

Closed kapilkd13 closed 3 years ago

kapilkd13 commented 7 years ago

This issue is created to discuss problems that may occur in implementing windows compatibility for cwltool

kapilkd13 commented 7 years ago

On running cwltool test on windows. I am getting [Errno 13] Permission denied: 'C:\\users\\kapilk~1\\appdata\\local\\temp\\tmpwhycvl in here The reason is tempfile.NamedTemporaryFile() opens a temporary file which we latter access without closing it. This file access using file.name without closing file is allowed in unix but not on windows. Now if we try to close file before accessing it, file gets deleted as this is NamedTemporaryFile normal behaviour. Currently I am passing delete=False as a argument to this function, hence these temp files are not deleted when file is being closed. But then these temp files are not deleted and they remain in our tmp folder. Is there a better way to handle this.

kapilkd13 commented 7 years ago

Hi I made some changes to remove symlinking on a local branch https://github.com/common-workflow-language/cwltool/compare/master...kapilkd13:testing-windows?expand=1 for testing, but I am getting https://travis-ci.org/kapilkd13/cwltool/jobs/240462166 these errors. Can anyone help me with these failing test. I can't understand the reason behind these failures. Oddly all failing test cases use tempfile.mkdtemp to create temp directory whose docs says Creates a temporary directory in the most secure manner possible. There are no race conditions in the directory’s creation. The directory is readable, writable, and searchable only by the creating user ID. I think it is possible that this temp dir is not searchable/writable but then how did it worked earlier with symlinks and now it doesn't.

mr-c commented 7 years ago

Hey @kapilkd13 If I was debugging these errors I would use something like https://pypi.python.org/pypi/pdbpp/ to troubleshoot the tests

kapilkd13 commented 7 years ago

Thanks @mr-c Although some bugs are still remaining, I solved other bugs using pdb++.

kapilkd13 commented 7 years ago

This is related to the InlinejavascriptRequirement issue on windows. As the windows consider stdin,stdout as file descriptors the current select module do not allow us to do Non Blocking I/O. In order to overcome this issue I created a commit that used threads. here https://github.com/common-workflow-language/cwltool/pull/419/commits/69fc0fb9790d020bbd9f079b86a36781ae9f3e87. Onn initial testing on conformance test this solution looked fine. But thanks to Janneke, she pointed out a workflow on which this workflow failed. So Now the problem was: My current patch worked and did gave output.

WorkflowException: :1:1: Expression evaluation error:
:1:1: Long-running script killed after 20 seconds: returncode was: -9
:1:1: script was:
:1:1: 01 "use strict";
:1:1: 02 var inputs = {};
:1:1: 03 var self = null;
:1:1: 04 var runtime = {
:1:1: 05     "outdirSize": 1024, 
:1:1: 06     "ram": 1024, 
:1:1: 07     "tmpdirSize": 1024, 
:1:1: 08     "cores": 1, 
:1:1: 09     "tmpdir": "/tmp/tmpgcXGd6", 
:1:1: 10     "outdir": "/tmp/tmpGqSJ21"
:1:1: 11 };
:1:1: 12 (function(){return ((1+1));})()
:1:1: stdout was: 
:1:1: 2
:1:1: stderr was: 

Process finished with exit code 1

notice stdout value 2, it is correct so expression is evaluated correctly. The problem is that as I am using threads instead to do this I/O read. on joining them the thread gets blocked on os.read statement as it expects input. I tried to find a way to read data without blocking/ way to check if data is in stdout and only read if it exists. On windows I can't find a reliable way. isatty didn't help. There are now two possible solution that i know of. Sol1: promote treads to daemon. Dont join them and they will die once program executes completely. But it has a catch, As @tetron told me we try to use same jsprocess as to evaluate multiple js expression as process is expensive. So Now if we use daemon threads when evaluating 2nd expression, thread from first expresion are still alive and would capture 2nd process output also. Its solution: use new js process for each expression evaluation. Sol 2: use process instead of threads. Although there will be minor speed loss we will have better handling over it. So when we think we have received output, we can terminate process. This way we can utilise single js process. Personally I am inclined to second method. what do you think @mr-c @anton-khodak ,Janneke

Update Not a good news. I tried using multiprocess on windows but due to pickling I cant pass nodejs object and when I pass nodejs.stdout file descriptor to new process, it get closed. So I tried passing nodejs.stdout.fileno() and tried opening it later in new process using os.fdopen but this gave me Bad-descriptor Error. Note that if I pass sys.stdout.fileno(), it gets opened in new process. I can't seem to find solution to it. I made a SO question here. https://stackoverflow.com/questions/44866552/python-2-7-windowsunable-to-read-one-processnot-host-stdout-from-host-process complete description is here. Any suggestion is appreciated.

tetron commented 7 years ago

For the short term to get unblocked (hah) I suggest using Popen.communicate() to evaluate a single expression and running nodejs to completion. This will make expression evaluations expensive, however the common case of simple parameter references are already evaluated directly by cwltool.

I checked the Python standard library implementation of subprocess.communicate(). At least on Windows it uses threads to read from stdout and stderr, presumably because non-blocking reads/select are not available.

I found one stack overflow discussion about nonblocking reads. It's complicated:

https://stackoverflow.com/questions/34504970/non-blocking-read-on-os-pipe-on-windows

A longer term solution would be to provide an option that integrates a Javascript engine directly:

https://github.com/sony/v8eval

kapilkd13 commented 7 years ago

Thanks @tetron . For now I would go with the option to use new nodejs instance for each expression. Later I would work on the option to integrate Js engine directly. Thanks again! 😃

kapilkd13 commented 7 years ago

I have fixed InlineJs problem with one process per expression for now. I want to discuss on a new issue that I am facing. @jvdzwaan gave me a CommandLineTool

#!/usr/bin/env cwl-runner
cwlVersion: cwl:v1.0
class: CommandLineTool
baseCommand: frog
arguments:
  - prefix: --outputdir=
    separate: false
    valueFrom: $(runtime.outdir)
hints:
  - class: DockerRequirement
    dockerPull: proycon/lamachine
inputs:
  - id: dir_in
    type: Directory
    inputBinding:
      position: 1
      prefix: --testdir=
      separate: false
  - id: skip
    type: string?
    inputBinding:
      position: 2
      prefix: --skip=
      separate: false
outputs:
  - id: frogout
    type:
      type: array
      items: File
    outputBinding:
glob: "*.out"

Notice outputdir is taking value from runtime.outdir. And later this value is used in linux docker during execution. Now Since we are working on windows, path resolution is as per windows and so runtime.outdir contains path like C:\foo\bar but in docker we need something like /c/foo/bar. When passing outdir or other usual file path to docker I convert them appropriately as I know they are path but incase of outputdir its just a variable and I don't know if its a path or not and should I convert it or not.
One possibility is that I convert all paths(like runtime.outdir, input.path etc) to docker compatible, but since these path are used before and after execution a lot in staging, file copying, path mapping etc and for this, these paths must be converted back to windows compatible. I don't know if this is the best solution or not.
Another idea is to make a list like pathlist and put runtime.outdir and all other variables that contain path to it. On variable resolution if a variable is pointing to any of these the value would be converted appropriately. I haven't tried this. I want to know if this is possible. Last solution that comes to my mind is somehow user specify that variable he wants to be resolved is a path. Then on windows we can make changes appropriately. Maybe instead of

  - prefix: --outputdir=
    separate: false
valueFrom: $(runtime.outdir)

something like

  - prefix: --outputdir=
    separate: false
valueFrom: $(runtime.outdir)
type:path

But it doesn't sound a good idea to me also 😺

what do you think @mr-c @tetron @jvdzwaan @anton-khodak what direction should i take.

tetron commented 7 years ago

The values of $(runtime.outdir) and $(runtime.tmpdir) and $(input.somevar.path) must be in the environment of the tool, so when running inside a Docker container they should be /c/foo/bar in your example.

The job of the PathMapper class is to translate the host paths (which would be Windows path, or better file:/// URIs) to the correct paths inside the container.

kapilkd13 commented 7 years ago

@tetron Alright. I would try modifying these values itself instead of the values that are being passed. I thought maybe we are using runtime.outdir after completion to move output file to another folder. I will try and report the output. Thanks!

tetron commented 7 years ago

There's an internal outdir variable in cwltool, but the value of$(runtime.outdir) itself is specifically the value from the perspective of the tool (inside the container).

kapilkd13 commented 7 years ago

Since we have achieved windows compatibility. I think we can close this.

KerstenBreuer commented 5 years ago

I am still getting the same behavior with the latest cwltool and docker for windows on windows 10 pro 1809.

Even a simple sleep test (where nothing is actually written to the filesystem) cause this result:

sleep.cwl:

cwlVersion: v1.0
class: CommandLineTool
hints:
  DockerRequirement:
    dockerPull: ubuntu:latest
  ResourceRequirement:
    coresMin: 1
    ramMin: 5000
    tmpdirMin: 1000
baseCommand: ["sleep"]
inputs:
  sleep_time:
    type: int
    inputBinding:
      position: 1
outputs: []

Workflow:

cwlVersion: v1.0
class: Workflow
inputs:
  sleep_time:
    type: int
steps:
  sleep:
    run: "../tools/sleep.cwl"
    in:
      sleep_time:
        source: sleep_time
    out: []
outputs: []

I am getting following error:

PermissionError: [Errno 13] Permission denied: 'D:\\test\tmpledy674c'

Anyone know how to fix this?

KerstenBreuer commented 5 years ago

The above error does not occur for older releases of cwltool: 1.0.20180809224403

mr-c commented 5 years ago

@KerstenBreuer Thank you for your report.

Can you share the output of cwltool -debug [rest of the options]

KerstenBreuer commented 5 years ago

Sure thing, the complete output is:

C:\Users\kerst\AppData\Local\Programs\Python\Python37-32\Scripts\cwltool 1.0.20181217162649
Resolved '../CWL/workflows/wf_sleep.cwl' to 'file:///D:/cwl_test_lsf/CWL/workflows/wf_sleep.cwl'
[workflow ] initialized from file:///D:/cwl_test_lsf/CWL/workflows/wf_sleep.cwl
[workflow ] start
[workflow ] {
    "sleep_time": 10
}
[workflow ] starting step sleep
[job step sleep] job input {
    "file:///D:/cwl_test_lsf/CWL/workflows/wf_sleep.cwl#sleep/sleep_time": 10
}
[job step sleep] evaluated job input to {
    "file:///D:/cwl_test_lsf/CWL/workflows/wf_sleep.cwl#sleep/sleep_time": 10
}
[step sleep] start
[job sleep] initializing from file:///D:/cwl_test_lsf/CWL/tools/sleep.cwl as part of step sleep
[job sleep] {
    "sleep_time": 10
}
[job sleep] path mappings is {}
[job sleep] command line bindings is [
    {
        "position": [
            -1000000,
            0
        ],
        "datum": "sleep"
    },
    {
        "position": [
            1,
            "sleep_time"
        ],
        "datum": 10
    }
]
[job sleep] initial work dir {}
[job sleep] Skipping Docker software container '--memory' limit despite presence of ResourceRequirement with ramMin and/or ramMax setting. Consider running with --strict-memory-limit for increased portability assurance.
[job sleep] D:\cwl_test_lsf\test_outov4h5ajz$ docker \
    run \
    -i \
    --volume=/D/cwl_test_lsf/test_outov4h5ajz:/vvYZzU:rw \
    --volume=/D/cwl_test_lsf/test_outakop6qwd:/tmp:rw \
    --workdir=/vvYZzU \
    --read-only=true \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/vvYZzU \
    --cidfile=D:\cwl_test_lsf\test_out\tmpip6bvnmm\20190218184117-175991.cid \
    ubuntu:latest \
    sleep \
    10
Exception while running job
Traceback (most recent call last):
  File "c:\users\kerst\appdata\local\programs\python\python37-32\lib\site-packages\cwltool\job.py", line 308, in _execute
    monitor_function=monitor_function
  File "c:\users\kerst\appdata\local\programs\python\python37-32\lib\site-packages\cwltool\job.py", line 769, in _job_popen
    monitor_function(sproc)
  File "c:\users\kerst\appdata\local\programs\python\python37-32\lib\site-packages\cwltool\job.py", line 692, in docker_monitor
    with open(stats_file.name, mode="w") as stats_file_handle:
PermissionError: [Errno 13] Permission denied: 'D:\\cwl_test_lsf\\test_out\\tmp9zb55c9f'
[job sleep] completed permanentFail
[job sleep] {}
[step sleep] produced output {}
[step sleep] completed permanentFail
[workflow ] completed permanentFail
[workflow ] {}
[job sleep] Removing input staging directory D:\cwl_test_lsf\test_outxafmhgjn
[job sleep] Removing temporary directory D:\cwl_test_lsf\test_outakop6qwd
{}
Final process status is permanentFail

Thanks for taking care of this.

mr-c commented 5 years ago
    with open(stats_file.name, mode="w") as stats_file_handle:
PermissionError: [Errno 13] Permission denied: 'D:\\cwl_test_lsf\\test_out\\tmp9zb55c9f'

Does this file exist and what permissions does it have?

KerstenBreuer commented 5 years ago

No the file was not generated.

ZimmerA commented 5 years ago

Same Problem here, any solutions so far?

KerstenBreuer commented 5 years ago

I narrowed it down. The bug appears first in release 1.0.20190228155703. The last working release is 1.0.20181201184214.

mr-c commented 3 years ago

We have dropped MS Windows compatibility, see https://github.com/common-workflow-language/cwltool#ms-windows-users for how to run using WSL 2.