DockerOutputDirectory not supported in ADES/Calrissian

jzvolensky commented 3 months ago

Hello,

We are working on a machine learning model wrapped in an Application Package. I ran into a small compatibility/support issue with DockerRequirement.

Currently only dockerPull is supported by ADES. During our local testing phase we have been using cwltool as the main runner, which supports dockerOutputDirectory. Below is a sample CommandLineTool we use to update the model configuration file before the model is run:

- class: CommandLineTool
  id: update-config
  label: cmdtool to update the HydroMT config
  doc: Update the HydroMT config file
  baseCommand: update
  inputs:
    - id: res
      type: string
      inputBinding:
        position: 1
    - id: precip_fn
      type: string
      inputBinding:
        position: 2
  outputs:
    - id: setupconfignew
      type: File
      outputBinding:
        glob: "wflow.ini" # The script writes the file to the current working directory, wherever it runs
  requirements:
    DockerRequirement:
      dockerPull: potato55/hydromt-test:float
      dockerOutputDirectory: /hydromt # Need to find a workaround for the functionality of this
    ResourceRequirement:
        coresMax: 2
        ramMax: 2048
    NetworkAccess:
      class: NetworkAccess
      networkAccess: true

All of our required files are placed in the /hydromt directory within the docker container. Running the command line tool above works perfectly with cwltool and the output file is generated.

However, when we switched to ADES, dockerOutputDirectory had to be removed as it is not supported. This produces an error: PermissionError: [Errno 1] Operation not permitted: '/hydromt', even though all the permissions are set up in the Docker container beforehand.

What we need to do is generate the config file so that we can use it in the next step to generate the model (ideally all operations happen in /hydromt).

For reference, here is the Dockerfile we are using to build our container: https://gist.github.com/jzvolensky/33bb0230546f3926042a5d69c9747a0d

Are there any workarounds that avoid dockerOutputDirectory but keep the same permission levels, or any general suggestions?

Thanks!

fabricebrito commented 2 months ago

@jzvolensky Since Calrissian's runtime environment is Kubernetes, some CWL features are not supported. This is why runners advertise their conformance with the standard (and its versions).

Further details can be found in the code: https://github.com/Duke-GCB/calrissian/blob/master/calrissian/job.py#L482 or in the conformance pages, e.g. https://calrissian-cwl.github.io/conformance/1.0.2/

If you need files for the next step, produce a File as a step output that is then used as input to the next step.
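
To illustrate the consuming side of that suggestion, here is a minimal sketch of what a build-hydromt tool could look like when it receives the config as a File input. The step and input names (build-hydromt, region, setupconfig, catalog) and the image come from this thread; the baseCommand `build`, the string types for region and catalog, and the `model` output are assumptions made only for illustration.

```yaml
# Sketch only: a consuming tool for the build-hydromt step.
- class: CommandLineTool
  id: build-hydromt
  baseCommand: build            # hypothetical entry point, not taken from the thread
  inputs:
    - id: region
      type: string              # assumed type
      inputBinding:
        position: 1
    - id: setupconfig
      type: File                # receives update-config/setupconfig; the runner stages the file
      inputBinding:
        position: 2             # the staged path is passed on the command line
    - id: catalog
      type: string              # assumed type
      inputBinding:
        position: 3
  outputs:
    - id: model
      type: Directory           # hypothetical output
      outputBinding:
        glob: model             # hypothetical directory name, relative to the output directory
  requirements:
    DockerRequirement:
      dockerPull: potato55/hydromt-test:float
```

With this shape the tool reads the config from whatever path the runner passes in, so it does not need the file to sit at a fixed location such as /hydromt.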

jzvolensky commented 2 months ago

Thanks for the reply. Yes, I did see the conformance for various CWL versions before.

For your suggestion, I think that is what we are doing already, no? We want to produce a .ini config file which is used in the next step. However, we are not able to write it to access it later. Here is a snippet of the steps:

steps:
      - id: update-config
        in:
          - id: res
            source: res
          - id: precip_fn
            source: precip_fn
        out: [setupconfig] # We produce the file here, which needs to be written but isn't because of permissions
        run: '#update-config'
      - id: build-hydromt
        in:
          - id: region
            source: region
          - id: setupconfig
            source: update-config/setupconfig # We point our next step to use that generated file from the previous step
          - id: catalog
            source: catalog

I will try to modify the whole logic of our CWL and check the scripts we use to generate things to see if we can get around the permission stuff.
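
One possible shape for the reworked update-config tool, as a sketch rather than a confirmed fix. It assumes the `update` script writes wflow.ini to its current working directory (as noted in the original tool's comment) and does not need write access to /hydromt at run time; whether this removes the PermissionError depends on what the script actually does. The output id is renamed to setupconfig here only so that it matches the name the workflow step lists in `out`.

```yaml
# Sketch only: update-config without dockerOutputDirectory.
- class: CommandLineTool
  id: update-config
  baseCommand: update
  inputs:
    - id: res
      type: string
      inputBinding:
        position: 1
    - id: precip_fn
      type: string
      inputBinding:
        position: 2
  outputs:
    - id: setupconfig           # assumed rename so the id matches the step's `out`
      type: File
      outputBinding:
        glob: wflow.ini         # resolved relative to the runner-assigned output directory
  requirements:
    DockerRequirement:
      dockerPull: potato55/hydromt-test:float
      # No dockerOutputDirectory: the runner picks the working directory,
      # which is also the designated output directory ($(runtime.outdir)).
    ResourceRequirement:
      coresMax: 2
      ramMax: 2048
    NetworkAccess:
      networkAccess: true
```

The design point is that the tool only writes relative to the working directory the runner provides, and the workflow carries the resulting File to the next step.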

fabricebrito commented 2 months ago

Yes, that's the way forward

EOEPCA / proc-ades, issue #41