PMCC-BioinformaticsCore / janis-core

Core python modules for Janis Pipeline workflow assistant
GNU General Public License v3.0

ExpressionTool implementation for janis #57

Closed beccyl closed 3 years ago

beccyl commented 3 years ago

I understand that an ExpressionTool implementation for Janis is still a TODO feature.

I propose a simple use case that can be implemented in CWL, but I am unsure of the best practice for building an equivalent in Janis.

Overview of workflow: tool1 --> output_files (Array(File)) --> tool2

Problem: tool1 may generate some empty output_files (which is valid), but tool2 does not accept empty files and throws an error. [tool2 = ntcard. There is a Docker container in biocontainers, but it is minimal**]

Here is an ExpressionTool that filters out empty files before they are passed to tool2 (CWL provided below):
tool1 --> output_files (Array(File)) --> filter-expression-tool --> filtered_output_files (Array(File)) --> tool2

#!/usr/bin/env cwl-runner
class: ExpressionTool
cwlVersion: v1.0

id: filter-empty-files
inputs:
  - id: infiles
    type: 'File[]'
outputs:
  - id: outfiles
    type: 'File[]'
label: filter-empty-files
requirements:
  - class: InlineJavascriptRequirement
expression: |
  ${
      var files = [];
      for (var i = 0; i < inputs.infiles.length; i++) {
        var file = inputs.infiles[i];
        if (file.size > 0) {
          files.push(file);
        }
      }
      return {"outfiles": files};
  }

Further notes on other solutions I considered: ** Another option would be to use "find" to exclude empty files before passing them to ntcard, but the find in the ntcard biocontainers Docker image is a busybox implementation, which does not support this feature. I could build my own Docker container and have my tool run find before ntcard, OR just build another tool that calls a Docker container with find.

Is there an easy way to implement this in Janis? Currently the workflow is written in CWL.

illusional commented 3 years ago

Hey @beccyl, we have a PythonTool, which is an equivalent to the expression tool, and would allow you to do this. It gets run in a Python container automatically. Any imports (apart from the

This is what your example might look like (untested code ahead):

from typing import List
from janis_core import File, PythonTool, Array, TOutput

class FilterEmptyFiles(PythonTool):
    @staticmethod
    def code_block(files: List[File]):
        import os
        # Keep only the files whose size is greater than zero bytes.
        return {
            "outfiles": [f for f in files if os.stat(f).st_size > 0]
        }

    def outputs(self):
        # Declare the single output: an array of files under the key "outfiles".
        return [TOutput("outfiles", Array(File))]

if __name__ == "__main__":
    # Translate the tool to WDL.
    FilterEmptyFiles().translate("wdl")
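
Since the snippet above is untested, the filtering logic can be checked on its own, outside Janis. This is a minimal sketch, assuming file inputs reach code_block as path strings (an assumption, not something confirmed in this thread). The names filter_empty_files and demo are hypothetical and only used for illustration.

```python
import os
import tempfile
from typing import Dict, List


def filter_empty_files(files: List[str]) -> Dict[str, List[str]]:
    """Keep only paths whose size on disk is greater than zero bytes.

    Mirrors the body of code_block, assuming inputs are path strings.
    """
    return {"outfiles": [f for f in files if os.stat(f).st_size > 0]}


def demo() -> Dict[str, List[str]]:
    # Create one empty and one non-empty file in a scratch directory.
    with tempfile.TemporaryDirectory() as tmp:
        empty = os.path.join(tmp, "empty.txt")
        full = os.path.join(tmp, "full.txt")
        open(empty, "w").close()
        with open(full, "w") as handle:
            handle.write("ACGT\n")
        result = filter_empty_files([empty, full])
        # Return basenames only, since the scratch directory is removed on exit.
        return {"outfiles": [os.path.basename(p) for p in result["outfiles"]]}


if __name__ == "__main__":
    print(demo())
```

Only the non-empty file should survive the filter, which matches what the CWL expression does with file.size > 0.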

illusional commented 3 years ago

Closing this as I believe my comment addresses your issue. Feel free to reopen though!