common-workflow-language / cwljava

Java SDK for the Common Workflow Language standards

CWL tool discovery #13

ThilinaManamgoda closed this issue 4 years ago.

ThilinaManamgoda commented 8 years ago

How can I discover CWL tools in a Java workflow engine like Apache Taverna? Is the YAML parser used to provide the CWL file reading capability? If so, can I use it to get CWL tool configuration data (metadata)?

pgrosu commented 8 years ago

@ThilinaManamgoda I might be mistaken, but I think @stain is implementing something along these lines for Apache Taverna. I wrote the YAML parser specifically to generate the dictionary of objects from the draft documents, for subsequently creating the SDK. Based on @tetron's recommendation, it will be replaced by SnakeYAML, so yes, you would need to integrate it. It would be good to fully understand the draft documents, and the yml files they are based on, as it took me a long time to figure out how to make the implementation mirror them. So practice and play around with them and with this code, which will be updated in the next few days, to understand the connections among all the pieces.

stain commented 8 years ago

You are right, @pgrosu, but I didn't get very far. @ThilinaManamgoda is a potential GSoC 2016 student who wants to help out with Taverna's CWL support, in particular with browsing CWL tool descriptions, so he would mainly need to inspect the CWL tool descriptions and annotations. Would using SnakeYAML directly be better, or would cwljava be a good start?

pgrosu commented 8 years ago

Hi Stian,

It would not be fair for me to slow Thilina down while we optimize cwljava, so it would be good if he used SnakeYAML for now. The plus side is that some of his code will probably trickle back to help cwljava when we convert fully to SnakeYAML :)

~p

ThilinaManamgoda commented 8 years ago

So it is a good idea to use SnakeYAML, right?

pgrosu commented 8 years ago

I would say yes, since you are just looking at the tool description and annotations, not actually building up a complete CWL object model. I wanted the RDF formatting for something like this to be done before you use it. I am playing with ideas for the representation in the object model, but I'm waiting on confirmation from @tetron on issue #15 that it's okay for this functionality to be part of the client and later be pushed into the SDK.
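A minimal sketch of that lighter approach, assuming the tool file has already been loaded with SnakeYAML (for example new Yaml().load(reader)), which yields nested java.util.Map and java.util.List objects. The class name ToolDescription is hypothetical, and treating every key with a namespace prefix (such as a made-up dct:creator) as an annotation is an assumption about how a given tool is annotated:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: pulls human-readable metadata out of a CWL tool document
 * that a YAML parser (e.g. SnakeYAML) has already turned into nested Maps and Lists.
 */
class ToolDescription {

    /** Returns label, description and any prefixed annotation fields such as "dct:creator". */
    static Map<String, Object> describe(Map<?, ?> doc) {
        Map<String, Object> info = new LinkedHashMap<>();
        // draft-3 names the free-text field "description" (CWL v1.0 renamed it "doc")
        info.put("label", doc.get("label"));
        info.put("description", doc.get("description"));
        for (Map.Entry<?, ?> entry : doc.entrySet()) {
            String key = String.valueOf(entry.getKey());
            // assumption: annotations are keys with a namespace prefix; "$namespaces" etc. are skipped
            if (key.contains(":") && !key.startsWith("$")) {
                info.put(key, entry.getValue());
            }
        }
        return info;
    }
}
```

Working on plain maps keeps the reader independent of any object model, which matches the "just inspect descriptions and annotations" use case.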

I think that CWL can be turned into a more natural graph implementation, but that would need to be done at a later time, since users might get confused if it becomes too abstract. As an example, take the linux-sort.cwl example: if the position is a sort order, then it could be provided at the beginning of inputs: as an ordered list instead of being embedded, e.g. $input-order: (#key, #input). This way users wouldn't need to keep track of the ordering of the arguments. If multiple arguments can go at the end, they could be provided with an OR (|) condition, such as $input-order: (#key, #firstInput | #secondInput | #thirdInput). There are other things, but that would need to be another post.
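For comparison with that proposal, here is a rough sketch of how the ordering is derived today from the position embedded in each inputBinding. It assumes inputs without an inputBinding are left off the command line, that a missing position counts as 0, and that ties are broken by input id; this is a simplification of the sorting rules in the CommandLineTool spec, and the class name ArgumentOrder is hypothetical:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class ArgumentOrder {

    /** Position of an input, defaulting to 0 when no inputBinding.position is given. */
    static int position(Map<?, ?> input) {
        Object binding = input.get("inputBinding");
        if (binding instanceof Map) {
            Object pos = ((Map<?, ?>) binding).get("position");
            if (pos instanceof Number) {
                return ((Number) pos).intValue();
            }
        }
        return 0;
    }

    /** Returns the ids of bound inputs, ordered by position and then by id. */
    static List<String> orderedInputIds(List<Map<?, ?>> inputs) {
        List<Map<?, ?>> bound = new ArrayList<>();
        for (Map<?, ?> input : inputs) {
            if (input.get("inputBinding") != null) { // simplification: unbound inputs are skipped
                bound.add(input);
            }
        }
        bound.sort(Comparator.<Map<?, ?>>comparingInt(ArgumentOrder::position)
                .thenComparing(p -> String.valueOf(p.get("id"))));
        List<String> ids = new ArrayList<>();
        for (Map<?, ?> input : bound) {
            ids.add(String.valueOf(input.get("id")));
        }
        return ids;
    }
}
```

This is exactly the bookkeeping that an explicit ordered list like $input-order would move to one place in the document.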

Hope it helps, ~p

ThilinaManamgoda commented 8 years ago

Sorry for the late response. Thank you very much for explaining. I will come back with more questions a little later.

pgrosu commented 8 years ago

Sure thing, anytime. My advice is to take a small amount of time to fully understand the CWL draft specifications and let their core concepts dictate your implementation, as the implementation will always be smaller that way. Basically, during your planning and implementation process, don't "fight" against the ideas with more code, as there is a natural dynamic structure behind how they are designed, both in name and definition :)

ThilinaManamgoda commented 8 years ago

Hi Paul, so the first task is to build the CWL parser. I know that the first thing I should do is understand the CWL draft specifications. Any suggestions on how I can approach this?

stain commented 8 years ago

I would first try to build a "hello world" CWL workflow following http://www.commonwl.org/draft-3/UserGuide.html, and then try it out with cwltool.

Remember that you will focus on the tool descriptions and their inputs/outputs, so then start reading the http://www.commonwl.org/draft-3/CommandLineTool.html spec.

pgrosu commented 8 years ago

Hi Thilina,

Let's break this into steps:

1) CWL has two main ways of doing work:

    a) Via a CommandLineTool: where you can run a command (e.g. a Unix command) based on inputs/outputs.
    b) Via a Workflow: as a series of steps (which could be CommandLineTools) that use inputs/outputs to produce the intended outcome.

Both are detailed in the CWL specifications: http://www.commonwl.org/draft-3/

And as @stain recommended, the User's Guide provides a nice way to get started, which @tetron kindly put together.
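Since both kinds of documents carry a class field (as in the 1st.cwl example later in this thread), a reader can branch on it before looking at anything else. A minimal sketch over a document already loaded into Maps and Lists; the class name ProcessKind is hypothetical, and other classes such as ExpressionTool are simply reported as unsupported here:

```java
import java.util.List;
import java.util.Map;

class ProcessKind {

    /** Describes what a loaded CWL document is, based on its "class" field. */
    static String describe(Map<?, ?> doc) {
        Object cls = doc.get("class");
        if ("CommandLineTool".equals(cls)) {
            // a single command, e.g. baseCommand: echo
            return "tool running " + doc.get("baseCommand");
        } else if ("Workflow".equals(cls)) {
            // a series of steps, each of which may itself run a CommandLineTool
            Object steps = doc.get("steps");
            int count = (steps instanceof List) ? ((List<?>) steps).size() : 0;
            return "workflow with " + count + " step(s)";
        }
        return "unsupported class: " + cls;
    }
}
```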

2) When extracting information about a tool, it can help to understand all the working parts of CWL. That is not always necessary, but some CWL documents are quite compact, with references spread among multiple files. So if you want to learn the working parts of the CWL specifications, after reviewing the above, it is best to use the following four documents to see what is happening:

CommandLineTool.yml
Workflow.yml
metaschema.yml
Process.yml

With each file, pretend each object that is denoted as "type: record" is a Java class, and with several pieces of paper and a pencil, write and draw out the classes, including their connections (e.g. uses, inheritance, specializations), starting from CommandLineTool and Workflow. You'll basically end up with the following UML diagram:

[Image: cwljava-draftv3 UML diagram]
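The same paper exercise can be partly automated. A rough sketch, assuming each .yml schema file has been loaded into Maps and Lists and that its definitions sit in one list (assumed here to be the $graph section) of entries with name, type, abstract and extends keys; the exact layout and the leading "#" on names are assumptions to check against the draft-3 files, and RecordScanner is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RecordScanner {

    /**
     * Lists the "type: record" entries of a schema graph as class-like headers for a diagram,
     * e.g. "abstract class Process" or "class Workflow extends Process".
     */
    static List<String> classHeaders(List<Map<?, ?>> graph) {
        List<String> headers = new ArrayList<>();
        for (Map<?, ?> entry : graph) {
            if (!"record".equals(entry.get("type"))) {
                continue; // enums, documentation entries, etc. are skipped
            }
            StringBuilder header = new StringBuilder();
            if (Boolean.TRUE.equals(entry.get("abstract"))) {
                header.append("abstract ");
            }
            header.append("class ").append(strip(entry.get("name")));
            Object parent = entry.get("extends");
            if (parent != null) {
                // "extends" may be a single name or a list of names
                List<String> parents = new ArrayList<>();
                if (parent instanceof List) {
                    for (Object p : (List<?>) parent) {
                        parents.add(strip(p));
                    }
                } else {
                    parents.add(strip(parent));
                }
                header.append(" extends ").append(String.join(", ", parents));
            }
            headers.add(header.toString());
        }
        return headers;
    }

    /** Drops a leading "#" from names such as "#Process". */
    static String strip(Object name) {
        String s = String.valueOf(name);
        return s.startsWith("#") ? s.substring(1) : s;
    }
}
```

The output lines are only meant as labels for the boxes and arrows of the diagram, not as compilable Java.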

3) Now, you might have noticed that CWLJava got updated recently. So you could just parse the CWL files, populate the fields of the appropriate class (e.g. CommandLineTool), and then export it out as YAML via SnakeYAML, or you can just do everything by hand via SnakeYAML; it's up to you which path you prefer :) You might have to update the CWLReader tool under org.commonwl.util, which also uses SnakeYAML, to suit your needs. I wrote a full demo under examples. If you would like to update it, I would prefer that you first copy that code and write it separately in the same path under a different file name (e.g. CWLReader_Thilina.java). We can then integrate the "best-of-all" into one file in the end, but I would be curious what you come up with first, as everyone has different coding approaches :)

So cwljava will provide you with some helpful structure, but if you feel more comfortable parsing with SnakeYAML directly, you're welcome to do that as well :) The most important point is to be organized about it, as it can become very complex very quickly without careful structure and attention to detail. I'm only mentioning this because it took me several months to properly understand the CWL specifications, which led me to write the SDK in this way, and hopefully it will help new folks connect all the dots more quickly.

Nota bene: Keep in mind that abstract records are sometimes used as the types of fields, and you will need to traverse to the first concrete instance of that class, which is the first implementation and the true type for that field.
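One way to do that traversal, sketched under the assumptions of single inheritance and a record-name-to-parent map built from the schema files in declaration order. ConcreteTypes is a hypothetical name, and the record names in the note after the code are illustrative rather than copied from the draft-3 files:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Set;

class ConcreteTypes {

    /**
     * Returns the non-abstract records that directly or indirectly extend base,
     * in declaration order (hence the LinkedHashMap: record name -> parent name, or null).
     */
    static List<String> concreteSubtypes(String base, LinkedHashMap<String, String> parentOf,
                                         Set<String> abstractNames) {
        List<String> result = new ArrayList<>();
        for (String name : parentOf.keySet()) {
            if (abstractNames.contains(name)) {
                continue; // an abstract record cannot be the true type of a field
            }
            // walk up the parent chain; the step limit guards against accidental cycles
            String current = parentOf.get(name);
            int steps = 0;
            while (current != null && steps++ < parentOf.size()) {
                if (current.equals(base)) {
                    result.add(name);
                    break;
                }
                current = parentOf.get(current);
            }
        }
        return result;
    }
}
```

For a map in which InputParameter and OutputParameter both extend an abstract Parameter, concreteSubtypes("Parameter", parentOf, Set.of("Parameter")) returns them in declaration order, so the first element is the first concrete implementation.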

Hope it helps, Paul

ThilinaManamgoda commented 8 years ago

@stain OK, I will go through that document :) @pgrosu Thank you very much :) I will go through these documents and come back with more questions :)

mr-c commented 8 years ago

Hello @ThilinaManamgoda, this is all decent advice, though if you feel a bit overwhelmed, that is normal.

What I suggest is that you only work with pre-processed CWL command line tool descriptions, such as those made by running cwltool --print-pre ${FILENAME.cwl}

Then load the resulting file with SnakeYAML, using either plain classes or the cwljava classes.

As an example, here is a YAML-formatted CommandLineTool:

(env) mcrusoe@mrcdev:~/cwltool$ cat ~/common-workflow-language/draft-3/examples/1st.cwl 
cwlVersion: cwl:draft-3
class: CommandLineTool
baseCommand: echo
inputs:
  - id: message
    type: string
    inputBinding:
      position: 1
outputs: []

And its corresponding preprocessed version (JSON formatted):

(env) mcrusoe@mrcdev:~/cwltool$ python cwltool.py --print-pre ~/common-workflow-language/draft-3/examples/1st.cwl 
cwltool.py 1.0.20160428141057
{
    "cwlVersion": "https://w3id.org/cwl/cwl#draft-3", 
    "inputs": [
        {
            "inputBinding": {
                "position": 1
            }, 
            "type": "string", 
            "id": "file:///home/mcrusoe/common-workflow-language/draft-3/examples/1st.cwl#message"
        }
    ], 
    "name": "file:///home/mcrusoe/common-workflow-language/draft-3/examples/1st.cwl", 
    "outputs": [], 
    "baseCommand": "echo", 
    "id": "file:///home/mcrusoe/common-workflow-language/draft-3/examples/1st.cwl", 
    "class": "CommandLineTool"
}
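Loading that output into a plain class could look roughly like the sketch below. It assumes the JSON has already been parsed into Maps and Lists (JSON of this shape is generally readable by SnakeYAML as YAML), and it shortens the fully qualified ids by keeping the part after the last "#". PlainTool is a hypothetical class, not the cwljava CommandLineTool:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical plain class for a preprocessed CommandLineTool (not a cwljava class). */
class PlainTool {
    String id;
    List<String> baseCommand = new ArrayList<>();
    List<String> inputNames = new ArrayList<>();
    List<String> outputNames = new ArrayList<>();

    /** Returns the part after the last '#', e.g. ".../1st.cwl#message" becomes "message". */
    static String shortName(Object id) {
        String s = String.valueOf(id);
        int hash = s.lastIndexOf('#');
        return hash >= 0 ? s.substring(hash + 1) : s;
    }

    /** Copies fields from a preprocessed document that was loaded into Maps and Lists. */
    static PlainTool fromMap(Map<?, ?> doc) {
        if (!"CommandLineTool".equals(doc.get("class"))) {
            throw new IllegalArgumentException("not a CommandLineTool: " + doc.get("class"));
        }
        PlainTool tool = new PlainTool();
        tool.id = String.valueOf(doc.get("id"));
        Object base = doc.get("baseCommand");
        if (base instanceof List) { // baseCommand may be a string or a list of strings
            for (Object part : (List<?>) base) {
                tool.baseCommand.add(String.valueOf(part));
            }
        } else if (base != null) {
            tool.baseCommand.add(String.valueOf(base));
        }
        tool.inputNames = names(doc.get("inputs"));
        tool.outputNames = names(doc.get("outputs"));
        return tool;
    }

    /** Collects the short names of the "id" fields in a list of parameter maps. */
    private static List<String> names(Object params) {
        List<String> result = new ArrayList<>();
        if (params instanceof List) {
            for (Object p : (List<?>) params) {
                if (p instanceof Map) {
                    Object id = ((Map<?, ?>) p).get("id");
                    if (id != null) {
                        result.add(shortName(id));
                    }
                }
            }
        }
        return result;
    }
}
```

For the preprocessed 1st.cwl above, this would give a baseCommand of [echo], a single input named message, and no outputs.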

pgrosu commented 8 years ago

@ThilinaManamgoda, I also agree with Michael's (@mr-c) recommended approach.

ThilinaManamgoda commented 8 years ago

@pgrosu Now I have a clear idea of what I have to do: figure out whether the inputs and outputs are single arguments, arrays, or arrays of arrays, because that's one of the main requirements of the Taverna Workbench when considering the GUI representation of a CWL tool. Any comments on this?
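A small sketch of that classification, assuming draft-3 style types: a plain name such as "string" or "File", a map with type: array and items, a map for a record or enum, or a list for a union (where "null" marks the input as optional). Later CWL versions also allow shorthands such as File[] and File?, which this sketch does not handle; InputShape is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class InputShape {

    /** Number of nested "type: array" levels; 0 means a single value. */
    static int arrayDepth(Object type) {
        if (type instanceof Map) {
            Map<?, ?> schema = (Map<?, ?>) type;
            if ("array".equals(schema.get("type"))) {
                return 1 + arrayDepth(schema.get("items"));
            }
        } else if (type instanceof List) {
            // a union such as ["null", {type: array, ...}]: take the deepest member
            int deepest = 0;
            for (Object member : (List<?>) type) {
                deepest = Math.max(deepest, arrayDepth(member));
            }
            return deepest;
        }
        return 0;
    }

    /** True when the type is a union that includes "null", i.e. the input is optional. */
    static boolean isOptional(Object type) {
        return type instanceof List && ((List<?>) type).contains("null");
    }

    /** Human-readable shape that a GUI could use to pick a widget, e.g. "array of array of File". */
    static String describe(Object type) {
        if (type instanceof Map) {
            Map<?, ?> schema = (Map<?, ?>) type;
            Object kind = schema.get("type");
            if ("array".equals(kind)) {
                return "array of " + describe(schema.get("items"));
            }
            return String.valueOf(kind); // "record", "enum", ...
        }
        if (type instanceof List) {
            List<String> parts = new ArrayList<>();
            for (Object member : (List<?>) type) {
                if (!"null".equals(member)) {
                    parts.add(describe(member));
                }
            }
            return String.join(" | ", parts);
        }
        return String.valueOf(type);
    }
}
```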

pgrosu commented 8 years ago

@ThilinaManamgoda Are you wondering about how to parse a file like array-outputs.cwl, how to create Java classes to populate such information or how to create a GUI to represent dynamic information such as this?

ThilinaManamgoda commented 8 years ago

@pgrosu What I want to know is how dynamic these representations are. For example, how many ways are there to define an input, and what about the record type? Here is my blog, which shows what I am doing now: http://maanadevgsoc2016.blogspot.com/

pgrosu commented 8 years ago

@ThilinaManamgoda So, many moons ago, when I first started writing a GUI for CWL (before focusing on building the SDK), I noticed that the interfaces for a CommandLineTool and a Workflow have some differences. In any case, if you only care about inputs and outputs, that might not be an issue. Just in case, below are the definitions, which have their own individual nuances:

CommandLineTool
Workflow

Keep in mind that all the steps of a Workflow can have their own inputs and outputs, defined as:

Workflow Step

All the above are already in the CWL Java SDK, so you would just need to parse the CWL YAML file and populate them if you want to create objects out of them. Regarding representing the different variations in a GUI, a tree is one structure that would probably encompass the most complex version of an array<array<...>>, but it would need to be aesthetically pleasing too, so as not to become cumbersome for the user to interpret. Again, here you have total freedom, but with Java GUIs you can actually create components on the fly. These can show as popup windows, or grow dynamically with the input on the panel you are representing.
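A minimal sketch of the tree idea using Swing's DefaultMutableTreeNode, assuming draft-3 style types: arrays as maps with type: array and items, records as maps with a fields list whose entries have name and type; anything else, including unions, becomes a leaf showing the raw type. TypeTree is a hypothetical name:

```java
import java.util.List;
import java.util.Map;
import javax.swing.tree.DefaultMutableTreeNode;

class TypeTree {

    /** Builds a tree node for an input; nested arrays and record fields become child nodes. */
    static DefaultMutableTreeNode build(String name, Object type) {
        if (type instanceof Map) {
            Map<?, ?> schema = (Map<?, ?>) type;
            Object kind = schema.get("type");
            if ("array".equals(kind)) {
                // one level per array, so array<array<File>> nests two levels deep
                DefaultMutableTreeNode node = new DefaultMutableTreeNode(name + " : array");
                node.add(build("items", schema.get("items")));
                return node;
            }
            if ("record".equals(kind)) {
                DefaultMutableTreeNode node = new DefaultMutableTreeNode(name + " : record");
                Object fields = schema.get("fields");
                if (fields instanceof List) {
                    for (Object f : (List<?>) fields) {
                        if (f instanceof Map) {
                            Map<?, ?> field = (Map<?, ?>) f;
                            node.add(build(String.valueOf(field.get("name")), field.get("type")));
                        }
                    }
                }
                return node;
            }
        }
        // plain types, enums and unions are shown as leaves with their raw type
        return new DefaultMutableTreeNode(name + " : " + type);
    }
}
```

The returned root can be passed to new JTree(root) and placed in a panel or popup; rebuilding the node when the input changes is one way to let the view grow dynamically.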

Hope it helps, Paul

ThilinaManamgoda commented 8 years ago

@pgrosu Thank you very much for explaining.

pgrosu commented 8 years ago

@ThilinaManamgoda Gladly :) I think it is a balance between some up-front planning for the complexity and some trial and error regarding the visual representation.