common-workflow-language / cwltool

Common Workflow Language reference implementation
https://cwltool.readthedocs.io/
Apache License 2.0

Integration with GraphML, outreach from Boost, collaborate with Boost and NetworkX #965

Open anadon opened 5 years ago

anadon commented 5 years ago

Hello,

I've been working with Boost Graph, reached out to NetworkX, and am starting a business close to CWL. There is a lot of redundancy among these projects and GraphML (http://graphml.graphdrawing.org/index.html). Graphviz has some inter-compatibility with GraphML, NetworkX has GraphML support, and Boost Graph is kind of close to GraphML support.

NetworkX has a developer interested, I'm doing boost work (though I'm still kind of new there), and I haven't reached out to graphviz yet, but I'm pretty busy with what I have already.

GraphML seems to be the functional core of everything. Can I convince you guys to add some kind of support here? That could mean using NetworkX or the Boost Graph Library as your backend, or providing a GraphML to/from conversion tool. I'm open to other solutions if you have any.

anadon commented 5 years ago

antlr4 is a good tool for building a to/from converter, and it could very well be relevant to much of what you do.

https://www.antlr.org/about.html

ghost commented 5 years ago

Hi @anadon , are you looking for a way to render a CWL workflow in GraphML format for visualization?

anadon commented 5 years ago

I was thinking something more general purpose. cwl specifies vertex properties and then some details on how to connect them, so if cwl could be converted to GraphML with extra vertex properties and converted from GraphML if it has those properties, it would serve to merge two currently separate development communities and ecosystems. Such a union would doubtless make code more powerful, interoperable, of higher quality, easier to use, and more performant.
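As a rough illustration of the conversion described above, here is a minimal sketch that turns a workflow's steps into GraphML nodes carrying an extra vertex property. The `workflow` dictionary is a hypothetical, heavily simplified stand-in for a CWL document (only `run` and `in` are modeled), and the function name and the `run` key id are assumptions made for this example, not part of cwltool. It uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

# Hypothetical, simplified workflow: only `run` and `in` are modeled.
# A source of the form "step/output" refers to another step's output;
# a source without "/" refers to a workflow-level input.
workflow = {
    "steps": {
        "align": {"run": "align.cwl", "in": {"reads": "input_reads"}},
        "sort": {"run": "sort.cwl", "in": {"bam": "align/aligned"}},
        "index": {"run": "index.cwl", "in": {"bam": "sort/sorted"}},
    }
}


def workflow_to_graphml(wf):
    """Return GraphML text with one node per step and one edge per step-to-step source."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare a string-valued node attribute that stores each step's `run` value.
    ET.SubElement(root, "key", {
        "id": "run", "for": "node", "attr.name": "run", "attr.type": "string",
    })
    graph = ET.SubElement(root, "graph", {"id": "workflow", "edgedefault": "directed"})
    steps = wf["steps"]
    for name, step in steps.items():
        node = ET.SubElement(graph, "node", {"id": name})
        data = ET.SubElement(node, "data", {"key": "run"})
        data.text = step["run"]
    for name, step in steps.items():
        for source in step["in"].values():
            upstream, sep, _output = source.partition("/")
            # Only step-to-step links become edges; workflow inputs are skipped.
            if sep and upstream in steps:
                ET.SubElement(graph, "edge", {"source": upstream, "target": name})
    return ET.tostring(root, encoding="unicode")


graphml_text = workflow_to_graphml(workflow)
print(graphml_text)
```

The resulting text should be loadable by GraphML-aware tools, for example NetworkX's `read_graphml`. A real converter would also need to carry inputs, outputs, requirements, and scatter settings as properties to make the round trip back to CWL possible.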

For instance, identifying cliques in cwl for running on more local machines probably isn't supported by anybody right now. If everything gets merged then a NetworkX based clustering program could find them and improve execution speed.
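To make the grouping idea concrete, here is a small sketch under stated assumptions. It does not find cliques; it swaps in a simpler technique, weakly connected components, to split a workflow into groups of steps that share data and could plausibly be kept on the same machine. The step names, the edge list, and the function name are hypothetical. NetworkX provides `find_cliques` and `weakly_connected_components` for this kind of analysis on real graphs; the version below sticks to the standard library.

```python
from collections import defaultdict

# Hypothetical step names and (upstream, downstream) dependencies.
steps = ["align", "sort", "index", "fetch_ref", "build_ref", "report"]
edges = [
    ("align", "sort"),
    ("sort", "index"),
    ("fetch_ref", "build_ref"),
]


def dependency_groups(steps, edges):
    """Group steps into weakly connected components of the dependency graph."""
    neighbors = defaultdict(set)
    for upstream, downstream in edges:
        # Ignore edge direction: any data link keeps two steps in one group.
        neighbors[upstream].add(downstream)
        neighbors[downstream].add(upstream)
    seen = set()
    groups = []
    for start in steps:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        group = []
        # Depth-first walk over everything reachable from `start`.
        while stack:
            step = stack.pop()
            group.append(step)
            for other in neighbors[step]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups


groups = dependency_groups(steps, edges)
print(groups)
```

Each group could be handed to one machine without moving intermediate files between machines. Whether that actually improves execution speed depends on data sizes and scheduling, which this sketch does not model.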

Fundamentally, there are CS people off doing their own special things, applied programmers using Boost, data scientists using NetworkX, and processing heavy fields like bioinformatics using cwl. But we're all getting at the same set of problems from different perspectives. Now, I have to make a GraphML to cwl parser anyways for my business and class. While I'm at it, I'd like to bring together the communities I've been running across.

tetron commented 5 years ago

It depends on the problem you're trying to solve. For export, you could try going from CWL to GraphML via RDF. That's how the current graphviz visualization export works. If you really need an isomorphic representation of CWL in GraphML, that sounds like an interesting project...

anadon commented 5 years ago

For the different communities doing interesting things with graphs, we need a singular core representation. Given that the value in unifying our communities is obvious, the best candidate for a universal representation looks to be GraphML.

I opened a ticket with NetworkX, and it looks like they're interested but busy on all of their projects until 2019. On the Boost side, we have a pending PR which is needed to support NetworkX and I'll likely be doing the leg work to add the support they need in BGL in order to use us.

Right now, BGL is fastest and NetworkX has the more powerful interface and algorithms. Neither of us is really an applied tool; we are tools for other people doing things with tools or core research. cwl could be that applied case. If we unify on this aspect, I could easily see us converging on a C++ backend with Python and R interfaces, and cwl being merged into GraphML years down the road. And this would be without sacrificing users, compatibility, generalizability, or really anything.

anadon commented 5 years ago

In more actionable terms, let's get together and see if bringing our projects closer together is really something we want to do, and if so, how. Would someone from cwl be interested in doing this? I'm willing to do the coordination work.

tetron commented 5 years ago

> Given that the value in unifying our communities is obvious

I think it might be more obvious to you than to us. You mentioned compute scheduling optimization? Trying to understand the use cases here. CWL is an orchestration language with concrete execution semantics. It is modeled as a graph, so something like NetworkX might be a useful tool for reasoning about it, but I'm still trying to understand your idea of a grand convergence.

anadon commented 5 years ago

NetworkX is probably more useful to you immediately. Problems like achieving uniform path lengths are more of a CS and BGL thing, but solving them would reduce the number of machines allocated for a workload while still respecting constraints on other resources, such as disk. A C++ API is also useful for speed, and it brings interesting capabilities like FPGA support whose value isn't apparent up front but becomes clear in retrospect.
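One way to picture the path-length point is a sketch that computes, for each step, the length of the longest dependency chain leading to it, then counts how many steps sit at each depth. The `depends_on` mapping and its step names are hypothetical, and the "widest level" is only a crude proxy for how many machines a level-by-level schedule would need at once; it ignores runtimes, disk, and other resources. It relies on `graphlib`, which is in the standard library from Python 3.9.

```python
from collections import Counter
from graphlib import TopologicalSorter

# Hypothetical mapping: step -> set of steps it depends on.
depends_on = {
    "align_a": set(),
    "align_b": set(),
    "sort_a": {"align_a"},
    "merge": {"sort_a", "align_b"},
    "report": {"merge"},
}


def step_depths(depends_on):
    """Return the longest-chain depth of every step (steps with no dependencies get 0)."""
    depths = {}
    # static_order() yields each step only after all of its dependencies.
    for step in TopologicalSorter(depends_on).static_order():
        upstream = depends_on.get(step, ())
        depths[step] = 1 + max((depths[d] for d in upstream), default=-1)
    return depths


depths = step_depths(depends_on)
level_widths = Counter(depths.values())
widest_level = max(level_widths.values())

print(depths)
print("steps per depth:", dict(level_widths))
print("widest level:", widest_level)
```

In this example `align_b` feeds `merge` directly while `align_a` goes through `sort_a` first, so the two paths into `merge` have different lengths. That gap is the kind of slack a scheduler could use to delay `align_b` and run it on a machine freed by `align_a`.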

A GitHub ticket may not be the right place to go over all the details, but gauging interest and getting someone willing to spend some time exploring the potential is what I'm immediately looking for. I'm in no position to impose purist Free Libre Open Source Software and Computer Science ideals upon your project.

tetron commented 5 years ago

@anadon To discuss this further I suggest joining us on gitter (https://gitter.im/common-workflow-language/common-workflow-language) or the weekly video chats https://groups.google.com/forum/#!forum/common-workflow-language-videochat-invites