IBM / semanticflowgraph

Semantic flow graphs for data science
Apache License 2.0

Semantic flow graphs


Create semantic dataflow graphs of data science code.

Using this package, you can convert data science code to dataflow graphs with semantic content. The package works in tandem with the Data Science Ontology and our language-specific program analysis tools. Currently Python and R are supported.

For more information, please see our research paper on "Teaching machines to understand data science code by semantic enrichment of dataflow graphs".

Command-line interface

We provide a CLI that supports recording, semantically enriching, and visualizing flow graphs. To set up the CLI, install this package and add its bin directory to your PATH. Then invoke the CLI by running flowgraphs.jl in your terminal.

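The CLI includes the following commands:

record: record Python or R scripts, yielding raw flow graphs
enrich: convert a raw flow graph to a semantic flow graph
visualize: visualize a flow graph, for example as an SVG file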

Every command takes as its primary argument either a directory, whose files are filtered by extension, or a single file, which may have any name.
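The directory-versus-file rule can be illustrated with a short Python sketch. This is a hypothetical illustration, not the CLI's actual code: the function collect_inputs and the EXTENSIONS mapping (which extensions each command accepts) are assumptions chosen to match the examples in this README.

```python
from pathlib import Path

# HYPOTHETICAL mapping from command to accepted file extensions;
# the real CLI's filter may differ.
EXTENSIONS = {
    "record": {".py", ".r"},
    "enrich": {".graphml"},
    "visualize": {".graphml"},
}


def collect_inputs(command: str, path: str) -> list[Path]:
    """Return the files a command would process for the given path argument."""
    p = Path(path)
    if p.is_dir():
        # Directory: keep only regular files whose (lowercased) extension
        # is accepted by this command.
        allowed = EXTENSIONS[command]
        return sorted(
            f for f in p.iterdir() if f.is_file() and f.suffix.lower() in allowed
        )
    # Single file: processed as given, whatever its name or extension.
    return [p]
```

For example, in a directory containing a.py, b.R, notes.txt, and a.py.graphml, this sketch selects a.py and b.R for record and only a.py.graphml for enrich, while passing notes.txt directly still returns it.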

CLI examples

Record all Python/R scripts in the current directory, yielding raw flow graphs:

flowgraphs.jl record .

Convert a raw flow graph to a semantic flow graph:

flowgraphs.jl enrich my_script.py.graphml --out my_script.graphml
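Because the output is a GraphML file, it can be inspected with standard XML tooling. The following Python sketch counts the nodes and edges in such a file using only the standard library. It assumes the file uses the standard GraphML namespace; it makes no assumptions about the node or edge attributes the package writes, and summarize_graphml is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Standard GraphML XML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def summarize_graphml(source) -> dict:
    """Count nodes and edges in a GraphML document (a path or a file object).

    The './/' search counts elements at every nesting level, so nodes inside
    nested subgraphs are included in the totals.
    """
    root = ET.parse(source).getroot()
    return {
        "nodes": len(root.findall(".//g:node", NS)),
        "edges": len(root.findall(".//g:edge", NS)),
    }
```

Calling summarize_graphml("my_script.graphml") on the enriched graph gives a quick sanity check that enrichment produced a non-empty graph.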

Visualize a semantic flow graph, creating and opening an SVG file:

flowgraphs.jl visualize my_script.graphml --to svg --open
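The three example commands can be chained for a single script. The Python sketch below uses only the commands and flags shown in these examples. It assumes, based on the file names above, that record writes the raw graph to the script name with .graphml appended, and that the semantic graph should replace the script's extension with .graphml. The helpers pipeline_commands and run_pipeline are hypothetical.

```python
import subprocess
from pathlib import Path


def pipeline_commands(script: str) -> list[list[str]]:
    """Build argv lists for recording, enriching, and visualizing one script.

    ASSUMPTION: record writes the raw graph to '<script>.graphml',
    as in the README's example file names.
    """
    raw = f"{script}.graphml"  # e.g. my_script.py.graphml
    semantic = str(Path(script).with_suffix(".graphml"))  # e.g. my_script.graphml
    return [
        ["flowgraphs.jl", "record", script],
        ["flowgraphs.jl", "enrich", raw, "--out", semantic],
        ["flowgraphs.jl", "visualize", semantic, "--to", "svg", "--open"],
    ]


def run_pipeline(script: str) -> None:
    """Run each step in order, stopping at the first command that fails."""
    for argv in pipeline_commands(script):
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
```

Separating command construction from execution keeps the file-naming assumptions easy to check without invoking the CLI; run_pipeline("my_script.py") requires flowgraphs.jl to be on your PATH.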