ProGraML targets data flow analysis. Although data flow analysis is a solved problem, ProGraML explores how well ML can handle these critical data flow analysis tasks. Five data flow analysis tasks are targeted. The biggest contribution of ProGraML is its graph representation, which starts from control flow (e.g., instructions) and then builds the data flow of variables. Vertices are labelled with the instruction type, the variable data type, or the constant data type. Notably, the order of operands is modelled in this representation. In addition, call edges connect function calls. The GNN model is an adaptation of GGNN. Another contribution of this work is the dataset DeepDataFlow, consisting of 256 million lines of LLVM-IR. The evaluation shows that ProGraML beats other models and representations, especially with respect to the representation.
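To make the representation described above concrete, here is a minimal sketch of such a graph in plain Python. It is not the authors' implementation or the ProGraML library API: the class names (`Node`, `Edge`, `ProgramGraph`), the `position` field used to record operand order, and the two-instruction example are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    kind: str  # "instruction", "variable", or "constant" (assumed labels)
    text: str  # instruction type (e.g. opcode) or data type


@dataclass(frozen=True)
class Edge:
    src: int
    dst: int
    flow: str          # "control", "data", or "call" (assumed labels)
    position: int = 0  # operand order; only meaningful for data edges here


@dataclass
class ProgramGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_node(self, kind, text):
        """Append a node and return its index."""
        self.nodes.append(Node(kind, text))
        return len(self.nodes) - 1

    def add_edge(self, src, dst, flow, position=0):
        self.edges.append(Edge(src, dst, flow, position))

    def operands(self, inst):
        """Return the operand node indices of an instruction, in operand order."""
        incoming = [e for e in self.edges if e.dst == inst and e.flow == "data"]
        return [e.src for e in sorted(incoming, key=lambda e: e.position)]


def build_example():
    # Hypothetical IR fragment:  %c = sub i32 %a, 1 ; ret i32 %c
    g = ProgramGraph()
    sub = g.add_node("instruction", "sub")
    ret = g.add_node("instruction", "ret")
    a = g.add_node("variable", "i32")
    one = g.add_node("constant", "i32")
    c = g.add_node("variable", "i32")

    # Control flow between instructions comes first.
    g.add_edge(sub, ret, "control")

    # Data flow is layered on top. The operands are added out of order on
    # purpose: the position attribute, not insertion order, decides that
    # the graph encodes "a - 1" rather than "1 - a".
    g.add_edge(one, sub, "data", position=1)
    g.add_edge(a, sub, "data", position=0)
    g.add_edge(sub, c, "data")               # sub defines %c
    g.add_edge(c, ret, "data", position=0)   # ret uses %c
    return g, sub, a, one
```

For a non-commutative instruction such as `sub`, dropping the position attribute would make `a - 1` and `1 - a` indistinguishable, which is why modelling operand order matters. A call edge would be added the same way, with `flow="call"` between a call instruction and the callee's entry instruction.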
Strength
Proposes a very powerful graph representation and contributes a real-world dataset.
The learning tasks, though a bit old, are still novel.
Weakness
More downstream tasks should be evaluated, since those downstream tasks are the ultimate goal.
The paper should give some examples to support the claim "many ML methods cannot replicate even the simplest of the data flow analyses that are critical to making good optimization decisions."
http://proceedings.mlr.press/v139/cummins21a/cummins21a.pdf