By Annemarie Burger
Aim for credible (ideally full) research papers from top venues in the DB field, namely SIGMOD, VLDB, ICDE, EDBT and KDD.
Since you like the "triangle counting" problem and want to test Gradoop++ with it, the goal of the thesis can be twofold:
1 -- Create a solid prototype Gradoop++ system that does graph stream processing on windows
Structure idea: Windowed edge stream (Flink) -> graph algorithm (Stateful Functions) -> aggregate results + graph sketches (Flink)
Subgoals
1.1 -- StateACC: the graph state inside the prescribed window (time-based or count-based), maintained accurately and stored in AL (adjacency list), EL (edge list) and CSR (compressed sparse row) formats.
1.2 -- StateAPPROX: the state of the whole graph stream (or of a larger subset than the defined processing window), maintained approximately with graph sketches.
1.3 -- StateACC and StateAPPROX should be incrementally maintained.
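To make subgoals 1.1 and 1.3 more concrete, here is a minimal plain-Python sketch of an exact, incrementally maintained state over a count-based window. It is not Gradoop++ or Flink code; the class name `WindowedGraphState`, its methods, and the choice to ignore self-loops and duplicate edges are assumptions made only for illustration. The state keeps the EL and AL views up to date on every insertion and eviction, and derives a CSR view on demand.

```python
from collections import defaultdict, deque


class WindowedGraphState:
    """Hypothetical StateACC: exact state over the last `size` edges."""

    def __init__(self, size):
        self.size = size
        self.edges = deque()          # EL: edges in arrival order
        self.adj = defaultdict(set)   # AL: vertex -> set of neighbours
        self.triangles = 0            # exact count, updated incrementally

    def add_edge(self, u, v):
        """Insert edge (u, v); evict the oldest edge if the window is full."""
        if u == v or v in self.adj[u]:
            return  # assumption: self-loops and duplicates are ignored
        if len(self.edges) == self.size:
            self._evict_oldest()
        # Each common neighbour of u and v closes exactly one new triangle.
        self.triangles += len(self.adj[u] & self.adj[v])
        self.adj[u].add(v)
        self.adj[v].add(u)
        self.edges.append((u, v))

    def _evict_oldest(self):
        a, b = self.edges.popleft()
        self.adj[a].discard(b)
        self.adj[b].discard(a)
        # Each common neighbour of a and b loses exactly one triangle.
        self.triangles -= len(self.adj[a] & self.adj[b])

    def to_csr(self):
        """Return (vertex ids, row offsets, column indices) for the window."""
        vertices = sorted(v for v, nbrs in self.adj.items() if nbrs)
        index = {v: i for i, v in enumerate(vertices)}
        offsets, targets = [0], []
        for v in vertices:
            targets.extend(sorted(index[w] for w in self.adj[v]))
            offsets.append(len(targets))
        return vertices, offsets, targets
```

With a window of three edges, inserting (1,2), (2,3), (1,3) gives one triangle; inserting (3,4) afterwards evicts (1,2) and the count drops back to zero without recomputing anything from scratch. A time-based window would follow the same pattern, evicting by timestamp instead of by edge count.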
2 -- Application of focus: Triangle Counting (TC) Problem
2.1 -- Exact vs Approximate TC
We can support both with Gradoop++ if we have StateACC and StateAPPROX properly implemented.
2.2 -- Centralized vs Distributed
Aim for Distributed papers
2.2.1 Graph State: Check how they maintain graph state and how they process it.
Do not focus on the specifics of each TC algorithm for now!
2.3 -- Static vs Streaming
Aim for Streaming methods.
Check:
2.3.1 Streaming model: What streaming model does each paper assume (e.g. are edges streamed)?
2.3.2 Window Processing: Do they maintain a window and count triangles inside it, or do they count on unbounded graph streams?
Do not focus on the specifics of each TC algorithm for now!
For each paper you find and read that is closely related, please create a Google Doc that answers the questions above.
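As a reference point for the exact vs. approximate distinction in 2.1, the sketch below contrasts an exact count with a simple estimator based on uniform edge sampling (the idea behind estimators such as DOULION): keep each edge with probability p, count triangles in the sample, and scale by 1/p^3, since a triangle survives only if all three of its edges are kept. This technique is chosen here purely for illustration and is not prescribed by these notes; the function names and the `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict


def exact_triangles(edges):
    """Exact triangle count of an undirected edge list."""
    adj = defaultdict(set)
    count = 0
    for u, v in edges:
        if u == v or v in adj[u]:
            continue  # assumption: self-loops and duplicates are ignored
        # Common neighbours of u and v each close one triangle.
        count += len(adj[u] & adj[v])
        adj[u].add(v)
        adj[v].add(u)
    return count


def approx_triangles(edges, p, seed=None):
    """Estimate the triangle count from a uniform edge sample.

    Each edge is kept independently with probability p, so a triangle
    survives with probability p**3; dividing by p**3 makes the estimate
    unbiased for a duplicate-free edge list.
    """
    rng = random.Random(seed)
    sample = [e for e in edges if rng.random() < p]
    return exact_triangles(sample) / p ** 3
```

On the complete graph with four vertices, `exact_triangles` returns 4, and `approx_triangles` with `p=1.0` reproduces it exactly. Smaller values of p trade accuracy for less state, which is the trade-off to look for when comparing papers.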