alibaba / GraphScope

🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba | 一站式图计算系统
https://graphscope.io
Apache License 2.0
3.32k stars 448 forks source link

[Proposal] APIs for subgraph with projection or filtering #1542

Open yecol opened 2 years ago

yecol commented 2 years ago

Proposal

get edges/vertices as dataframe

# return all edges, if g has only one label of edges, or g is a simple graph
# otherwise, raise an error;
# Compatible with NetworkX;
g.edges()

# return all edges with labeled as `L`
g.edges("L")
g.edges(label="L")

# with properties, if an array return edges dataframe with selected fields
g.edges(label="L", properties=["field_1", "field_2"])
# if an dict, return edges dataframe, with the properties satisfying the value equality.
g.edges(label="L", properties={"gender":"male", "area";"CA"})
# or matching specific src/dst id arrays;
g.edges(label="L", src_ids=[], dst_ids=[] )

# Similar to edges
# alias to g.nodes()
g.vertices()
g.nodes()

graph filtering, generate a new (property) graph

# Vertical selection named project, horizontal selection as subgraph

# if edges/vertices are None, induce a graph with given vertices/edges
g2 = g.project(vertices={"L1":[]})
g3 = g.project(vertices={"L1":[], "L2":["P1", "P2"]})
# new, property value quality

g4 = g.subgraph(vertices={"L1":{"P1":"value1", "P2":"value2"}})

# fn is a lambda func, take a vertex as parameter, true/false as return;
g5 = g.subgraph(vertices={"L1":fn})
# a more complex way supporting predicate and range filtering
g6 = g.subgraph(GREMLIN)

graph transformation

# functions like vmap & emap; 
# fn: a lambda function, 
# input_fields: existing properties
# e.g., maybe we need to compute a pagerank using an existing `degree` field, 
# to generate a new graph with new field as `pr`
g7 =  g.vertices_apply(fn, [active_subset], labels=[])
# array -> dict, with default values.
g8 =  g.vertices_apply(fn, [active_subset], labels=[])

#lambda function for apply_vertex
def vertex_fn(v):
    u = Vertex(v)
    u["xxx"] = v["yyy"] + 1
    return u

# Proposal -2
def vertex_fn_2(v):
   builder = v.get_builder()
   builder["xxx"] =  v["yyy"] +1
   u = builder.seal()
   return u

# remark
# labels=[], apply on all vertex labels;
# if labels are not None, the input_fields and output_fields are given as an array;
# if labels are not None, the input_fields and output_fields are given as an dict, then the vertices without the given input_field will have a default value, in the new field.  

# (deprecated) apply on edges;
# a lambda function fn, will apply on 
# all edges with given labels, if src/dst=None, otherwise applied only on the edges matching the given src/dst array;
# lambda funcion in the form: f(src, e, dst) -> (src, e, dst),
# src_ifs, e_ifs, dst_ifs: input_fields, optional, if none, then all the properties are accessiable;
# src_ofs, e_ofs, dst_ofs: cannot be ALL empty, only the given fields are mutable.
# ifs/ofs can be dicts, with default values. 
# g9 =  g.edges_apply(fn, src=None, dst=None, labels=[], src_ifs=[], e_ifs=[], dst_ifs=[], src_ofs=[], e_ofs=[], dst_ofs=[]) 

g1 = g.graph_apply(fn)

def fn(frag):

    # get a builder of the fragment
    builder = frag.get_builder()
    # builder has the same methods to the frag listed in 
    # https://graphscope.io/docs/reference/cython_sdk.html#cython-sdk-api
    # e.g., builder.get_nodes_num()
    # Initially, the structure & data of builder is same to frag

    # Additional APIs:
    builder.add_column("prop_name")
    builder.add_column("on_label", "prop_name")
    builder.set_property(e, "prop_name", "prop_val")
    builder.set_property(v, "prop_name", "prop_val")

    v  = VertexRequest()
    v.label = "xxx"
    v.property["yyy"] = "zzz"
    builder.add_vertex(v)
    builder.add_vertices(v_df)
    builder.add_edge(e)

    new_frag = builder.seal()
    return new_frag
acezen commented 2 years ago

Compare to NetworkX API:

  1. g.get_edges is similar to g.edges to get edges of g
    • g.edges return a view of edges;
    • g.edges not support Label;
    • g.edges accept bunch of nodes to only get edges of these nodes;
    • g.edges support looks up operation like g.edges[u, v];

I think we can unify the get_edges and g.edges of NetworkX, since label can be ignored in NetworkX Graph, And we can return view in gs graph too with fetch only when use strategy.

  1. g.get_vertices is similar to g.nodes to get vertices of g Similar to g.edges

  2. g.subgraph is similar to g.subgraph of NetworkX to create subgraph of g

    • But subgraph of NeworkX is finer-grained, takes a list of nodes as filter condition;

BTW, I suggest we can provide an api like g.neighbors(v) or g.adj(v) to get the neighbors of vertex, since this is a natural way to iterate edges in algorithm.

g.neighbor(v, Label=None, data=None)
# return neighbors  nodes of v (can select certain edge label neighbors and fetch edge data or not)
g.successors(v, Label=None, data=None)
# return successors  nodes of v
g.predecessors(v, Label=None, data=None)
# return predecessors  nodes of v
yecol commented 2 years ago

Thanks @acezen , revised the proposal based on your comments.

yecol commented 2 years ago
acezen commented 2 years ago

I suggest we can split subgraph to subgraph and edge_subgraph for clarifying the semantic

  1. subgraph for induce vertices subgraph
    
    g4 = g.subgraph(vertices={"L1":{"P1":"value1", "P2":"value2"}})

fn is a lambda func, take a vertex as parameter, true/false as return;

g5 = g.subgraph(vertices={"L1":fn})

2. `edge_subgraph` for induce edges subgraph
```python
g4 = g.edge_subgraph(edges={"L1":{"P1":"value1", "P2":"value2"}})

# fn is a lambda func, take a edges as parameter, true/false as return;
g5 = g.edge_subgraph(edges={"L1":fn})
yecol commented 2 years ago

in the original graph g, exists edges:

v1--e1-->v2;
v3--e2-->v2;

fn applied on edges will lead to conflicts on v2;