Dynamic access to graph data

thinxer commented 11 years ago

We need to add some meta information for the user-defined data types in the graph. This will enable better interoperability with many tools without sacrificing run-time speed. For example, it's currently impossible to print or examine a user-generated graph (we call it a foreign graph, since we are not familiar with it) without first obtaining the user's header files and compiling the program against them. It's also not possible to filter and select a subgraph from a foreign graph.

To achieve this, we should embed meta info in the graph. A simple solution would be to require the user to supply the type info when building the graph. The type info can be a list of (name, type, offset, size) tuples[1]. With this information, we can enumerate the fields of a foreign graph and use them to access its data.

Of course, we can add more fields to the type descriptor, such as description, indexable, etc., to make it friendlier to end users.
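To make the tuple idea concrete, here is a minimal sketch of how a tool could dump a record from a foreign graph using only a descriptor and raw bytes. Everything named here (`FieldType`, `FieldInfo`, `VertexData`, `describeVertexData`, `dumpRecord`) is hypothetical and not part of saedb; only two fixed-size primitive types are covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical field kinds; only fixed-size primitives are covered here.
enum class FieldType { Int32, Double };

// One (name, type, offset, size) tuple from the proposal.
struct FieldInfo {
    std::string name;
    FieldType type;
    std::size_t offset;
    std::size_t size;
};

// Example user type. A tool reading a foreign graph would not see this
// definition, only the descriptor below.
struct VertexData {
    std::int32_t type;
    double weight;
};

// The descriptor the user would supply when building the graph.
inline std::vector<FieldInfo> describeVertexData() {
    return {
        {"type", FieldType::Int32, offsetof(VertexData, type), sizeof(std::int32_t)},
        {"weight", FieldType::Double, offsetof(VertexData, weight), sizeof(double)},
    };
}

// Render one record as text using only the descriptor and the raw bytes.
inline std::string dumpRecord(const std::vector<FieldInfo>& fields, const void* record) {
    const char* base = static_cast<const char*>(record);
    std::string out;
    for (const FieldInfo& f : fields) {
        if (!out.empty()) out += ", ";
        out += f.name + "=";
        switch (f.type) {
        case FieldType::Int32: {
            assert(f.size == sizeof(std::int32_t));
            std::int32_t v;
            std::memcpy(&v, base + f.offset, sizeof v);  // copy out, no aliasing issues
            out += std::to_string(v);
            break;
        }
        case FieldType::Double: {
            assert(f.size == sizeof(double));
            double v;
            std::memcpy(&v, base + f.offset, sizeof v);
            out += std::to_string(v);
            break;
        }
        }
    }
    return out;
}
```

The same loop could drive filtering or subgraph selection, since a predicate only needs a field name and a decoded value.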

@pondering, @neozhangthe1 What do you guys think about this?

[1]: The last two elements in the tuple are not necessary. It's just being verbose.

wweic commented 11 years ago

If we have to implement heterogeneous graphs, meta info is needed. The only issue is how to make it less messy. I think the less the user has to write, the better. They could write the meta info for vertex_data_type and edge_data_type in a markup language like YAML (.yml), and we could then parse that file to interpret a foreign graph.

neozhangthe1 commented 11 years ago

@pondering I think defining data_type in a markup language is a good idea.

thinxer commented 11 years ago

We can even provide a tool that parses the user's source code and generates the necessary meta info.

For now, let's skip the parsing code and let users define their types through a GraphBuilder API.

If we use an external markup language, we have to include code to parse it. I prefer to minimize external dependencies.

Something like this:

struct Foo {
    int type;
};

// Describe Foo's fields once, then register it as the vertex type.
TypeMeta foo_meta = builder.newType<Foo>();
foo_meta.appendField("type", FIELD_TYPE_INT);
builder.setVertexType<Foo>();

If we design this API carefully, it shouldn't require too much extra typing.
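One possible way to cut the typing further, sketched here as an assumption rather than a settled design: let `appendField` take a pointer to member, so the field type, offset, and size are deduced instead of written by hand. This `TypeMeta` is a hypothetical templated variant of the one above, `FieldTypeOf` and `FIELD_TYPE_DOUBLE` are invented for illustration, and the record type must be standard-layout and default-constructible.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

enum FieldType { FIELD_TYPE_INT, FIELD_TYPE_DOUBLE };

// Maps a C++ type to its tag; left undefined for unsupported types so that
// misuse fails at compile time.
template <typename T> struct FieldTypeOf;
template <> struct FieldTypeOf<int> { static FieldType get() { return FIELD_TYPE_INT; } };
template <> struct FieldTypeOf<double> { static FieldType get() { return FIELD_TYPE_DOUBLE; } };

struct FieldInfo {
    std::string name;
    FieldType type;
    std::size_t offset;
    std::size_t size;
};

// Hypothetical templated TypeMeta: the user only names the field and points at it.
template <typename Record>
class TypeMeta {
public:
    template <typename Member>
    TypeMeta& appendField(const std::string& name, Member Record::*member) {
        static_assert(std::is_standard_layout<Record>::value,
                      "field offsets are only meaningful for standard-layout types");
        Record probe{};  // throwaway instance used to measure the member's offset
        const char* base = reinterpret_cast<const char*>(&probe);
        const char* field = reinterpret_cast<const char*>(&(probe.*member));
        fields_.push_back({name, FieldTypeOf<Member>::get(),
                           static_cast<std::size_t>(field - base), sizeof(Member)});
        return *this;
    }

    const std::vector<FieldInfo>& fields() const { return fields_; }

private:
    std::vector<FieldInfo> fields_;
};

struct Foo {
    int type;
    double score;
};
```

Usage would then look like `TypeMeta<Foo> meta; meta.appendField("type", &Foo::type).appendField("score", &Foo::score);`, with no type tag or offset spelled out by the user.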

wweic commented 11 years ago

OK. Let's ship a beta version first.

I can think of three places where the API comes into play:

1. Building the graph. The user describes each data type as (name, type) pairs through our API. Open question: how do we support vectors, maps, and user-defined structs?
2. Saving the graph. Save the meta information alongside the vertex_data and edge_data files, and record the type of each vertex_data.
3. Loading the graph. Parse the graph from disk and restore the exact types for the vertex program. Open question: how does the vertex program know the type of the current vertex_data?
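For the save and load steps, a rough sketch of how the field list could be written as a small header next to the data files and read back. The layout (a count, then length-prefixed names followed by three 32-bit values) and the names `encodeMeta` and `decodeMeta` are assumptions made for illustration. Values are stored in host byte order, so this sketch is not portable across endianness.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

struct FieldInfo {
    std::string name;
    std::uint32_t type;    // numeric type tag
    std::uint32_t offset;  // byte offset inside the record
    std::uint32_t size;    // byte size of the field
};

// Append a 32-bit value in host byte order.
inline void putU32(std::vector<char>& out, std::uint32_t v) {
    char buf[sizeof v];
    std::memcpy(buf, &v, sizeof v);
    out.insert(out.end(), buf, buf + sizeof v);
}

// Read a 32-bit value and advance the cursor, rejecting truncated input.
inline std::uint32_t getU32(const std::vector<char>& in, std::size_t& pos) {
    std::uint32_t v;
    if (pos + sizeof v > in.size()) throw std::runtime_error("truncated meta header");
    std::memcpy(&v, in.data() + pos, sizeof v);
    pos += sizeof v;
    return v;
}

// Saving: field count, then each field as (name length, name bytes, type, offset, size).
inline std::vector<char> encodeMeta(const std::vector<FieldInfo>& fields) {
    std::vector<char> out;
    putU32(out, static_cast<std::uint32_t>(fields.size()));
    for (const FieldInfo& f : fields) {
        putU32(out, static_cast<std::uint32_t>(f.name.size()));
        out.insert(out.end(), f.name.begin(), f.name.end());
        putU32(out, f.type);
        putU32(out, f.offset);
        putU32(out, f.size);
    }
    return out;
}

// Loading: rebuild the field list from the bytes written by encodeMeta.
inline std::vector<FieldInfo> decodeMeta(const std::vector<char>& in) {
    std::size_t pos = 0;
    std::vector<FieldInfo> fields(getU32(in, pos));
    for (FieldInfo& f : fields) {
        std::uint32_t len = getU32(in, pos);
        if (pos + len > in.size()) throw std::runtime_error("truncated field name");
        f.name.assign(in.data() + pos, len);
        pos += len;
        f.type = getU32(in, pos);
        f.offset = getU32(in, pos);
        f.size = getU32(in, pos);
    }
    return fields;
}
```

A heterogeneous graph would additionally need a per-vertex type id that selects one of several such field lists, which this sketch leaves out.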

thinxer commented 11 years ago

VertexPrograms still need header files to use the graph. Otherwise, they would have to access the fields of vertex/edge data dynamically, which may hurt performance. In some cases, this can be tolerated, though.

I still have no idea how to support strings or other dynamically allocated objects. If we need a packing/unpacking step to access the data, we lose the advantages that mmap has brought us.

I'll start by providing field info for the current static graphs.

thinxer commented 11 years ago

How about keeping the edge list part memory-mapped, while using serialization or some other magic to support dynamic user data?

Another idea is to design our own implementations of vectors/maps so that these types can be accessed immediately after the mmap.
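A minimal sketch of what such a container could look like: an array whose header stores a byte offset relative to its own position instead of a pointer, so it stays valid wherever the file is mapped. `FlatArrayHeader`, `packArray`, `readHeader`, and `elementAt` are hypothetical names, and a `std::vector<char>` stands in for the mapped region.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Fixed-size header that would live inside the vertex record. It holds no
// pointers, only a byte offset measured from the header's own first byte.
struct FlatArrayHeader {
    std::uint32_t offset;  // bytes from the header to the first element
    std::uint32_t count;   // number of int32 elements
};

// Build a region containing the header immediately followed by the elements.
inline std::vector<char> packArray(const std::vector<std::int32_t>& values) {
    FlatArrayHeader h;
    h.offset = sizeof(FlatArrayHeader);
    h.count = static_cast<std::uint32_t>(values.size());
    std::vector<char> region(sizeof h + values.size() * sizeof(std::int32_t));
    std::memcpy(region.data(), &h, sizeof h);
    if (!values.empty()) {
        std::memcpy(region.data() + sizeof h, values.data(),
                    values.size() * sizeof(std::int32_t));
    }
    return region;
}

// Read the header located at `at`.
inline FlatArrayHeader readHeader(const char* at) {
    FlatArrayHeader h;
    std::memcpy(&h, at, sizeof h);
    return h;
}

// Fetch element i of the array whose header is at `headerAddr`. There is no
// unpacking pass: the lookup is one offset computation plus a copy.
inline std::int32_t elementAt(const char* headerAddr, std::uint32_t i) {
    FlatArrayHeader h = readHeader(headerAddr);
    assert(i < h.count);
    std::int32_t v;
    std::memcpy(&v, headerAddr + h.offset + i * sizeof v, sizeof v);
    return v;
}
```

Because every location is relative, copying the region to a different address (which is what a fresh mmap amounts to) leaves the array readable without any fix-up step.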

thinxer commented 11 years ago

For example, STL with a custom allocator may be a solution.

THUKEG / saedb

Dynamic access to graph data #45