IntelLabs / pmgd

Persistent Memory Graph Database
MIT License

Persistent Memory Graph Database (PMGD)

WARNING: DISCONTINUATION OF PROJECT. This project will no longer be maintained by Intel. Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project. Intel no longer accepts patches to this project. If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.

Recent developments in persistent memory technologies like 3D XPoint promise storage elements that provide nearly the speed of DRAM with the durability of block-oriented storage. To provide efficient storage for increasingly popular connected data and for applications that benefit from graph-like processing, we have designed and implemented an in-persistent-memory graph database, PMGD, optimized to run on a platform equipped with a vast amount of persistent memory.

Features

In the current release, we have focused on understanding the challenges posed by a PM-based design before making a case for scaling the database out. Because persistent memory is expected to give individual platforms terabytes of memory capacity, we can evaluate graph sizes that cover a broad spectrum of deployments without scaling out. PMGD therefore currently supports only single-node operation. It is implemented both as a library that is linked into an application and as a server that multiple clients can access. We plan to work on a distributed solution soon.

System Overview

Graphs stored in PMGD consist of nodes (or vertices), optionally connected by edges (or relationships). Graphs may be directed or undirected: PMGD always stores edges as directed, but its interface lets applications ignore direction. Not all nodes need be connected, and a directed graph may be only weakly connected (i.e., a directed path need not exist between every pair of nodes).
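The relationship between directed storage and direction-agnostic queries can be illustrated with a small, self-contained C++ sketch. It is a hypothetical toy model, not PMGD's API (`ToyGraph`, `Direction`, `neighbors`, and `reachable` are names chosen for this example): each edge is stored once as a directed `src -> dst` record, and a `Direction` argument only decides whether a query reads it forward, backward, or either way.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model (not PMGD's API). Edges are always stored as
// directed src -> dst records; Direction only changes how queries read them.
enum class Direction { Outgoing, Incoming, Any };

struct Edge {
    std::size_t src;
    std::size_t dst;
    std::string tag;
};

struct ToyGraph {
    std::size_t num_nodes = 0;
    std::vector<Edge> edges;

    std::size_t add_node() { return num_nodes++; }

    void add_edge(std::size_t src, std::size_t dst, std::string tag) {
        edges.push_back({src, dst, std::move(tag)});
    }

    // One-hop neighbors of `node` in the requested direction.
    // An empty `tag` matches edges with any tag.
    std::vector<std::size_t> neighbors(std::size_t node, Direction dir,
                                       const std::string &tag = "") const {
        std::vector<std::size_t> out;
        for (const Edge &e : edges) {
            if (!tag.empty() && e.tag != tag) continue;
            if (dir != Direction::Incoming && e.src == node) out.push_back(e.dst);
            if (dir != Direction::Outgoing && e.dst == node) out.push_back(e.src);
        }
        return out;
    }

    // Breadth-first search: can `to` be reached from `from` along edges
    // followed in direction `dir`? With Direction::Any this tests whether
    // the two nodes lie in the same weakly connected component.
    bool reachable(std::size_t from, std::size_t to, Direction dir) const {
        std::vector<bool> seen(num_nodes, false);
        std::deque<std::size_t> frontier{from};
        seen[from] = true;
        while (!frontier.empty()) {
            std::size_t cur = frontier.front();
            frontier.pop_front();
            if (cur == to) return true;
            for (std::size_t next : neighbors(cur, dir)) {
                if (!seen[next]) {
                    seen[next] = true;
                    frontier.push_back(next);
                }
            }
        }
        return false;
    }
};
```

For example, with edges a -> b and c -> b, `reachable(a, c, Direction::Any)` is true while `reachable(a, c, Direction::Outgoing)` is false: the graph is weakly connected, but there is no directed path from a to c.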

PMGD supports a property graph model with the following features:

These features provide a powerful data model for storing any form of connected data.

We implement add, read, modify, and remove primitives for all entities: nodes, edges, and properties. PMGD supports queries that look up nodes or edges based on (a) a specific tag and/or property, (b) a specific property value, or (c) a property value within a specified range of values. Range lookups are supported for all property types except blobs.
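Exact-match and range lookups of this kind can be sketched with an ordered index. This is an assumption for illustration only, not PMGD's actual index implementation: a `std::multimap` keyed on a `std::variant` of orderable value types, with `PropertyIndex`, `equal`, and `range` as hypothetical names. The variant deliberately has no blob alternative, mirroring the rule that blobs are excluded from range lookups.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical sketch (not PMGD's index). Only orderable value types are
// listed; there is deliberately no blob alternative, so blobs cannot be
// range-indexed.
using IndexedValue = std::variant<bool, long long, double, std::string>;
using NodeId = std::size_t;

class PropertyIndex {
public:
    // Index `node` under `value` (several nodes may share a value).
    void add(const IndexedValue &value, NodeId node) { index_.emplace(value, node); }

    // Drop one (value, node) entry, e.g. when the property is modified or removed.
    void remove(const IndexedValue &value, NodeId node) {
        auto [first, last] = index_.equal_range(value);
        for (auto it = first; it != last; ++it) {
            if (it->second == node) {
                index_.erase(it);
                return;
            }
        }
    }

    // Exact-match lookup: every node whose indexed value equals `value`.
    std::vector<NodeId> equal(const IndexedValue &value) const {
        auto [first, last] = index_.equal_range(value);
        return collect(first, last);
    }

    // Range lookup over the closed interval [lo, hi].
    std::vector<NodeId> range(const IndexedValue &lo, const IndexedValue &hi) const {
        if (hi < lo) return {};  // empty interval; also guarantees first <= last below
        return collect(index_.lower_bound(lo), index_.upper_bound(hi));
    }

private:
    using Map = std::multimap<IndexedValue, NodeId>;

    static std::vector<NodeId> collect(Map::const_iterator first, Map::const_iterator last) {
        std::vector<NodeId> out;
        for (; first != last; ++first) out.push_back(first->second);
        return out;
    }

    Map index_;
};
```

`std::variant` orders values by alternative first, so a range with `long long` bounds matches only `long long` values. Passing integer keys as `long long` (e.g. `42LL`) and strings as `std::string` avoids surprising conversions, such as a string literal binding to the `bool` alternative on older compilers.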

PMGD implements indexing to support all of these lookup types. Users choose which properties to index based on their expected query patterns, keeping in mind that indices occupy additional memory. PMGD also lets a query supply a predicate function that examines the properties or relationships of a node or edge and decides whether it matches.

Once the relevant nodes have been found, PMGD supports graph-oriented queries such as (a) getting the neighbors of a node at n hops, where n ≥ 1; (b) getting all nodes within a neighborhood of up to n hops from a node; and (c) getting the common neighbors of a set of nodes. Each of these queries can specify the direction and tag of the edges to follow.
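The three neighbor queries, the direction and tag filters, and a caller-supplied predicate can be sketched together using breadth-first search. This is a hypothetical, self-contained C++ illustration, not PMGD's util/ neighbor functions. `Graph`, `Dir`, `levels`, `neighbors_at`, `neighborhood`, and `common_neighbors` are names chosen for the example, and "neighbors at n hops" is assumed here to mean nodes whose shortest distance from the start node is exactly n.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch (not PMGD's neighbor functions): a directed, tagged edge list.
enum class Dir { Out, In, Any };

struct Edge {
    std::size_t src, dst;
    std::string tag;
};

struct Graph {
    std::vector<Edge> edges;

    // One-hop neighbors of `n` along edges with `tag` (empty = any tag) in direction `dir`.
    std::set<std::size_t> step(std::size_t n, Dir dir, const std::string &tag) const {
        std::set<std::size_t> out;
        for (const Edge &e : edges) {
            if (!tag.empty() && e.tag != tag) continue;
            if (dir != Dir::In && e.src == n) out.insert(e.dst);
            if (dir != Dir::Out && e.dst == n) out.insert(e.src);
        }
        return out;
    }

    // Breadth-first search. Element h-1 holds the nodes whose shortest
    // distance from `start` is exactly h, for h = 1..hops.
    std::vector<std::set<std::size_t>> levels(std::size_t start, int hops, Dir dir,
                                              const std::string &tag) const {
        assert(hops >= 1);  // the queries are defined for n >= 1
        std::set<std::size_t> visited{start}, frontier{start};
        std::vector<std::set<std::size_t>> out;
        for (int h = 1; h <= hops; ++h) {
            std::set<std::size_t> next;
            for (std::size_t n : frontier)
                for (std::size_t m : step(n, dir, tag))
                    if (visited.insert(m).second) next.insert(m);  // first visit only
            out.push_back(next);
            frontier = std::move(next);
        }
        return out;
    }
};

// (a) Neighbors exactly `hops` hops away that also satisfy the caller's predicate.
std::set<std::size_t> neighbors_at(const Graph &g, std::size_t start, int hops, Dir dir,
                                   const std::string &tag,
                                   const std::function<bool(std::size_t)> &pred) {
    const auto lv = g.levels(start, hops, dir, tag);  // keep the result alive while iterating
    std::set<std::size_t> out;
    for (std::size_t n : lv.back())
        if (pred(n)) out.insert(n);
    return out;
}

// (b) All nodes within 1..hops hops of `start`.
std::set<std::size_t> neighborhood(const Graph &g, std::size_t start, int hops, Dir dir,
                                   const std::string &tag) {
    std::set<std::size_t> out;
    for (const auto &level : g.levels(start, hops, dir, tag))
        out.insert(level.begin(), level.end());
    return out;
}

// (c) Nodes that are one-hop neighbors of every node in `nodes`.
std::set<std::size_t> common_neighbors(const Graph &g, const std::vector<std::size_t> &nodes,
                                       Dir dir, const std::string &tag) {
    if (nodes.empty()) return {};
    std::set<std::size_t> common = g.step(nodes[0], dir, tag);
    for (std::size_t i = 1; i < nodes.size(); ++i) {
        const std::set<std::size_t> next = g.step(nodes[i], dir, tag);
        std::set<std::size_t> kept;
        for (std::size_t n : common)
            if (next.count(n)) kept.insert(n);
        common = std::move(kept);
    }
    return common;
}
```

Keeping one set per hop lets query (a) read only the last level and query (b) take the union of all levels from the same traversal, while query (c) intersects one-hop neighbor sets.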

Library sources

The public interface headers are in the include/ folder, and the library's C++ sources are in src/. The Java bindings are in the java/ folder. Some higher-level functionality, such as the neighbor functions, is in the util/ folder.

Tools

We provide some simple tools like:

Tests and sample code

The test folder contains unit tests for many of the modules, which can be run with the run_all.sh script. The clean_all.sh script removes all the graphs created by run_all.sh. We plan to move our testing to Google Test (GTEST) in a future release.