Graph databases are no longer just the new kids on the block, but maturity doesn't mean that they can't be a little edgy. Research in data engines can be applied to graph databases, and open source projects like JanusGraph are a great place to do it. Join Ted as he looks into the internals of JanusGraph and considers how the engine can be extended and enhanced with modern-day research conjectures and proposals inspired by other database engines and academia.
On-boarding with JanusGraph Performance by Chin Huang and Yi-Hong Wang (IBM)
When approaching a new technology, an upfront evaluation of its performance is necessary. Graph databases support a flexible data model that allows users to easily represent and manage domain-specific data. At the same time, a number of variables in graph modeling and implementation mechanisms influence the performance of loading and querying graph data. With one of the latest graph databases available, JanusGraph, we evaluated various graph workloads in order to understand the performance characteristics and to identify system requirements. In this talk, we will share with the audience our performance test approach: the data, schema, tools, and methodology we used. We will also show the JanusGraph performance results, provide recommendations on achieving better graph performance, and investigate how to apply the same approach to other graph databases.
Building a Graph Data Pipeline by Paul Sterk and George Tretyakov (Ten-X)
Are you thinking about implementing a Graph Database? Are you wondering how to transform your existing datasets into a Graph model? At Ten-X we built a complex, multi-stage Graph Data Pipeline that sources, filters, de-dupes, transforms, loads and manages different sets of data in JanusGraph. We would like to share some of our insights and hard-earned lessons with you, especially on how to deal with poorly documented, complex and dirty legacy datasets. We will talk about a third-party service you can use to make it much easier to de-duplicate any geo-oriented records (such as customer addresses), as well as a compelling data enrichment story. We will also cover approaches for converting data records into vertices and edges, strategies for transforming and creating a graph database ‘load-ready’ dataset, and thoughts on our technology stack (Hadoop, Hive, Spark, TinkerPop, JanusGraph, Cassandra and Elasticsearch).
Excellent point, @sumalaika!
Here are a few we should include: