awslabs / amazon-neptune-tools

Tools and utilities to enable loading data and building graph applications with Amazon Neptune.
Apache License 2.0

add import directly from graphml and json without transforming to csv first. #27

Open edatarch opened 5 years ago

beebs-systap commented 5 years ago

We plan to support this when we move to Apache TinkerPop 3.4, which deprecates the io() method on the Graph and moves it to the GraphTraversalSource.

edatarch commented 5 years ago

One limitation I see with the CSV format is that I cannot have varying properties for the vertices in the same file. The CSV load expects each property to be a column, with the values in the rows. This means I need to group the vertices that share the same properties and create a separate file for each group. The same is true for edges.

The JSON or Graph structure allows for varying properties. Would the GraphTraversalSource support this specific use case?

beebs-systap commented 5 years ago

@edatarch With the CSV format, the header must include all possible properties, but you can have varying properties for the vertices in the same file. For a given vertex, any property that is not present can be left without a value. Below is a sample of the TinkerPop graph with different properties per node.

The GraphTraversalSource will support JSON or GraphML loaded via the io() methods.

~id,~label,name:string,lang:string,age:int
1,person,,,29
2,person,vadas,,
3,software,,,
4,person,josh,,32
5,software,ripple,java,
6,person,peter,,35
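
As a rough illustration of the point above, here is a minimal Python sketch that writes vertices with differing properties into a single CSV file by emitting a header that covers every property and leaving missing values blank. The `vertices` list, the `PROPERTY_TYPES` mapping, and the `vertices_to_csv` helper are hypothetical names made up for this example; they are not part of amazon-neptune-tools. The column types simply mirror the header in the sample above.

```python
import csv
import io

# Hypothetical in-memory vertices: each has ~id and ~label plus any subset of properties.
vertices = [
    {"~id": "2", "~label": "person", "name": "vadas"},
    {"~id": "4", "~label": "person", "name": "josh", "age": 32},
    {"~id": "5", "~label": "software", "name": "ripple", "lang": "java"},
]

# Assumed property types, copied from the header of the sample above.
PROPERTY_TYPES = {"name": "string", "lang": "string", "age": "int"}


def vertices_to_csv(vertices, property_types):
    """Return CSV text whose header lists every property; absent values stay empty."""
    columns = ["~id", "~label"] + list(property_types)
    header = ["~id", "~label"] + [f"{name}:{typ}" for name, typ in property_types.items()]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    for vertex in vertices:
        # A property the vertex does not have becomes an empty cell.
        writer.writerow([vertex.get(column, "") for column in columns])
    return buf.getvalue()


csv_text = vertices_to_csv(vertices, PROPERTY_TYPES)
print(csv_text)
```

With this approach there is no need to split the vertices into one file per property set; the same idea should carry over to edge files, assuming their header likewise lists every edge property.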

RyanBeatty commented 5 years ago

Direct export to something like GraphSON would be lovely :)

Context: I am exploring a graph database storage solution that looks like the following. AWS Neptune instances serve OLTP queries. A daily (?) automated export converts the AWS Neptune graph to GraphSON. The graph in GraphSON form is then used as input to a Hadoop-Gremlin graph running on AWS EMR for OLAP queries.

At the moment it looks like I could use export-pg to dump things into CSV or JSON and then do some data transformation on my end to get it into GraphSON. If I could avoid doing that work, that would be awesome :)

beebs-systap commented 5 years ago

@RyanBeatty We can look at the GraphSON format in the future. We also have some other plans that may be relevant to your use case. Happy to discuss in more detail.

@iansrobinson

RyanBeatty commented 5 years ago

@beebs-systap Thanks for the quick response! I would definitely appreciate any feedback or ideas for my use case here. I've sent you an email to take the discussion off this thread.