edatarch opened this issue 5 years ago
One of the limitations I see with the CSV format is that I cannot have varying properties for the vertices in the same file. The CSV loader expects each property to become a column, with the values in the rows. This means I need to group vertices that share the same properties and create a separate file for each group. The same is true for edges.
The JSON or Graph structure formats allow for varying properties. Would the GraphTraversalSource support this specific use case?
@edatarch With the CSV format the header must include all possible properties, but you can still have varying properties for the vertices in the same file: for a given vertex, any property it does not have is simply left empty. The GraphTraversalSource will support JSON or GraphML loaded via the io() methods.
Here's a sample of the TinkerPop graph with different properties per vertex:
~id,~label,name:string,lang:string,age:int
1,person,,,29
2,person,vadas,,
3,software,,,
4,person,josh,,32
5,software,ripple,java,
6,person,peter,,35
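The "one header, empty cells" convention can be sketched in Python with only the standard library: write every vertex to a single file under a header that lists every property, and when reading, skip empty cells so each vertex keeps only its own properties. This is an illustration of the file layout, not Neptune's loader. The helpers `write_vertices` and `read_vertices` are hypothetical, and only the `string` and `int` type suffixes from the sample are converted (anything else is kept as a string).

```python
import csv
import io

# Hypothetical type casts for the ":type" header suffixes used in the sample.
# Unknown suffixes fall back to str.
CASTS = {"string": str, "int": int}


def write_vertices(vertices, prop_types):
    """Write vertices with differing properties to one CSV string.

    vertices: dicts with "~id", "~label" and any subset of the properties
    named in prop_types (property name -> type suffix). The header lists
    every property; a vertex without a property gets an empty cell.
    """
    header = ["~id", "~label"] + [f"{name}:{t}" for name, t in prop_types.items()]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    for v in vertices:
        row = [v["~id"], v["~label"]]
        row += [v.get(name, "") for name in prop_types]  # "" marks "absent"
        writer.writerow(row)
    return buf.getvalue()


def read_vertices(text):
    """Parse the CSV back into dicts, dropping empty cells."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    vertices = []
    for row in reader:
        v = {"~id": row[0], "~label": row[1]}
        for column, cell in zip(header[2:], row[2:]):
            if cell == "":
                continue  # empty cell: this vertex has no such property
            name, _, type_name = column.partition(":")
            v[name] = CASTS.get(type_name.lower(), str)(cell)
        vertices.append(v)
    return vertices


if __name__ == "__main__":
    mixed = [
        {"~id": "1", "~label": "person", "age": 29},
        {"~id": "2", "~label": "person", "name": "vadas"},
        {"~id": "5", "~label": "software", "name": "ripple", "lang": "java"},
    ]
    text = write_vertices(mixed, {"name": "string", "lang": "string", "age": "int"})
    print(text)
    print(read_vertices(text))
```

Running it prints rows in the same shape as the sample (`1,person,,,29`, `2,person,vadas,,`, ...), and reading them back yields one dict per vertex containing only the properties that vertex actually has.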
Direct export to something like GraphSON would be lovely :)
Context: I am exploring a graph database storage solution that looks like the following:
1. AWS Neptune instances serve OLTP queries.
2. A daily (?) automated export converts the AWS Neptune graph to GraphSON.
3. The graph in GraphSON form is used as input to a Hadoop-Gremlin graph running on AWS EMR for OLAP queries.
At the moment it looks like I could use export-pg to dump things into CSV or JSON and then do some data transformation on my end to get it into GraphSON. If I could avoid doing that work, that would be awesome :)
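A rough sketch of that transformation step in Python, assuming the exported vertices and edges have already been parsed into dicts keyed by the CSV column names (`~id`, `~label`, `~from`, `~to`, plus property names without type suffixes). It emits one JSON object per vertex with its edges nested under `outE`/`inE`, modelled on the one-vertex-per-line adjacency layout that Hadoop-Gremlin's GraphSON input reads. The function name `to_graphson_lines` is hypothetical, the vertex-property ids are invented because the CSV carries none, and the output is untyped (no `@type` wrappers), so check it against the GraphSON version your cluster is configured for before relying on it.

```python
import json


def to_graphson_lines(vertices, edges):
    """Hypothetical converter: dict rows -> one adjacency-style JSON line per vertex.

    Assumes every edge's ~from and ~to refer to a vertex in `vertices`
    (a KeyError is raised otherwise).
    """
    by_id = {}
    prop_id = 0  # made-up vertex-property ids; the CSV export has none
    for v in vertices:
        props = {}
        for key, value in v.items():
            if key.startswith("~"):
                continue  # ~id / ~label are system columns, not properties
            props[key] = [{"id": prop_id, "value": value}]
            prop_id += 1
        by_id[v["~id"]] = {"id": v["~id"], "label": v["~label"],
                           "outE": {}, "inE": {}, "properties": props}

    for e in edges:
        edge_props = {k: val for k, val in e.items() if not k.startswith("~")}
        # Each edge appears twice: outgoing on its source, incoming on its target.
        by_id[e["~from"]]["outE"].setdefault(e["~label"], []).append(
            {"id": e["~id"], "inV": e["~to"], "properties": edge_props})
        by_id[e["~to"]]["inE"].setdefault(e["~label"], []).append(
            {"id": e["~id"], "outV": e["~from"], "properties": edge_props})

    lines = []
    for vertex in by_id.values():
        for key in ("outE", "inE"):
            if not vertex[key]:
                del vertex[key]  # omit empty edge maps
        lines.append(json.dumps(vertex, separators=(",", ":")))
    return lines


if __name__ == "__main__":
    vs = [{"~id": "1", "~label": "person", "name": "marko"},
          {"~id": "3", "~label": "software", "name": "lop"}]
    es = [{"~id": "9", "~from": "1", "~to": "3", "~label": "created", "weight": 0.4}]
    print("\n".join(to_graphson_lines(vs, es)))
```

Writing the returned lines to a file, one per line, gives a single input file; the design keeps each vertex self-contained on its line, which is what lets a Hadoop input format split the file across workers.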
@RyanBeatty We can look at the GraphSON format in the future. We also have some other plans that may be relevant to your use case. Happy to discuss in more detail.
@iansrobinson
@beebs-systap Thanks for the quick response! I'd definitely appreciate any feedback or ideas for my use case here. I've sent you an email to take the discussion off this thread.
We plan to support this when we move to Apache TinkerPop 3.4, which deprecates the io() method on the Graph and moves it to the GraphTraversalSource.