TheDataRideAlongs / ProjectDomino

Scaling COVID public behavior change and anti-misinformation
Apache License 2.0

Dynamic Neo4j large ~parquet export #4

Open lmeyerov opened 4 years ago

lmeyerov commented 4 years ago

Tracks current effort to get Neo4j to export ~100M node/edge parquet/arrow graphs in decent time for use by analytics stacks

This is for fast on-the-fly mode: dynamic cypher query -> parquet/arrow
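As a rough illustration of the on-the-fly path, here is a minimal sketch that streams query results into Parquet in bounded-size batches. It is not taken from any branch of this project. The helper names `column_batches` and `write_parquet` are hypothetical, the input is assumed to be any iterable of row dicts (for example, `record.data()` for each record returned by the Neo4j Python driver), and the schema is assumed to be inferable from the first batch, which may not hold when early rows contain nulls.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def column_batches(
    records: Iterable[Dict[str, Any]], batch_size: int
) -> Iterator[Dict[str, List[Any]]]:
    """Group row dicts into column-oriented batches of at most batch_size rows.

    Column names are taken from the first row; keys missing from later rows
    become None. (Hypothetical helper, not part of any library.)
    """
    it = iter(records)
    rows = list(islice(it, batch_size))
    if not rows:
        return
    keys = list(rows[0].keys())
    while rows:
        yield {k: [r.get(k) for r in rows] for k in keys}
        rows = list(islice(it, batch_size))


def write_parquet(
    records: Iterable[Dict[str, Any]], path: str, batch_size: int = 100_000
) -> None:
    """Stream row dicts to a Parquet file, one row group per batch.

    Assumes pyarrow is installed; it is imported lazily so the batching
    helper above stays usable without it.
    """
    import pyarrow as pa
    import pyarrow.parquet as pq

    writer = None
    schema = None
    try:
        for cols in column_batches(records, batch_size):
            if writer is None:
                # Infer the schema from the first batch and reuse it afterwards.
                table = pa.Table.from_pydict(cols)
                schema = table.schema
                writer = pq.ParquetWriter(path, schema)
            else:
                table = pa.Table.from_pydict(cols, schema=schema)
            writer.write_table(table)
    finally:
        if writer is not None:
            writer.close()
```

Batching keeps memory bounded regardless of result size, which matters at the ~100M node/edge scale. Whether row-by-row conversion in Python is fast enough at that scale is exactly the open question this issue tracks.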

vilkinsons commented 3 years ago

If there's a WIP branch anywhere I'd be keen to take a look. Unsure if this piece was ever started

lmeyerov commented 3 years ago

Someone at Neo4j started this but I don't think they made progress

I'm not sure of the current state in Neo4j land. One of my guesses was that, because of Neo4j's Spark connector (cypher query -> Neo4j -> Spark RDD?), there might be a typed bulk exporter, and since Spark is already Arrow-friendly, we could co-opt it
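For what it's worth, the Spark route could look roughly like the sketch below. It assumes the Neo4j Connector for Apache Spark jar is on the Spark classpath and that its data source name and option keys (`org.neo4j.spark.DataSource`, `url`, `query`, `authentication.basic.*`) match the connector version in use, so check those against the connector docs. The function name, default URL, and credentials are placeholders, and nothing here has been benchmarked at the ~100M scale.

```python
def export_query_to_parquet(
    cypher: str,
    out_path: str,
    url: str = "bolt://localhost:7687",  # placeholder connection URL
    user: str = "neo4j",                 # placeholder credentials
    password: str = "password",
):
    """Run a Cypher query through the Neo4j Spark connector and write Parquet.

    Hypothetical helper: pyspark is imported lazily, and the connector jar
    must already be available to the Spark session.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("neo4j-parquet-export").getOrCreate()

    # Read the query result as a typed DataFrame via the connector.
    df = (
        spark.read.format("org.neo4j.spark.DataSource")
        .option("url", url)
        .option("authentication.basic.username", user)
        .option("authentication.basic.password", password)
        .option("query", cypher)
        .load()
    )

    # Spark writes Parquet natively, one file per partition under out_path.
    df.write.mode("overwrite").parquet(out_path)
    return df.schema
```

A call such as `export_query_to_parquet("MATCH (n) RETURN id(n) AS id", "/tmp/nodes.parquet")` would then leave a directory of Parquet part files that Arrow-based tools can read directly.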