G-Research / spark-dgraph-connector

A connector for Apache Spark and PySpark to Dgraph databases.
Apache License 2.0
43 stars 11 forks

Add write support to connector #8

Open EnricoMi opened 4 years ago

EnricoMi commented 4 years ago

Spark can write DataFrames to sources in two modes: overwrite (erasing existing data first) and append. Appending to (i.e. mutating) the Dgraph database would be great.
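The two save modes can be illustrated without Spark at all. The sketch below simulates them against a hypothetical in-memory triple store; the `write_triples` helper and the sample triples are illustrative only, not part of the connector:

```python
# Minimal sketch of Spark's two DataFrame save modes, simulated with an
# in-memory list of (subject, predicate, object) triples. All names here
# are hypothetical; they only illustrate overwrite-vs-append semantics.

def write_triples(store, triples, mode):
    """Write a batch of triples. mode="overwrite" erases existing data
    first; mode="append" mutates the store by adding to it."""
    if mode == "overwrite":
        store.clear()
    elif mode != "append":
        raise ValueError(f"unsupported save mode: {mode}")
    store.extend(triples)
    return store

store = [("0x1", "name", "Alice")]
write_triples(store, [("0x2", "name", "Bob")], mode="append")
assert store == [("0x1", "name", "Alice"), ("0x2", "name", "Bob")]

write_triples(store, [("0x3", "name", "Carol")], mode="overwrite")
assert store == [("0x3", "name", "Carol")]
```

In Spark itself these modes would be selected via `df.write.mode("append")` or `df.write.mode("overwrite")` once the connector implements a write path.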

daveaitel commented 3 years ago

Has there been any progress on this issue?

stackedsax commented 3 years ago

I thought @EnricoMi added write support as he worked through the https://github.com/G-Research/dgraph-dbpedia/ project. But looking at this again, I think it still needs doing.

stackedsax commented 3 years ago

@EnricoMi I know you're off at the moment, but can you clarify whether this support made it in?

EnricoMi commented 3 years ago

@daveaitel @stackedsax Write is not supported yet, and it is definitely a bigger piece of work. I also suspect it won't scale nicely, so don't expect huge write performance.

stackedsax commented 3 years ago

Thanks for confirming, Enrico. @daveaitel, what did you have in mind here?

daveaitel commented 3 years ago

Mostly I want to connect my Dgraph DB to Spark, have Spark run its PageRank etc. algorithms on it, and then update the Dgraph database with those results. Is there a better way to do that?
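For context, the computation Spark would run at scale (via GraphX or GraphFrames `pageRank`) is power-iteration PageRank, which yields one score per node. A pure-Python sketch on a tiny hypothetical graph:

```python
# Pure-Python power-iteration PageRank, illustrating what Spark's
# GraphX/GraphFrames pageRank computes at scale. The four-node graph is
# hypothetical; in the pipeline discussed above, Spark would emit one
# such score per Dgraph node, to be written back as a single predicate.

def pagerank(edges, damping=0.85, iterations=30):
    """edges: list of (src, dst) pairs; returns {node: score}."""
    nodes = {n for edge in edges for n in edge}
    out_links = {n: [d for s, d in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out_links[n]
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:  # dangling node: spread its rank evenly over all nodes
                for t in nodes:
                    new_rank[t] += damping * rank[n] / len(nodes)
        rank = new_rank
    return rank

scores = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")])
assert abs(sum(scores.values()) - 1.0) < 1e-9  # ranks form a distribution
assert scores["c"] > scores["b"]  # "c" has two in-links, "b" has one
```

The write-back step is then exactly the per-node, edge-free update described in the next comment.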

-dave

EnricoMi commented 3 years ago

@daveaitel so that would mean writing / updating a single value per node and modifying no edges. That should scale nicely.

The alternative is of course the non-scaling traditional pipeline: write the PageRank scores into a Dgraph-compatible RDF file and load it with the Dgraph live loader. Writing from Spark directly would of course make for a much shorter pipeline.
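That traditional pipeline amounts to serializing the per-node scores as RDF N-Quads and feeding the file to the live loader (e.g. `dgraph live -f scores.rdf`). A minimal sketch, where the uid values and the `pageRank` predicate name are hypothetical examples:

```python
# Hedged sketch of the "traditional pipeline" step: turning per-node
# PageRank scores into Dgraph-compatible RDF N-Quads for the live
# loader. The uids and the `pageRank` predicate are made-up examples;
# the xs:float datatype tag is one of the RDF types Dgraph accepts.

def scores_to_nquads(scores):
    """scores: {uid (hex string): float} -> list of N-Quad lines."""
    return [
        f'<{uid}> <pageRank> "{score}"^^<xs:float> .'
        for uid, score in sorted(scores.items())
    ]

lines = scores_to_nquads({"0x1": 0.25, "0x2": 0.75})
assert lines[0] == '<0x1> <pageRank> "0.25"^^<xs:float> .'
assert lines[1] == '<0x2> <pageRank> "0.75"^^<xs:float> .'
```

Direct write support in the connector would skip the intermediate file entirely.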

daveaitel commented 3 years ago

Right, but from what we're saying in this thread, that is not currently possible, because the connector cannot do writes?


EnricoMi commented 3 years ago

That is right, writing to Dgraph from Spark is not supported.