Closed acezen closed 2 months ago
Hi, are there any prerequisites for contributing to this? I would love to help. Just FYI, I have never contributed to an open source project before, but I am fascinated by graphs in general, hence my interest.
Hi @amygbAI, Thanks for your interest in GraphAr! We welcome new contributors with open arms. For a good start, please check out our Getting Started (C++ library) and our Community page for how to join and contribute. If you have any questions, feel free to ask. Looking forward to your contribution!
Hi, I have finished the changes and have also built both the Maven project and the C++ part of the code, with no issues. Sadly, I am unable to find any examples in the documentation that I could use to test these changes. I can go through the code and figure it out, but do you folks have any example files I can use to test this?
Hi @amygbAI, you can refer to the Spark example to generate the Movie graph in JSON format, and use that data to test your code.
here's what i did so far ..
```
name := "testing"
version := "0.1"
scalaVersion := "2.13.12"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.3.1"
```
```
import org.apache.spark.sql.SparkSession
import org.apache.graphar.graph.GraphWriter

object MainObject { // This is your main object
  def main(args: Array[String]): Unit = {
    // connect to the Neo4j instance
    val spark = SparkSession.builder()
      .appName("Neo4j to GraphAr for Movie Graph")
      .config("neo4j.url", "bolt://localhost:7687")
      .config("neo4j.authentication.type", "basic")
      .config("neo4j.authentication.basic.username", "neo4j")
      .config("neo4j.authentication.basic.password", "slayer#666")
      .config("spark.master", "local")
      .getOrCreate()

    // initialize a graph writer
    val writer: GraphWriter = new GraphWriter()

    // put movie graph data into writer
    readAndPutDataIntoWriter(writer, spark)

    // write in GraphAr format
    val outputPath: String = args(0)
    val vertexChunkSize: Long = args(1).toLong
    val edgeChunkSize: Long = args(2).toLong
    val fileType: String = args(3)
    writer.write(outputPath, spark, "MovieGraph", vertexChunkSize, edgeChunkSize, fileType)
  }
}
```
and when I ran `sbt run` from the testing folder I got these errors:
[error] /datadrive/GRAPH_AR/incubator-graphar/maven-projects/spark/graphar/testing/src/main/scala/test_170_json_read_write.scala:2:19: object graphar is not a member of package org.apache
[error] import org.apache.graphar.graph.GraphWriter
So I thought I might need to rebuild the Scala packages, went to `incubator-graphar/maven-projects/spark/graphar`, ran `mvn -X clean install`, and ran into the following error:
[ERROR] Failed to execute goal on project graphar-commons: Could not resolve dependencies for project org.apache.graphar:graphar-commons:jar:0.1.0-SNAPSHOT: Could not find artifact org.apache.graphar:graphar-datasources:jar:0.1.0-SNAPSHOT
So the bottom line is that unless I can include the correct jar file, I doubt I will be able to test anything (and the jar file isn't getting built because of the issues above).
My guess is that I am missing something fundamental here.
Hi @amygbAI, you need to run `mvn -X clean install` in the `maven-projects/spark` folder. That will compile and install the `graphar-datasources` and `graphar-commons` packages, and you can then run the dataset generator, for example with the script `run-neo4j2graphar.sh`.
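For the standalone sbt project shown earlier in this thread, the `object graphar is not a member of package org.apache` error is consistent with the `build.sbt` declaring only `spark-sql` and no GraphAr dependency. A minimal sketch of a `build.sbt` that might pick up the locally installed jar follows. It assumes that `mvn install` succeeded and put the artifacts in the local Maven repository, that the coordinates are the ones printed in the Maven error above (`org.apache.graphar:graphar-commons:0.1.0-SNAPSHOT`), and that the Scala version should match the 2.12.10 reported later in this thread. None of this is taken from the GraphAr documentation.

```scala
// build.sbt: a sketch under the assumptions stated above, not an official GraphAr build file.
name := "testing"
version := "0.1"

// Assumption: use the same Scala 2.12 line the GraphAr Spark modules were built with,
// since Scala 2.12 and 2.13 artifacts are not binary compatible.
scalaVersion := "2.12.10"

// Also resolve from ~/.m2, where `mvn install` places locally built artifacts.
resolvers += Resolver.mavenLocal

libraryDependencies ++= Seq(
  // Spark version kept from the original build.sbt in this thread.
  "org.apache.spark" %% "spark-sql" % "3.3.1",
  // Coordinates copied from the Maven error message; a single % is used because
  // the artifactId shown there carries no Scala version suffix.
  "org.apache.graphar" % "graphar-commons" % "0.1.0-SNAPSHOT"
)
```

With a build file along these lines, `sbt run` should at least be able to resolve the `org.apache.graphar` import, provided the Maven build has completed first.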
Thanks, and sorry, but I am still getting some errors (Scala version 2.12.10 and JDK 8):
Run starting. Expected test count is: 21
GraphInfoSuite:
- load graph info *** FAILED ***
java.lang.NullPointerException:
at org.apache.graphar.GraphInfoSuite.$anonfun$new$1(TestGraphInfo.scala:35)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:189)
at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
at org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1562)
...
- load vertex info *** FAILED ***
java.lang.NullPointerException:
at org.apache.graphar.GraphInfoSuite.$anonfun$new$2(TestGraphInfo.scala:61)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:189)
at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
at org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1562)
...
- load edge info *** FAILED ***
java.lang.NullPointerException:
at org.apache.graphar.GraphInfoSuite.$anonfun$new$8(TestGraphInfo.scala:140)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:189)
at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
at org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1562)
...
- == of Property/PropertyGroup/AdjList
TransformExampleSuite:
- transform file type *** FAILED ***
java.lang.NullPointerException:
at org.apache.graphar.TransformExampleSuite.$anonfun$new$1(TransformExample.scala:39)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
..............................
Run completed in 2 seconds, 977 milliseconds.
Total number of tests run: 21
Suites: completed 10, aborted 0
Tests: succeeded 1, failed 20, canceled 0, ignored 0, pending 0
*** 20 TESTS FAILED ***
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for spark 0.1.0-SNAPSHOT:
[INFO]
[INFO] spark .............................................. SUCCESS [ 0.746 s]
[INFO] graphar-datasources ................................ SUCCESS [ 25.872 s]
[INFO] graphar-commons .................................... FAILURE [ 59.063 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:25 min
[INFO] Finished at: 2024-05-20T13:56:23Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:2.0.0:test (test) on project graphar-commons: There are test failures -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.scalatest:scalatest-maven-plugin:2.0.0:test (test) on project graphar-commons: There are test failures
How about using `mvn clean install -DskipTests -P ${1:-'datasources-32'}` to compile the Spark library?
Or you can refer to the CI action to see how CI builds and runs the tests.
Hi @amygbAI, I have added a helper example that generates LDBC sample test data by converting the original CSV to GraphAr; this may help you generate test data in `json`. Feel free to ask if you have any problems generating the test data with the example.
Thanks so much for sticking with me on this one 👍. I was able to test and have created the pull request. I must point out, though, that Neo4j only works with JDK 17 and 21, so to export the example CSV and JSON I had to switch the current JDK version and then test the changes separately. Maybe the creators / maintainers of the project already have this on their roadmap. If it's already done, kindly update the document that gives the steps to build and test the spark folder.
Is your feature request related to a problem? Please describe.
GraphAr is a graph file format that supports a variety of payload file formats, including CSV, Parquet, and ORC. However, it does not currently support the JSON payload file format. This issue proposes adding support for JSON to GraphAr.
`JSON` is a lightweight data-interchange format. It is easy for humans to read and write, and it is widely used in graph datasets.

Describe the solution you'd like
Different libraries can have different implementations.
C++: since Apache Arrow currently only supports reading JSON files, the C++ library can only support reading JSON. Related code: the `FileType` enum: https://github.com/apache/incubator-graphar/blob/b33c1f0f36246fa45761fb5d122f869d18432a7e/cpp/include/gar/fwd.h#L76 https://github.com/apache/incubator-graphar/blob/b33c1f0f36246fa45761fb5d122f869d18432a7e/cpp/src/filesystem.cc#L98-L158 The `ReadFileToTable` function seems to use a unified API to read a file into a table, and may in fact already support JSON.
Spark: Spark supports reading and writing the JSON format, so the Spark library can support both reading and writing JSON. Related code: the FileType enum: https://github.com/apache/incubator-graphar/blob/b33c1f0f36246fa45761fb5d122f869d18432a7e/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala#L171-L179 https://github.com/apache/incubator-graphar/blob/b33c1f0f36246fa45761fb5d122f869d18432a7e/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarTable.scala#L92-L102 Just add `JSONWriterBuilder`-related code in the same way as for CSV/Parquet/ORC: https://github.com/apache/incubator-graphar/tree/main/spark/datasources-32/src/main/scala/org/apache/graphar/datasources

Additional context
This issue is a part of issue https://github.com/alibaba/GraphAr/issues/74 and is a good first issue for beginners to get familiar with GraphAr.
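
The Spark-side change proposed in this issue amounts to adding one more member to a file-type enumeration and handling it wherever CSV, Parquet, and ORC are already handled. The sketch below illustrates that pattern in plain Scala. It is a hypothetical, self-contained example: the object `FileType` and the helpers `fromString` and `extension` are names made up for illustration and are not GraphAr's actual definitions, which live in the files linked in the issue.

```scala
// Hypothetical sketch of adding JSON to a file-type enumeration.
// This is NOT GraphAr's actual FileType definition.
object FileType extends Enumeration {
  type FileType = Value
  // JSON is the newly added member next to the existing payload formats.
  val CSV, PARQUET, ORC, JSON = Value

  /** Parse a user-supplied name such as "json" (case-insensitive). */
  def fromString(name: String): FileType =
    values
      .find(_.toString.equalsIgnoreCase(name))
      .getOrElse(throw new IllegalArgumentException(s"Unknown file type: $name"))

  /** File extension for payload chunks of the given type (illustrative only). */
  def extension(fileType: FileType): String = fileType match {
    case CSV     => "csv"
    case PARQUET => "parquet"
    case ORC     => "orc"
    case JSON    => "json"
  }
}

object FileTypeDemo {
  def main(args: Array[String]): Unit = {
    val fileType = FileType.fromString("json")
    // prints: JSON -> .json
    println(s"$fileType -> .${FileType.extension(fileType)}")
  }
}
```

Keeping the name parsing and the extension mapping in one place means a new format only needs a new enum member plus one `case` per mapping; the actual reader and writer wiring in the Spark data source is a separate step not shown here.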