stevencox opened this issue 6 years ago
Could it be the Scala version? Does Zeppelin have a Scala version requirement? This project uses 2.11 because Neo4j has a 2.11 dependency (I normally use 2.12), but I wonder if Zeppelin is on 2.10.
I suppose that could also be a problem. But it can't even import a class from the json JAR, i.e. things in straight Java land. I'm thinking the Scala version shouldn't impact that, eh?
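One quick way to rule the Scala version in or out: print the version the interpreter itself is running. A minimal sketch, meant to be pasted into a Zeppelin paragraph; a mismatch between this value and the 2.11 the project was built against would explain failures importing Scala classes (though, as noted, not plain Java ones):

```scala
// Print the Scala version the Zeppelin interpreter is actually running.
// Compare against the 2.11 the project JARs were compiled for.
println(scala.util.Properties.versionString)
```

If Zeppelin is indeed on 2.10, JARs built for 2.11 (like the `ssl-config-core_2.11` one in the log below) would be binary-incompatible.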
All quiet in the interpreter log: /projects/stars/stack/zeppelin/zeppelin-0.7.3-bin-all/logs/zeppelin-interpreter-spark-evryscope-stars-c0.edc.renci.org.log
INFO [2018-02-23 23:01:38,519] ({pool-2-thread-4} Logging.scala[logInfo]:54) - Added JAR /projects/stars/pubmed/pubmed-terms/target/universal/stage/lib/com.google.inject.extensions.guice-assistedinject-4.0.jar at spark://172.25.8.116:35141/jars/com.google.inject.extensions.guice-assistedinject-4.0.jar with timestamp 1519444898519
INFO [2018-02-23 23:01:38,519] ({pool-2-thread-4} SparkInterpreter.java[open]:947) - sc.addJar(/projects/stars/pubmed/pubmed-terms/target/universal/stage/lib/com.google.inject.extensions.guice-assistedinject-4.0.jar)
INFO [2018-02-23 23:01:38,519] ({pool-2-thread-4} Logging.scala[logInfo]:54) - Added JAR /projects/stars/pubmed/pubmed-terms/target/universal/stage/lib/net.sf.trove4j.trove4j-3.0.3.jar at spark://172.25.8.116:35141/jars/net.sf.trove4j.trove4j-3.0.3.jar with timestamp 1519444898519
INFO [2018-02-23 23:01:38,520] ({pool-2-thread-4} SparkInterpreter.java[open]:947) - sc.addJar(/projects/stars/pubmed/pubmed-terms/target/universal/stage/lib/net.sf.trove4j.trove4j-3.0.3.jar)
INFO [2018-02-23 23:01:38,520] ({pool-2-thread-4} Logging.scala[logInfo]:54) - Added JAR /projects/stars/pubmed/pubmed-terms/target/universal/stage/lib/com.typesafe.ssl-config-core_2.11-0.2.2.jar at spark://172.25.8.116:35141/jars/com.typesafe.ssl-config-core_2.11-0.2.2.jar with timestamp 1519444898520
INFO [2018-02-23 23:01:38,521] ({pool-2-thread-4} SparkInterpreter.java[open]:947) - sc.addJar(/projects/stars/pubmed/pubmed-terms/target/universal/stage/lib/com.typesafe.ssl-config-core_2.11-0.2.2.jar)
INFO [2018-02-23 23:01:38,522] ({pool-2-thread-4} SparkInterpreter.java[populateSparkWebUrl]:1013) - Sending metainfos to Zeppelin server: {url=http://172.25.8.116:4040}
INFO [2018-02-23 23:01:38,523] ({Thread-22} Logging.scala[logInfo]:54) - Mesos task 4 is now TASK_RUNNING
INFO [2018-02-23 23:01:38,523] ({Thread-23} Logging.scala[logInfo]:54) - Mesos task 3 is now TASK_RUNNING
INFO [2018-02-23 23:01:38,556] ({pool-2-thread-4} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1519444893072 finished by scheduler org.apache.zeppelin.spark.SparkInterpreter2040526988
INFO [2018-02-23 23:01:40,463] ({dispatcher-event-loop-1} Logging.scala[logInfo]:54) - Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.25.8.129:38586) with ID 2
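The log shows the JARs going through `sc.addJar`, which (in Spark generally) ships a JAR to the executors for task execution but does not necessarily put it on the driver/REPL classpath, and `import` in a notebook paragraph resolves against the driver side. A small probe to check class visibility directly; the class name is a hypothetical example, substitute one from the JAR that fails to import:

```scala
// Probe whether a class from one of the added JARs is visible to the
// current classloader (the driver/REPL side, where `import` resolves).
val className = "org.json.JSONObject" // hypothetical example; use a class that fails to import

def canLoad(name: String): Boolean =
  try { Class.forName(name); true }
  catch { case _: ClassNotFoundException => false }

println(s"Can load $className here: ${canLoad(className)}")
// To check the executor side as well, run the same test inside a task
// (assumes Zeppelin's provided SparkContext `sc`):
// sc.parallelize(1 to 2).map(_ => canLoad(className)).collect()
```

If the driver cannot load the class but executors can, the fix is to get the JARs onto the interpreter's own classpath (e.g. via Zeppelin's dependency settings) rather than relying on `sc.addJar` alone.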
Pubmed to RDF code is currently here: https://github.com/balhoff/pubmed-terms
@stevencox should I move to this org, or somewhere else?
@balhoff is annotating PubMed abstracts from Medline with identifiers from a variety of ontologies.
We'd like to parallelize his work using Spark, since that should speed up the pipeline substantially.
This notebook should get us there, but it fails to import classes from JAR files that have been loaded as dependencies. Investigate why this is happening.
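For reference, the intended parallelization could look roughly like the sketch below. The `annotate` function and the input/output paths are placeholders standing in for the real pubmed-terms logic, not the actual project API; `mapPartitions` is used so that any per-annotator setup cost is paid once per partition rather than once per abstract:

```scala
// Rough sketch of distributing abstract annotation over a Spark cluster.
// Assumes the Zeppelin-provided SparkContext `sc`; paths are placeholders.

// Placeholder annotator: the real implementation lives in pubmed-terms
// and would return ontology identifiers found in the abstract text.
def annotate(abstractText: String): Seq[String] = Seq.empty

val abstracts = sc.textFile("hdfs:///path/to/medline-abstracts") // placeholder path
val annotations = abstracts.mapPartitions { part =>
  // any expensive annotator initialization would go here, once per partition
  part.map(a => (a, annotate(a)))
}
annotations.saveAsTextFile("hdfs:///path/to/annotations") // placeholder path
```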