Constannnnnt / Distributed-CoreNLP

This infrastructure, built on Stanford CoreNLP, MapReduce, and Spark in Java, aims to process document annotations at large scale.
https://github.com/Constannnnnt/Distributed-CoreNLP
MIT License

Test failed: file does not exist on Datasci #6

Closed. Constannnnnt closed this issue 5 years ago.

Constannnnnt commented 5 years ago
2018-11-13 16:13:31 INFO  StateStoreCoordinatorRef:54 - Registered StateStoreCoordinator endpoint
Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://name1..xxxx../simpledata;
    at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:715)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:389)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:389)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:344)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:388)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
    at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:693)
    at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:732)
    at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:702)
    at ca.uwaterloo.cs651.project.CoreNLP.main(CoreNLP.java:51)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-11-13 16:13:31 INFO  SparkContext:54 - Invoking stop() from shutdown hook
2018-11-13 16:13:31 INFO  AbstractConnector:318 - Stopped Spark@79c4715d{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}

The local file cannot be accessed on datasci. Is there any way we can work around it?

KaisongHuang commented 5 years ago

We need to put simpledata into an HDFS directory first. I used the following commands:

hdfs dfs -mkdir -p /user/k86huang/cs651
hdfs dfs -put ./simpledata /user/k86huang/cs651
spark-submit --class ca.uwaterloo.cs651.project.CoreNLP \
    --driver-memory 4G --executor-memory 4G \
    target/project-1.0.jar \
    -input /user/k86huang/cs651/simpledata -output output -functionality dcoref
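Since the AnalysisException above is just a missing-path error, one way to avoid it is to fail fast before launching the job: `hdfs dfs -test -e` exits 0 only when the given path exists on HDFS. A minimal sketch, reusing the paths from the commands above (the check itself is an addition, not part of the original workflow):

```shell
# Sketch: verify the input exists on HDFS before paying the cost of a
# full Spark job launch. Paths reused from the commands above.
INPUT=/user/k86huang/cs651/simpledata

if hdfs dfs -test -e "$INPUT"; then
    spark-submit --class ca.uwaterloo.cs651.project.CoreNLP \
        --driver-memory 4G --executor-memory 4G \
        target/project-1.0.jar \
        -input "$INPUT" -output output -functionality dcoref
else
    echo "Input path $INPUT does not exist on HDFS" >&2
    exit 1
fi
```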
KaisongHuang commented 5 years ago

And if you don't qualify the path with a prefix like "/user/{student_id}", you may run into permission issues.
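The reason the prefix matters is that HDFS resolves a bare relative path against the submitting user's HDFS home directory, /user/&lt;username&gt;; writing into someone else's home (or the root) triggers the permission errors mentioned above. A minimal sketch of that resolution rule — `resolve_hdfs_path` is a hypothetical helper for illustration, not part of Hadoop:

```shell
# Hypothetical helper mimicking how HDFS interprets paths:
# absolute paths are used as-is, relative paths are resolved
# under /user/<username>.
resolve_hdfs_path() {
    local path="$1" user="$2"
    case "$path" in
        /*) printf '%s\n' "$path" ;;                   # absolute: unchanged
        *)  printf '/user/%s/%s\n' "$user" "$path" ;;  # relative: under home
    esac
}

resolve_hdfs_path "cs651/simpledata" "k86huang"
# -> /user/k86huang/cs651/simpledata
resolve_hdfs_path "/user/k86huang/cs651" "k86huang"
# -> /user/k86huang/cs651
```

So `hdfs dfs -put ./simpledata cs651` and `hdfs dfs -put ./simpledata /user/k86huang/cs651` land in the same place for user k86huang, but only the latter is unambiguous in a spark-submit invocation.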