zeppelin 0.9preview1: Hadoop not installed

garyfeng commented 4 years ago

The Zeppelin 0.9preview1 Dockerfile does not install Hadoop itself, only the Flink shaded Hadoop jar, so Zeppelin notebooks that use HDFS don't work.

We need to decide whether to install Hadoop locally or use a remote cluster.

garyfeng commented 4 years ago

A temporary workaround is to replace hdfs:/// with file:///. This uses the local file system instead of HDFS: the example works, but you are not using Hadoop.
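A minimal sketch of that scheme swap, in case the paths are built in code rather than typed by hand. The helper name `to_local_uri` is hypothetical and not part of Zeppelin or Flink; it only rewrites URIs of the authority-less `hdfs:///` form.

```python
def to_local_uri(uri: str) -> str:
    """Rewrite an authority-less hdfs:/// URI to file:/// (hypothetical helper)."""
    prefix = "hdfs:///"
    if uri.startswith(prefix):
        # Keep the path, swap only the scheme.
        return "file:///" + uri[len(prefix):]
    # Leave everything else (including hdfs://host:port/...) untouched.
    return uri
```

For example, `to_local_uri("hdfs:///tmp/bank.csv")` gives `file:///tmp/bank.csv`.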

garyfeng commented 4 years ago

Part of the error was that the hadoop command could not be found. It sounds like we need to install the corresponding Hadoop version, specifically v2.8.3, since we are using:

# install https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2/2.8.3-10.0/flink-shaded-hadoop-2-2.8.3-10.0.jar
RUN wget -P /opt/flink/lib https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2/2.8.3-10.0/flink-shaded-hadoop-2-2.8.3-10.0.jar

This was not mentioned in https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/hadoop.html

garyfeng commented 4 years ago

Note that in the Zeppelin settings for Flink, the following is not set by default:

HADOOP_CONF_DIR: Location of hadoop conf (core-site.xml, hdfs-site.xml, etc.)
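A quick way to see what a given HADOOP_CONF_DIR would provide is to read the default file system out of its core-site.xml. This is a rough sketch, not something Zeppelin or Flink ships: the function name `read_default_fs` is made up, and it assumes the standard Hadoop `<configuration><property><name>/<value>` layout.

```python
import os
import xml.etree.ElementTree as ET
from typing import Optional


def read_default_fs(conf_dir: Optional[str]) -> Optional[str]:
    """Return fs.defaultFS (or the legacy fs.default.name) from core-site.xml.

    Returns None if conf_dir is unset, the file is missing, or neither key is set.
    """
    if not conf_dir:
        return None
    path = os.path.join(conf_dir, "core-site.xml")
    if not os.path.isfile(path):
        return None
    found = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name in ("fs.defaultFS", "fs.default.name"):
            found[name] = (prop.findtext("value") or "").strip()
    # Prefer the current key, fall back to the legacy one.
    return found.get("fs.defaultFS") or found.get("fs.default.name")


if __name__ == "__main__":
    # Prints None when HADOOP_CONF_DIR is not set, as in the default Zeppelin setting.
    print(read_default_fs(os.environ.get("HADOOP_CONF_DIR")))
```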

garyfeng commented 4 years ago

See 802403e for adding filesystem and other Flink connectors.

garyfeng commented 4 years ago

Error message when using hdfs:///:

data: org.apache.flink.api.scala.DataSet[String] = org.apache.flink.api.scala.DataSet@45fe66e
java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
  at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:199)
  at org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:952)
  at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:860)
  at org.apache.flink.api.java.ScalaShellEnvironment.execute(ScalaShellEnvironment.java:81)
  at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:844)
  at org.apache.flink.api.java.DataSet.collect(DataSet.java:413)
  at org.apache.flink.api.java.DataSet.print(DataSet.java:1652)
  at org.apache.flink.api.scala.DataSet.print(DataSet.scala:1864)
  ... 46 elided
Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
  at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
  at org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:947)
  ... 52 more
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
  at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$7(RestClusterClient.java:359)
  at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884)
  at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866)
  at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
  at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
  at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$8(FutureUtils.java:274)
  at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
  at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
  at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
  at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575)
  at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:943)
  at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
  ... 3 more
Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Internal server error., <Exception on server side:
org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
    at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$internalSubmitJob$3(Dispatcher.java:336)
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
    at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    ... 6 more
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
    at org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:152)
    at org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:84)
    at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:379)
    at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
    ... 7 more
Caused by: org.apache.flink.runtime.JobException: Creating the input splits caused an error: The given file system URI (hdfs:///tmp/bank.csv) did not describe the authority (like for example HDFS NameNode address/port or S3 host). The attempt to use a configured default authority failed: Hadoop configuration for default file system ('fs.default.name' or 'fs.defaultFS') contains no valid authority component (like hdfs namenode, S3 host, etc)
    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:271)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:807)
    at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:228)
    at org.apache.flink.runtime.scheduler.SchedulerBase.createExecutionGraph(SchedulerBase.java:255)
    at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:227)
    at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:215)
    at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:120)
    at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:105)
    at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:278)
    at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:266)
    at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98)
    at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40)
    at org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:146)
    ... 10 more
Caused by: java.io.IOException: The given file system URI (hdfs:///tmp/bank.csv) did not describe the authority (like for example HDFS NameNode address/port or S3 host). The attempt to use a configured default authority failed: Hadoop configuration for default file system ('fs.default.name' or 'fs.defaultFS') contains no valid authority component (like hdfs namenode, S3 host, etc)
    at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:154)
    at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:446)
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:362)
    at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:587)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:62)
    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:257)
    ... 22 more

End of exception on server side>]
  at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:390)
  at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:374)
  at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:966)
  at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940)
  ... 4 more
ERROR   
Took 5 sec. Last updated by anonymous at April 11 2020, 6:02:31 PM.
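The root cause in the trace is the missing authority: hdfs:///tmp/bank.csv names no NameNode, and the configured default file system has none either. The sketch below mirrors the fallback the error message describes. It is not Flink's actual implementation; `resolve_authority` is a hypothetical name and `namenode:8020` is a placeholder address.

```python
from typing import Optional
from urllib.parse import urlsplit


def resolve_authority(uri: str, default_fs: Optional[str]) -> str:
    """Return the host[:port] a URI would resolve to, or raise if there is none."""
    authority = urlsplit(uri).netloc
    if authority:
        # e.g. hdfs://namenode:8020/tmp/bank.csv carries its own authority.
        return authority
    # hdfs:///... has an empty authority, so fall back to the default file system.
    fallback = urlsplit(default_fs).netloc if default_fs else ""
    if not fallback:
        raise ValueError(
            f"{uri} has no authority and the default file system "
            f"({default_fs!r}) does not provide one"
        )
    return fallback
```

With no Hadoop configuration on the classpath, the default file system is typically file:///, which has no authority, so `resolve_authority("hdfs:///tmp/bank.csv", "file:///")` raises, matching the failure above. Either writing the full hdfs://host:port/... URI or setting fs.defaultFS in core-site.xml would give it something to resolve.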

garyfeng commented 4 years ago

https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/hadoop.html

garyfeng commented 4 years ago

https://download.csdn.net/download/RivenDong/12179041

garyfeng commented 4 years ago

But without HDFS, you can't use the filesystem via a file:// URI either. To test file://, you need to use the local version of Flink. This is problematic, but it's understandable.

garyfeng / flink-sql-demo

zeppelin 0.9preview1: Hadoop not installed #5