dianfu / pyflink-faq

Frequently Asked Questions about PyFlink
Apache License 2.0

Error: "Could not find a free permitted port on the machine. " #2

Closed (billyrrr closed 2 years ago)

billyrrr commented 2 years ago

Hi, I have encountered the following error when attempting to follow the guide. Could anyone offer some help? Thanks!

2022-06-11 09:01:15,383 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.rest-service.exposed.type, ClusterIP
2022-06-11 09:01:15,383 WARN  org.apache.flink.configuration.GlobalConfiguration           [] - Error while trying to split key and value in configuration file /opt/flink/conf/flink-conf.yaml:12: "pipeline.classpaths: "
2022-06-11 09:01:15,384 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: $internal.application.main, org.apache.flink.client.python.PythonDriver
2022-06-11 09:01:15,384 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.memory.process.size, 1024m
2022-06-11 09:01:15,384 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.internal.jobmanager.entrypoint.class, org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint
2022-06-11 09:01:15,384 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: execution.savepoint-restore-mode, NO_CLAIM
2022-06-11 09:01:15,384 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.pod-template-file, /workspaces/flink-deploy/pod-template-file.yaml
2022-06-11 09:01:15,385 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: execution.target, kubernetes-application
2022-06-11 09:01:15,385 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.memory.process.size, 1024m
2022-06-11 09:01:15,385 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.port, 6123
2022-06-11 09:01:15,385 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.rpc.port, 6122
2022-06-11 09:01:15,385 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: execution.attached, true
2022-06-11 09:01:15,385 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: internal.cluster.execution-mode, NORMAL
2022-06-11 09:01:15,385 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: execution.shutdown-on-attached-exit, false
2022-06-11 09:01:15,385 WARN  org.apache.flink.configuration.GlobalConfiguration           [] - Error while trying to split key and value in configuration file /opt/flink/conf/flink-conf.yaml:25: "pipeline.jars: "
2022-06-11 09:01:15,386 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: rest.address, 0.0.0.0
2022-06-11 09:01:15,757 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Starting KubernetesApplicationClusterEntrypoint.
2022-06-11 09:01:15,768 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Install default filesystem.
2022-06-11 09:01:15,771 INFO  org.apache.flink.core.fs.FileSystem                          [] - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2022-06-11 09:01:15,880 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Install security context.
2022-06-11 09:01:15,887 INFO  org.apache.flink.runtime.security.modules.HadoopModuleFactory [] - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2022-06-11 09:01:15,890 INFO  org.apache.flink.runtime.security.modules.JaasModule         [] - Jaas file will be created as /tmp/jaas-5402258572251190160.conf.
2022-06-11 09:01:15,957 INFO  org.apache.flink.runtime.security.contexts.HadoopSecurityContextFactory [] - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2022-06-11 09:01:15,959 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Initializing cluster services.
2022-06-11 09:01:15,962 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Using working directory: WorkingDirectory(/tmp/jm_88a1503bc2803d9ed84357f4722b23a5).
2022-06-11 09:01:16,382 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils        [] - Trying to start actor system, external address duocheng-testing.default:6123, bind address 0.0.0.0:6123.
2022-06-11 09:01:17,257 INFO  akka.event.slf4j.Slf4jLogger                                 [] - Slf4jLogger started
2022-06-11 09:01:17,277 INFO  akka.remote.RemoteActorRefProvider                           [] - Akka Cluster not in use - enabling unsafe features anyway because `akka.remote.use-unsafe-remote-features-outside-cluster` has been enabled.
2022-06-11 09:01:17,277 INFO  akka.remote.Remoting                                         [] - Starting remoting
2022-06-11 09:01:17,473 INFO  akka.remote.Remoting                                         [] - Remoting started; listening on addresses :[akka.tcp://flink@duocheng-testing.default:6123]
2022-06-11 09:01:17,676 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils        [] - Actor system started at akka.tcp://flink@duocheng-testing.default:6123
2022-06-11 09:01:17,758 INFO  org.apache.flink.runtime.blob.BlobServer                     [] - Created BLOB server storage directory /tmp/jm_88a1503bc2803d9ed84357f4722b23a5/blobStorage
2022-06-11 09:01:17,760 INFO  org.apache.flink.runtime.blob.BlobServer                     [] - Started BLOB server at 0.0.0.0:6124 - max concurrent requests: 50 - max backlog: 1000
2022-06-11 09:01:17,767 INFO  org.apache.flink.runtime.metrics.MetricRegistryImpl          [] - No metrics reporter configured, no metrics will be exposed/reported.
2022-06-11 09:01:17,770 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils        [] - Trying to start actor system, external address duocheng-testing.default:0, bind address 0.0.0.0:0.
2022-06-11 09:01:17,781 INFO  akka.event.slf4j.Slf4jLogger                                 [] - Slf4jLogger started
2022-06-11 09:01:17,783 INFO  akka.remote.RemoteActorRefProvider                           [] - Akka Cluster not in use - enabling unsafe features anyway because `akka.remote.use-unsafe-remote-features-outside-cluster` has been enabled.
2022-06-11 09:01:17,784 INFO  akka.remote.Remoting                                         [] - Starting remoting
2022-06-11 09:01:17,789 INFO  akka.remote.Remoting                                         [] - Remoting started; listening on addresses :[akka.tcp://flink-metrics@duocheng-testing.default:38621]
2022-06-11 09:01:17,859 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils        [] - Actor system started at akka.tcp://flink-metrics@duocheng-testing.default:38621
2022-06-11 09:01:17,868 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Starting RPC endpoint for org.apache.flink.runtime.metrics.dump.MetricQueryService at akka://flink-metrics/user/rpc/MetricQueryService .
2022-06-11 09:01:17,968 INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Upload directory /tmp/flink-web-0491f966-b277-4a41-a012-d0c333bf10a6/flink-web-upload does not exist. 
2022-06-11 09:01:17,968 INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Created directory /tmp/flink-web-0491f966-b277-4a41-a012-d0c333bf10a6/flink-web-upload for file uploads.
2022-06-11 09:01:17,970 INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Starting rest endpoint.
2022-06-11 09:01:18,106 INFO  org.apache.flink.runtime.webmonitor.WebMonitorUtils          [] - Determined location of main cluster component log file: /opt/flink/log/flink--kubernetes-application-0-duocheng-testing-66f4dd48dd-rtd2g.log
2022-06-11 09:01:18,106 INFO  org.apache.flink.runtime.webmonitor.WebMonitorUtils          [] - Determined location of main cluster component stdout file: /opt/flink/log/flink--kubernetes-application-0-duocheng-testing-66f4dd48dd-rtd2g.out
2022-06-11 09:01:18,360 INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Rest endpoint listening at 0.0.0.0:8081
2022-06-11 09:01:18,361 INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - http://0.0.0.0:8081 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
2022-06-11 09:01:18,362 INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Web frontend listening at http://0.0.0.0:8081.
2022-06-11 09:01:18,369 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (102.400mb (107374184 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2022-06-11 09:01:18,370 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction network memory (57.600mb (60397978 bytes)) is less than its min value 64.000mb (67108864 bytes), min value will be used instead
2022-06-11 09:01:18,384 INFO  org.apache.flink.runtime.dispatcher.runner.DefaultDispatcherRunner [] - DefaultDispatcherRunner was granted leadership with leader id 00000000-0000-0000-0000-000000000000. Creating new DispatcherLeaderProcess.
2022-06-11 09:01:18,388 INFO  org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Start SessionDispatcherLeaderProcess.
2022-06-11 09:01:18,390 INFO  org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - Starting resource manager service.
2022-06-11 09:01:18,390 INFO  org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - Resource manager service is granted leadership with session id 00000000-0000-0000-0000-000000000000.
2022-06-11 09:01:18,455 INFO  org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Recover all persisted job graphs that are not finished, yet.
2022-06-11 09:01:18,456 INFO  org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Successfully recovered 0 persisted job graphs.
2022-06-11 09:01:18,469 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/rpc/dispatcher_0 .
2022-06-11 09:01:18,568 INFO  org.apache.flink.client.ClientUtils                          [] - Starting program (detached: false)
2022-06-11 09:01:18,762 WARN  org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap [] - Application failed unexpectedly: 
java.util.concurrent.CompletionException: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application.
    at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture$UniCompose.tryFire(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?]
    at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:323) ~[flink-dist-1.15.0.jar:1.15.0]
    at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.lambda$runApplicationAsync$2(ApplicationDispatcherBootstrap.java:244) ~[flink-dist-1.15.0.jar:1.15.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
    at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
    at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:171) ~[flink-rpc-akka_9253bb0c-0b4c-42d8-809e-529da37d5f82.jar:1.15.0]
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-rpc-akka_9253bb0c-0b4c-42d8-809e-529da37d5f82.jar:1.15.0]
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$withContextClassLoader$0(ClassLoadingUtils.java:41) ~[flink-rpc-akka_9253bb0c-0b4c-42d8-809e-529da37d5f82.jar:1.15.0]
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49) [flink-rpc-akka_9253bb0c-0b4c-42d8-809e-529da37d5f82.jar:1.15.0]
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48) [flink-rpc-akka_9253bb0c-0b4c-42d8-809e-529da37d5f82.jar:1.15.0]
    at java.util.concurrent.ForkJoinTask.doExec(Unknown Source) [?:?]
    at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source) [?:?]
    at java.util.concurrent.ForkJoinPool.scan(Unknown Source) [?:?]
    at java.util.concurrent.ForkJoinPool.runWorker(Unknown Source) [?:?]
    at java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source) [?:?]
Caused by: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application.
    ... 14 more
Caused by: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: java.lang.RuntimeException: Could not find a free permitted port on the machine.
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372) ~[flink-dist-1.15.0.jar:1.15.0]
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist-1.15.0.jar:1.15.0]
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist-1.15.0.jar:1.15.0]
    at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291) ~[flink-dist-1.15.0.jar:1.15.0]
    ... 13 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Could not find a free permitted port on the machine.

billyrrr commented 2 years ago

Update: this appears to be a privilege issue. It also occurs when running a standalone cluster with docker-entrypoint.sh. My temporary workaround is to skip docker-entrypoint.sh and execute the flink binary with root privileges. (I would definitely not recommend this to anyone running a production environment.)
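
For anyone trying to confirm the privilege theory, one possible check is to ask the OS for an ephemeral port as the same user that runs Flink inside the container. The sketch below assumes Python is available in the image; the helper name `find_free_port` and the retry count are made up for illustration and are not taken from Flink. It only probes whether the current user can bind a local socket, so it is a diagnostic aid rather than a reproduction of what Flink does internally.

```python
import socket


def find_free_port(attempts: int = 50) -> int:
    """Try to obtain a free TCP port by binding to port 0 (hypothetical helper).

    Binding to port 0 asks the OS to pick any free ephemeral port.
    If the current user is not permitted to open sockets, bind() raises OSError.
    """
    last_error = None
    for _ in range(attempts):
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.bind(("127.0.0.1", 0))
                port = sock.getsockname()[1]
                if port != 0:
                    return port
        except OSError as exc:
            # Remember the failure so it can be reported if every attempt fails.
            last_error = exc
    raise RuntimeError(
        "Could not find a free permitted port on the machine."
    ) from last_error


if __name__ == "__main__":
    # Run this as the Flink user, then again as root, and compare the results.
    print(find_free_port())
```

If this succeeds as root but raises as the regular Flink user, that would be consistent with the privilege explanation above; if it succeeds for both, the cause probably lies elsewhere.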