apache / incubator-streampark

Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
https://streampark.apache.org/
Apache License 2.0
3.91k stars 1.01k forks

[Bug] yarn mode stop flink job fail #3397

Open 929359291 opened 11 months ago

929359291 commented 11 months ago

Java Version

11

Scala Version

2.11.x

StreamPark Version

2.11

Flink Version

1.17.1

deploy mode

yarn-application

What happened

Stopping the job from StreamPark fails, and the job keeps running on YARN. What is the problem?

Error Exception

org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "status" (class org.apache.flink.runtime.rest.messages.ErrorResponseBody), not marked as ignorable (one known property: "errors"])
 at [Source: UNKNOWN; byte offset: #UNKNOWN] (through reference chain: org.apache.flink.runtime.rest.messages.ErrorResponseBody["status"])
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1127)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2023)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1700)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperties(BeanDeserializerBase.java:1650)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:540)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1405)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:352)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:185)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)     
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4650)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2831)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper.treeToValue(ObjectMapper.java:3295)
        at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:533)
        at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:516)
        at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
        at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
2023-12-14 16:16:11 | ERROR | streampark-deploy-executor-49 | org.apache.streampark.console.core.service.impl.ApplicationServiceImpl:1309] stop flink job fail.
java.util.concurrent.CompletionException: java.lang.reflect.InvocationTargetException
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1702)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.reflect.InvocationTargetException: null
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.streampark.flink.client.FlinkClient$.$anonfun$proxy$1(FlinkClient.scala:80)
        at org.apache.streampark.flink.proxy.FlinkShimsProxy$.$anonfun$proxy$1(FlinkShimsProxy.scala:60)
        at org.apache.streampark.common.util.ClassLoaderUtils$.runAsClassLoader(ClassLoaderUtils.scala:38)
        at org.apache.streampark.flink.proxy.FlinkShimsProxy$.proxy(FlinkShimsProxy.scala:60)
        at org.apache.streampark.flink.client.FlinkClient$.proxy(FlinkClient.scala:75)
        at org.apache.streampark.flink.client.FlinkClient$.cancel(FlinkClient.scala:53)
        at org.apache.streampark.flink.client.FlinkClient.cancel(FlinkClient.scala)
        at org.apache.streampark.console.core.service.impl.ApplicationServiceImpl.lambda$cancel$6(ApplicationServiceImpl.java:1279)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
        ... 3 common frames omitted
Caused by: org.apache.flink.util.FlinkException: [StreamPark] Do CancelRequest for the job afb5268e3f8f2c1d60de1a56e848269d failed. detail: java.util.concurrent.ExecutionException: org.apache.flink.util.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
        at org.apache.streampark.flink.client.trait.FlinkClientTrait.cancelJob(FlinkClientTrait.scala:527)
        at org.apache.streampark.flink.client.trait.FlinkClientTrait.cancelJob$(FlinkClientTrait.scala:510)
        at org.apache.streampark.flink.client.impl.YarnApplicationClient$.org$apache$streampark$flink$client$trait$YarnClientTrait$$super$cancelJob(YarnApplicationClient.scala:44)
        at org.apache.streampark.flink.client.trait.YarnClientTrait.$anonfun$doCancel$1(YarnClientTrait.scala:83)
        at org.apache.streampark.flink.client.trait.YarnClientTrait.$anonfun$executeClientAction$1(YarnClientTrait.scala:58)
        at scala.util.Try$.apply(Try.scala:209)
        at org.apache.streampark.flink.client.trait.YarnClientTrait.executeClientAction(YarnClientTrait.scala:58)
        at org.apache.streampark.flink.client.trait.YarnClientTrait.doCancel(YarnClientTrait.scala:82)
        at org.apache.streampark.flink.client.trait.YarnClientTrait.doCancel$(YarnClientTrait.scala:78)
        at org.apache.streampark.flink.client.impl.YarnApplicationClient$.doCancel(YarnApplicationClient.scala:44)
        at org.apache.streampark.flink.client.trait.FlinkClientTrait.cancel(FlinkClientTrait.scala:189)
        at org.apache.streampark.flink.client.trait.FlinkClientTrait.cancel$(FlinkClientTrait.scala:173)
        at org.apache.streampark.flink.client.impl.YarnApplicationClient$.cancel(YarnApplicationClient.scala:44)
        at org.apache.streampark.flink.client.FlinkClientHandler$.cancel(FlinkClientHandler.scala:48)
        at org.apache.streampark.flink.client.FlinkClientHandler.cancel(FlinkClientHandler.scala)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.streampark.flink.client.FlinkClient$.$anonfun$proxy$1(FlinkClient.scala:80)
        at org.apache.streampark.flink.proxy.FlinkShimsProxy$.$anonfun$proxy$1(FlinkShimsProxy.scala:60)
        at org.apache.streampark.common.util.ClassLoaderUtils$.runAsClassLoader(ClassLoaderUtils.scala:38)
        at org.apache.streampark.flink.proxy.FlinkShimsProxy$.proxy(FlinkShimsProxy.scala:60)
        at org.apache.streampark.flink.client.FlinkClient$.proxy(FlinkClient.scala:75)
        at org.apache.streampark.flink.client.FlinkClient$.cancel(FlinkClient.scala:53)
        at org.apache.streampark.flink.client.FlinkClient.cancel(FlinkClient.scala)
        at org.apache.streampark.console.core.service.impl.ApplicationServiceImpl.lambda$cancel$6(ApplicationServiceImpl.java:1279)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.flink.util.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
        at org.apache.flink.util.concurrent.FutureUtils.lambda$retryOperationWithDelay$6(FutureUtils.java:294)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
        at java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610)
        at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1085)
        at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
        ... 3 more
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.util.RestClientException: Response was neither of the expected type([simple type, class org.apache.flink.runtime.rest.handler.async.AsynchronousOperationResult<org.apache.flink.runtime.rest.messages.job.savepoints.SavepointInfo>]) nor an error.
        at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
        at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
        at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1074)
        ... 4 more
Caused by: org.apache.flink.runtime.rest.util.RestClientException: Response was neither of the expected type([simple type, class org.apache.flink.runtime.rest.handler.async.AsynchronousOperationResult<org.apache.flink.runtime.rest.messages.job.savepoints.SavepointInfo>]) nor an error.
        at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:552)
        at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:516)
        at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
        ... 4 more
Caused by: org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonMappingException: class org.apache.flink.util.SerializedThrowable cannot be cast to class org.apache.flink.util.SerializedThrowable (org.apache.flink.util.SerializedThrowable is in unnamed module of loader 'app'; org.apache.flink.util.SerializedThrowable is in unnamed module of loader org.apache.streampark.flink.proxy.ChildFirstClassLoader @4ef4711f) (through reference chain: org.apache.flink.runtime.rest.handler.async.AsynchronousOperationResult["operation"]->org.apache.flink.runtime.rest.messages.job.savepoints.SavepointInfo["failure-cause"])
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:392)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:351)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.wrapAndThrow(BeanDeserializerBase.java:1821)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:566)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:439)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1405)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:352)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:185)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:542)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:564)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:439)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1405)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:352)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:185)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)     
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4650)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2900)
        at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:525)
        ... 6 more
Caused by: java.lang.ClassCastException: class org.apache.flink.util.SerializedThrowable cannot be cast to class org.apache.flink.util.SerializedThrowable (org.apache.flink.util.SerializedThrowable is in unnamed module of loader 'app'; org.apache.flink.util.SerializedThrowable is in unnamed module of loader org.apache.streampark.flink.proxy.ChildFirstClassLoader @4ef4711f)
        at org.apache.flink.runtime.rest.messages.json.SerializedThrowableDeserializer.deserialize(SerializedThrowableDeserializer.java:49)
        at org.apache.flink.runtime.rest.messages.json.SerializedThrowableDeserializer.deserialize(SerializedThrowableDeserializer.java:34)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:542)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:564)
        ... 20 more

        at org.apache.streampark.flink.client.trait.YarnClientTrait$$anonfun$executeClientAction$2.applyOrElse(YarnClientTrait.scala:63)
        at org.apache.streampark.flink.client.trait.YarnClientTrait$$anonfun$executeClientAction$2.applyOrElse(YarnClientTrait.scala:59)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
        at scala.util.Failure.recover(Try.scala:230)
        at org.apache.streampark.flink.client.trait.YarnClientTrait.executeClientAction(YarnClientTrait.scala:59)
        at org.apache.streampark.flink.client.trait.YarnClientTrait.doCancel(YarnClientTrait.scala:82)
        at org.apache.streampark.flink.client.trait.YarnClientTrait.doCancel$(YarnClientTrait.scala:78)
        at org.apache.streampark.flink.client.impl.YarnApplicationClient$.doCancel(YarnApplicationClient.scala:44)
        at org.apache.streampark.flink.client.trait.FlinkClientTrait.cancel(FlinkClientTrait.scala:189)
        at org.apache.streampark.flink.client.trait.FlinkClientTrait.cancel$(FlinkClientTrait.scala:173)
        at org.apache.streampark.flink.client.impl.YarnApplicationClient$.cancel(YarnApplicationClient.scala:44)
        at org.apache.streampark.flink.client.FlinkClientHandler$.cancel(FlinkClientHandler.scala:48)
        at org.apache.streampark.flink.client.FlinkClientHandler.cancel(FlinkClientHandler.scala)
        ... 16 common frames omitted
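
The trace above nests several exceptions, and the innermost one (here the `ClassCastException` on `SerializedThrowable`) is the last `Caused by:` entry. As a rough aid for reading logs like this, the sketch below pulls the cause chain out of a saved log. The function name `cause_chain` and the sample text are illustrative assumptions, not part of StreamPark or Flink.

```python
def cause_chain(log_text):
    """Return (exception_class, message) pairs for every 'Caused by:' line, in order."""
    prefix = "Caused by: "
    causes = []
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            body = stripped[len(prefix):]
            # Split the class name from the message at the first ": ".
            exc_class, _, message = body.partition(": ")
            causes.append((exc_class, message))
    return causes


if __name__ == "__main__":
    # Shortened, hypothetical excerpt shaped like the log in this report.
    sample = (
        "java.util.concurrent.CompletionException: java.lang.reflect.InvocationTargetException\n"
        "        at java.base/java.lang.Thread.run(Thread.java:829)\n"
        "Caused by: java.lang.reflect.InvocationTargetException: null\n"
        "        ... 3 common frames omitted\n"
        "Caused by: java.lang.ClassCastException: class A cannot be cast to class A\n"
    )
    for exc_class, message in cause_chain(sample):
        print(exc_class, "->", message)
```

The last pair returned is the root cause, which is usually the most useful line to quote when reporting a failure.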

wolfboys commented 11 months ago

Thanks for your feedback. Can you list the dependencies under flink/lib? Looking forward to your further feedback! 🤗📝

929359291 commented 11 months ago

@wolfboys Hi, here are the contents of flink/lib:

avro-1.11.1.jar
flink-avro-1.18.0.jar
flink-cep-1.17.1.jar
flink-connector-files-1.17.1.jar
flink-connector-jdbc-3.1.1-1.17.jar
flink-csv-1.17.1.jar
flink-dist-1.17.1.jar
flink-doris-connector-1.17-1.5.0-SNAPSHOT.jar
flink-json-1.17.1.jar
flink-metrics-prometheus-1.17.1.jar
flink-scala_2.12-1.17.1.jar
flink-sql-avro-confluent-registry-1.18.0.jar
flink-sql-connector-kafka-1.17.1.jar
flink-sql-connector-mysql-cdc-2.4.1.jar
flink-table-api-java-uber-1.17.1.jar
flink-table-planner-loader-1.17.1.jar
flink-table-runtime-1.17.1.jar
hologres-connector-flink-1.15-1.3.2-SNAPSHOT-jar-with-dependencies.jar
log4j-1.2-api-2.17.1.jar
log4j-api-2.17.1.jar
log4j-core-2.17.1.jar
log4j-slf4j-impl-2.17.1.jar
mysql-connector-java-8.0.25.jar
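
The listing above mixes jars built for different Flink releases (for example `flink-avro-1.18.0.jar` next to `flink-dist-1.17.1.jar`). Whether that contributed to this failure is not established in the thread, but such mismatches are easy to overlook. Below is a minimal sketch of a check, assuming the jar names end in a three-part version and that `flink-dist` carries the reference version. The function name `find_version_mismatches` and the heuristic itself are illustrative assumptions, not a StreamPark or Flink tool.

```python
import re

# Matches a trailing three-part version such as "-1.17.1.jar".
TRAILING_VERSION = re.compile(r"-(\d+)\.(\d+)\.(\d+)\.jar$")


def find_version_mismatches(jar_names):
    """Return flink-* jars whose major.minor differs from flink-dist (heuristic)."""
    versions = {}
    for name in jar_names:
        match = TRAILING_VERSION.search(name)
        if name.startswith("flink-") and match:
            versions[name] = tuple(int(part) for part in match.groups())

    dist = next((v for n, v in versions.items() if n.startswith("flink-dist-")), None)
    if dist is None:
        return []  # no reference version to compare against

    # Only compare jars that share the Flink major version; connectors with
    # their own version scheme (e.g. 2.4.1) are skipped by this heuristic.
    return sorted(
        name for name, v in versions.items()
        if v[0] == dist[0] and v[:2] != dist[:2]
    )


if __name__ == "__main__":
    jars = [
        "avro-1.11.1.jar",
        "flink-avro-1.18.0.jar",
        "flink-cep-1.17.1.jar",
        "flink-connector-jdbc-3.1.1-1.17.jar",
        "flink-dist-1.17.1.jar",
        "flink-sql-avro-confluent-registry-1.18.0.jar",
        "flink-sql-connector-mysql-cdc-2.4.1.jar",
    ]
    print(find_version_mismatches(jars))
```

On a real installation the names could come from `os.listdir` on the `flink/lib` directory. The heuristic ignores jars that encode the Flink version elsewhere in the name, such as `flink-connector-jdbc-3.1.1-1.17.jar`, so those still need a manual look.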

929359291 commented 11 months ago

However, after I updated the state savepoint/checkpoint directory, this issue was fixed.