apache / incubator-streampark

Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
https://streampark.apache.org/
Apache License 2.0
3.8k stars 978 forks source link

[Bug] yarn mode stop flink job fail #3397

Open 929359291 opened 7 months ago

929359291 commented 7 months ago

Search before asking

Java Version

11

Scala Version

2.11.x

StreamPark Version

2.11

Flink Version

1.17.1

deploy mode

yarn-application

What happened

streampark stop job fail,yarn still running job, what is problem?

Error Exception

org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "status" (class org.apache.flink.runtime.rest.messages.ErrorResponseBody), not marked as ignorable (one known property: "errors"])
 at [Source: UNKNOWN; byte offset: #UNKNOWN] (through reference chain: org.apache.flink.runtime.rest.messages.ErrorResponseBody["status"])
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1127)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2023)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1700)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperties(BeanDeserializerBase.java:1650)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:540)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1405)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:352)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:185)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)     
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4650)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2831)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper.treeToValue(ObjectMapper.java:3295)
        at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:533)
        at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:516)
        at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
        at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
2023-12-14 16:16:11 | ERROR | streampark-deploy-executor-49 | org.apache.streampark.console.core.service.impl.ApplicationServiceImpl:1309] stop flink job fail.
java.util.concurrent.CompletionException: java.lang.reflect.InvocationTargetException
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1702)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.reflect.InvocationTargetException: null
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.streampark.flink.client.FlinkClient$.$anonfun$proxy$1(FlinkClient.scala:80)
        at org.apache.streampark.flink.proxy.FlinkShimsProxy$.$anonfun$proxy$1(FlinkShimsProxy.scala:60)
        at org.apache.streampark.common.util.ClassLoaderUtils$.runAsClassLoader(ClassLoaderUtils.scala:38)
        at org.apache.streampark.flink.proxy.FlinkShimsProxy$.proxy(FlinkShimsProxy.scala:60)
        at org.apache.streampark.flink.client.FlinkClient$.proxy(FlinkClient.scala:75)
        at org.apache.streampark.flink.client.FlinkClient$.cancel(FlinkClient.scala:53)
        at org.apache.streampark.flink.client.FlinkClient.cancel(FlinkClient.scala)
        at org.apache.streampark.console.core.service.impl.ApplicationServiceImpl.lambda$cancel$6(ApplicationServiceImpl.java:1279)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
        ... 3 common frames omitted
Caused by: org.apache.flink.util.FlinkException: [StreamPark] Do CancelRequest for the job afb5268e3f8f2c1d60de1a56e848269d failed. detail: java.util.concurrent.ExecutionException: org.apache.flink.util.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
        at org.apache.streampark.flink.client.trait.FlinkClientTrait.cancelJob(FlinkClientTrait.scala:527)
        at org.apache.streampark.flink.client.trait.FlinkClientTrait.cancelJob$(FlinkClientTrait.scala:510)
        at org.apache.streampark.flink.client.impl.YarnApplicationClient$.org$apache$streampark$flink$client$trait$YarnClientTrait$$super$cancelJob(YarnApplicationClient.scala:44)
        at org.apache.streampark.flink.client.trait.YarnClientTrait.$anonfun$doCancel$1(YarnClientTrait.scala:83)
        at org.apache.streampark.flink.client.trait.YarnClientTrait.$anonfun$executeClientAction$1(YarnClientTrait.scala:58)
        at scala.util.Try$.apply(Try.scala:209)
        at org.apache.streampark.flink.client.trait.YarnClientTrait.executeClientAction(YarnClientTrait.scala:58)
        at org.apache.streampark.flink.client.trait.YarnClientTrait.doCancel(YarnClientTrait.scala:82)
        at org.apache.streampark.flink.client.trait.YarnClientTrait.doCancel$(YarnClientTrait.scala:78)
        at org.apache.streampark.flink.client.impl.YarnApplicationClient$.doCancel(YarnApplicationClient.scala:44)
        at org.apache.streampark.flink.client.trait.FlinkClientTrait.cancel(FlinkClientTrait.scala:189)
        at org.apache.streampark.flink.client.trait.FlinkClientTrait.cancel$(FlinkClientTrait.scala:173)
        at org.apache.streampark.flink.client.impl.YarnApplicationClient$.cancel(YarnApplicationClient.scala:44)
        at org.apache.streampark.flink.client.FlinkClientHandler$.cancel(FlinkClientHandler.scala:48)
        at org.apache.streampark.flink.client.FlinkClientHandler.cancel(FlinkClientHandler.scala)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.streampark.flink.client.FlinkClient$.$anonfun$proxy$1(FlinkClient.scala:80)
        at org.apache.streampark.flink.proxy.FlinkShimsProxy$.$anonfun$proxy$1(FlinkShimsProxy.scala:60)
        at org.apache.streampark.common.util.ClassLoaderUtils$.runAsClassLoader(ClassLoaderUtils.scala:38)
        at org.apache.streampark.flink.proxy.FlinkShimsProxy$.proxy(FlinkShimsProxy.scala:60)
        at org.apache.streampark.flink.client.FlinkClient$.proxy(FlinkClient.scala:75)
        at org.apache.streampark.flink.client.FlinkClient$.cancel(FlinkClient.scala:53)
        at org.apache.streampark.flink.client.FlinkClient.cancel(FlinkClient.scala)
        at org.apache.streampark.console.core.service.impl.ApplicationServiceImpl.lambda$cancel$6(ApplicationServiceImpl.java:1279)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.flink.util.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
        at org.apache.flink.util.concurrent.FutureUtils.lambda$retryOperationWithDelay$6(FutureUtils.java:294)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
        at java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610)
        at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1085)
        at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
        ... 3 more
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.util.RestClientException: Response was neither of the expected type([simple type, class org.apache.flink.runtime.rest.handler.async.AsynchronousOperationResult<org.apache.flink.runtime.rest.messages.job.savepoints.SavepointInfo>]) nor an error.
        at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
        at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
        at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1074)
        ... 4 more
Caused by: org.apache.flink.runtime.rest.util.RestClientException: Response was neither of the expected type([simple type, class org.apache.flink.runtime.rest.handler.async.AsynchronousOperationResult<org.apache.flink.runtime.rest.messages.job.savepoints.SavepointInfo>]) nor an error.
        at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:552)
        at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:516)
        at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
        ... 4 more
Caused by: org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonMappingException: class org.apache.flink.util.SerializedThrowable cannot be cast to class 
org.apache.flink.util.SerializedThrowable (org.apache.flink.util.SerializedThrowable is in unnamed module of loader 'app'; org.apache.flink.util.SerializedThrowable is in unnamed module of loader org.apache.streampark.flink.proxy.ChildFirstClassLoader @4ef4711f) (through reference chain: org.apache.flink.runtime.rest.handler.async.AsynchronousOperationResult["operation"]->org.apache.flink.runtime.rest.messages.job.savepoints.SavepointInfo["failure-cause"])
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:392)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:351)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.wrapAndThrow(BeanDeserializerBase.java:1821)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:566)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:439)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1405)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:352)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:185)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:542)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:564)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:439)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1405)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:352)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:185)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)     
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4650)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2900)
        at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:525)
        ... 6 more
Caused by: java.lang.ClassCastException: class org.apache.flink.util.SerializedThrowable cannot be cast to class org.apache.flink.util.SerializedThrowable (org.apache.flink.util.SerializedThrowable is in unnamed module of loader 'app'; org.apache.flink.util.SerializedThrowable is in unnamed module of loader org.apache.streampark.flink.proxy.ChildFirstClassLoader @4ef4711f)
        at org.apache.flink.runtime.rest.messages.json.SerializedThrowableDeserializer.deserialize(SerializedThrowableDeserializer.java:49)
        at org.apache.flink.runtime.rest.messages.json.SerializedThrowableDeserializer.deserialize(SerializedThrowableDeserializer.java:34)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:542)
        at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:564)
        ... 20 more

        at org.apache.streampark.flink.client.trait.YarnClientTrait$$anonfun$executeClientAction$2.applyOrElse(YarnClientTrait.scala:63)
        at org.apache.streampark.flink.client.trait.YarnClientTrait$$anonfun$executeClientAction$2.applyOrElse(YarnClientTrait.scala:59)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
        at scala.util.Failure.recover(Try.scala:230)
        at org.apache.streampark.flink.client.trait.YarnClientTrait.executeClientAction(YarnClientTrait.scala:59)
        at org.apache.streampark.flink.client.trait.YarnClientTrait.doCancel(YarnClientTrait.scala:82)
        at org.apache.streampark.flink.client.trait.YarnClientTrait.doCancel$(YarnClientTrait.scala:78)
        at org.apache.streampark.flink.client.impl.YarnApplicationClient$.doCancel(YarnApplicationClient.scala:44)
        at org.apache.streampark.flink.client.trait.FlinkClientTrait.cancel(FlinkClientTrait.scala:189)
        at org.apache.streampark.flink.client.trait.FlinkClientTrait.cancel$(FlinkClientTrait.scala:173)
        at org.apache.streampark.flink.client.impl.YarnApplicationClient$.cancel(YarnApplicationClient.scala:44)
        at org.apache.streampark.flink.client.FlinkClientHandler$.cancel(FlinkClientHandler.scala:48)
        at org.apache.streampark.flink.client.FlinkClientHandler.cancel(FlinkClientHandler.scala)
        ... 16 common frames omitted

Screenshots

image image

Are you willing to submit PR?

Code of Conduct

wolfboys commented 6 months ago

Thanks for your feedback. Can you list the dependencies under flink/lib? Looking forward to your further feedback! 🤗📝

929359291 commented 6 months ago

image @wolfboys hi boy flink/lib

avro-1.11.1.jar flink-avro-1.18.0.jar flink-cep-1.17.1.jar flink-connector-files-1.17.1.jar flink-connector-jdbc-3.1.1-1.17.jar flink-csv-1.17.1.jar flink-dist-1.17.1.jar flink-doris-connector-1.17-1.5.0-SNAPSHOT.jar flink-json-1.17.1.jar flink-metrics-prometheus-1.17.1.jar flink-scala_2.12-1.17.1.jar flink-sql-avro-confluent-registry-1.18.0.jar flink-sql-connector-kafka-1.17.1.jar flink-sql-connector-mysql-cdc-2.4.1.jar flink-table-api-java-uber-1.17.1.jar flink-table-planner-loader-1.17.1.jar flink-table-runtime-1.17.1.jar hologres-connector-flink-1.15-1.3.2-SNAPSHOT-jar-with-dependencies.jar log4j-1.2-api-2.17.1.jar log4j-api-2.17.1.jar log4j-core-2.17.1.jar log4j-slf4j-impl-2.17.1.jar mysql-connector-java-8.0.25.jar

929359291 commented 6 months ago

but update state save/checkpoint dir after,this issue fixed