apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Bug] FlinkActionsE2eTest.testDelete is unstable #657

Closed FangYongs closed 1 year ago

FangYongs commented 1 year ago

Paimon version

https://github.com/apache/incubator-paimon/actions/runs/4465175758/jobs/7841980086?pr=656

Error: Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 265.233 s <<< FAILURE! - in org.apache.paimon.tests.FlinkActionsE2eTest
Error: testDelete Time elapsed: 110.359 s <<< FAILURE!
org.opentest4j.AssertionFailedError: Result is still unexpected after 60 retries.
Expected: {2023-01-21, 1, 31=1, 2023-01-20, 1, 28=1, 2023-01-19, 1, 23=1, 2023-01-18, 1, 75=1, 2023-01-17, 1, 50=1}
Actual: {2023-01-21, 1, 31=1, 2023-01-14, 0, 19=1, 2023-01-13, 0, 39=1, 2023-01-16, 1, 25=1, 2023-01-15, 0, 37=1, 2023-01-20, 1, 28=1, 2023-01-19, 1, 23=1, 2023-01-18, 1, 75=1, 2023-01-17, 1, 50=1}
    at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:39)
    at org.junit.jupiter.api.Assertions.fail(Assertions.java:134)
    at org.apache.paimon.tests.E2eTestBase.checkResult(E2eTestBase.java:261)
    at org.apache.paimon.tests.FlinkActionsE2eTest.testDelete(FlinkActionsE2eTest.java:256)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)

Compute Engine

flink

Minimal reproduce step

no

What doesn't meet your expectations?

no

Anything else?

No response

liyubin117 commented 1 year ago

I also encountered this issue. Could you please assign it to me? Thanks!

FangYongs commented 1 year ago

Thanks @liyubin117, assigned

liyubin117 commented 1 year ago

I have reproduced the issue. It is caused by the following exception, which occurs only in Flink 1.14.6, and it seems to be a Flink bug.

Caused by: org.apache.flink.util.FlinkException: An OperatorEvent from an OperatorCoordinator to a task was lost. Triggering task failover to ensure consistency. Event: '[NoMoreSplitEvent]', targetTask: Source: paimon-f2b6dae9-74c2-40bc-895e-a616a551f409.default.ts_table -> Calc(select=[dt, k, v], where=[(dt < _UTF-16LE'2023-01-17')]) -> NotNullEnforcer(fields=[dt, k]) -> TableToDataSteam(type=ROW<`dt` STRING NOT NULL, `k` INT NOT NULL, `v` INT> NOT NULL, rowtime=false) -> Map (1/1) - execution #0
    ... 33 more
Caused by: org.apache.flink.runtime.operators.coordination.TaskNotRunningException: Task is not running, but in state FINISHED
    at org.apache.flink.runtime.taskmanager.Task.deliverOperatorEvent(Task.java:1475)
    at org.apache.flink.runtime.taskexecutor.TaskExecutor.sendOperatorEventToTask(TaskExecutor.java:1249)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
    ... 21 more
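
The failure mode shown in the trace can be sketched roughly as follows: an event is sent to a task that has already reached FINISHED, the delivery is rejected, and the sender treats the lost event as a reason to trigger failover. This is a simplified model for illustration only, not Flink's code. Every class and method name below (LostOperatorEventSketch, Task, sendEvent, and the local TaskNotRunningException) is hypothetical and merely mirrors the names that appear in the trace.

```java
import java.util.ArrayList;
import java.util.List;

public class LostOperatorEventSketch {

    /** Hypothetical task states; only the two needed for this illustration. */
    enum TaskState { RUNNING, FINISHED }

    /** Hypothetical stand-in for the exception named in the trace. */
    static class TaskNotRunningException extends Exception {
        TaskNotRunningException(String message) {
            super(message);
        }
    }

    /** Hypothetical task that accepts events only while it is RUNNING. */
    static class Task {
        private TaskState state = TaskState.RUNNING;
        private final List<String> received = new ArrayList<>();

        void finish() {
            state = TaskState.FINISHED;
        }

        void deliverOperatorEvent(String event) throws TaskNotRunningException {
            if (state != TaskState.RUNNING) {
                throw new TaskNotRunningException(
                        "Task is not running, but in state " + state);
            }
            received.add(event);
        }
    }

    /**
     * Tries to deliver an event. Returns true if the task accepted it and
     * false if the event was lost, which is the point where a real system
     * would trigger task failover.
     */
    static boolean sendEvent(Task task, String event) {
        try {
            task.deliverOperatorEvent(event);
            return true;
        } catch (TaskNotRunningException e) {
            System.out.println("Event lost, would trigger failover: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        Task task = new Task();
        // Delivered while the task is still running.
        System.out.println(sendEvent(task, "NoMoreSplitEvent"));
        // The task finishes before a later event arrives.
        task.finish();
        System.out.println(sendEvent(task, "NoMoreSplitEvent"));
    }
}
```

In this model the second call is rejected only because of ordering: the task finished before the event arrived. That would be consistent with the failure being intermittent, although the sketch does not establish why the ordering occurs in Flink 1.14.6.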