Open afilimonov opened 5 years ago
Thanks for reporting. We're looking into it.
Update: the issue seems to be related to dangling file locks on a NAS-based PVC. After switching to iSCSI-based storage, the issue goes away.
Thanks for the update, @afilimonov. What's the NAS protocol?
NFS. We use Trident ONTAP-NAS k8s driver.
I have the same issue, and my state dir storage is based on IBM COS.
[2020-07-27 02:47:28,557] INFO stream-thread [_confluent-ksql-dswes-ccpqaquery_CTAS_TRACKING_BATCH_PIPELINE_COLL_0-91f314ff-d130-4edf-86cd-5e96c29a988a-CleanupThread] Deleting obsolete state directory 0_9 for task 0_9 as 2389556ms has elapsed (cleanup delay is 600000ms). (org.apache.kafka.streams.processor.internals.StateDirectory:287)
[2020-07-27 02:47:28,698] ERROR stream-thread [_confluent-ksql-dswes-ccpqaquery_CTAS_TRACKING_BATCH_PIPELINE_COLL_0-91f314ff-d130-4edf-86cd-5e96c29a988a-CleanupThread] Failed to delete the state directory. (org.apache.kafka.streams.processor.internals.StateDirectory:311)
java.nio.file.DirectoryNotEmptyException: /tmp/kafka-streams/_confluent-ksql-dswes-ccpqaquery_CTAS_TRACKING_BATCH_PIPELINE_COLL_0/0_9
at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:242)
at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
at java.nio.file.Files.delete(Files.java:1126)
at org.apache.kafka.common.utils.Utils$2.postVisitDirectory(Utils.java:769)
at org.apache.kafka.common.utils.Utils$2.postVisitDirectory(Utils.java:752)
at java.nio.file.Files.walkFileTree(Files.java:2688)
at java.nio.file.Files.walkFileTree(Files.java:2742)
at org.apache.kafka.common.utils.Utils.delete(Utils.java:752)
at org.apache.kafka.streams.processor.internals.StateDirectory.cleanRemovedTasks(StateDirectory.java:301)
at org.apache.kafka.streams.processor.internals.StateDirectory.cleanRemovedTasks(StateDirectory.java:264)
at org.apache.kafka.streams.KafkaStreams.lambda$start$1(KafkaStreams.java:802)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
We see periodic errors when deleting obsolete state directories.
A similar failure occurs when terminating a running query: it fails with java.nio.file.DirectoryNotEmptyException.
KSQL version: 5.3.0
Environment: Kubernetes/Docker using the Confluent-provided image
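For anyone trying to confirm the NFS theory from earlier in the thread, it may help to look at what is actually left in the task directory when the delete fails. The sketch below is not part of KSQL or Kafka Streams; the class `LeftoverEntries` and both helper methods are hypothetical names. It lists the remaining entries and flags names starting with `.nfs`, on the assumption that the NFS client is leaving its usual placeholder files behind for files that were still open when they were removed. Whether that is the actual cause here is unconfirmed.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of Kafka Streams or KSQL.
class LeftoverEntries {

    // Returns the names of all entries still present directly under dir.
    static List<String> list(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                names.add(entry.getFileName().toString());
            }
        }
        return names;
    }

    // Assumption: placeholder files left by the NFS client are named ".nfs...".
    static boolean looksLikeNfsPlaceholder(String name) {
        return name.startsWith(".nfs");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LeftoverEntries <task-state-dir>");
            return;
        }
        // Print each remaining entry and mark the suspected NFS placeholders.
        for (String name : list(Paths.get(args[0]))) {
            String marker = looksLikeNfsPlaceholder(name) ? "  <- possible NFS placeholder" : "";
            System.out.println(name + marker);
        }
    }
}
```

Pointing it at the directory named in the exception (the `0_9` path in the log above) right after a failure would show whether the leftovers are such placeholders or ordinary state files.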