apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT]IllegalStateException: Trying to access closed classloader #7539

Open hbgstc123 opened 1 year ago

hbgstc123 commented 1 year ago

Describe the problem you faced

Flink job that stream-reads from a Hudi source and stream-writes to a Hudi sink. This error happened after the job had run for 4 hours, causing the job to restart.

java.lang.IllegalStateException: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.
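The check named in the error fires when user code keeps a job's classloader reachable after Flink has closed it. A minimal, hypothetical sketch of the anti-pattern the message warns about (the class name `LeakySingleton` is invented; it is not from Hudi or Flink):

```java
// Hypothetical sketch of the anti-pattern the error message describes:
// caching a classloader in a static field so it outlives the Flink job
// that owned it.
public class LeakySingleton {
    // Anti-pattern: this static reference keeps the per-job user-code
    // classloader reachable even after Flink has closed it.
    private static ClassLoader cached;

    static ClassLoader get() {
        if (cached == null) {
            cached = Thread.currentThread().getContextClassLoader();
        }
        return cached;
    }

    public static void main(String[] args) {
        // Every call returns the same long-lived loader instance; inside a
        // Flink TaskManager, a reference like this is what the
        // SafetyNetWrapperClassLoader leak check eventually flags once the
        // job's loader has been closed.
        System.out.println(get() == get()); // prints "true"
    }
}
```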

To Reproduce

Steps to reproduce the behavior:

  1. Run a Flink job that stream-reads from a Hudi source and stream-writes to a Hudi sink.
  2. The error happened after the job had run for about 4 hours; not sure it can be reproduced reliably.

Environment Description

Additional context


Stacktrace

java.lang.IllegalStateException: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.
    at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
    at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.loadClass(FlinkUserCodeClassLoaders.java:172)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2366)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2331)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2427)
    at org.apache.hadoop.ipc.RPC.getProtocolEngine(RPC.java:209)
    at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:607)
    at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:573)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:546)
    at org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolTranslatorPB.createClientDatanodeProtocolProxy(ClientDatanodeProtocolTranslatorPB.java:187)
    at org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolTranslatorPB.createClientDatanodeProtocolProxy(ClientDatanodeProtocolTranslatorPB.java:178)
    at org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolTranslatorPB.&lt;init&gt;(ClientDatanodeProtocolTranslatorPB.java:127)
    at org.apache.hadoop.hdfs.DFSUtilClient.createClientDatanodeProtocolProxy(DFSUtilClient.java:603)
    at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:323)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:296)
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:227)
    at org.apache.hadoop.hdfs.DFSInputStream.&lt;init&gt;(DFSInputStream.java:211)
    at org.apache.hadoop.hdfs.DFSClient.openInternal(DFSClient.java:1146)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1132)
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:351)
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:347)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:360)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:919)
    at org.apache.hudi.common.fs.HoodieWrapperFileSystem.open(HoodieWrapperFileSystem.java:468)
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:754)
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:305)
    at org.apache.hudi.common.table.timeline.HoodieDefaultTimeline.getInstantDetails(HoodieDefaultTimeline.java:397)
    at org.apache.hudi.hadoop.utils.HoodieInputFormatUtils.getCommitMetadata(HoodieInputFormatUtils.java:517)
    at org.apache.hudi.sink.partitioner.profile.WriteProfiles.getCommitMetadata(WriteProfiles.java:236)
    at org.apache.hudi.source.IncrementalInputSplits.lambda$inputSplits$2(IncrementalInputSplits.java:285)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at org.apache.hudi.source.IncrementalInputSplits.inputSplits(IncrementalInputSplits.java:285)
    at org.apache.hudi.source.StreamReadMonitoringFunction.monitorDirAndForwardSplits(StreamReadMonitoringFunction.java:199)
    at org.apache.hudi.source.StreamReadMonitoringFunction.run(StreamReadMonitoringFunction.java:172)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:128)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:73)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:333)
yihua commented 1 year ago

@hbgstc123 Thanks for raising the issue. @danny0405 could you provide help here?

xushiyan commented 1 year ago

@hbgstc123 does this happen every few hours, or has it only happened once so far? Can you try upgrading to 0.12.2 and see how it goes?

danny0405 commented 1 year ago

One suggestion is not to use a session cluster; session cluster mode is fragile with respect to classloaders.

hbgstc123 commented 1 year ago

> @hbgstc123 does this happen every few hours, or has it only happened once so far? Can you try upgrading to 0.12.2 and see how it goes?

It happens every few hours, but after we set `classloader.check-leaked-classloader = false` it stopped happening.

hbgstc123 commented 1 year ago

> One suggestion is not to use a session cluster; session cluster mode is fragile with respect to classloaders.

We are using application mode.

danny0405 commented 1 year ago

> @hbgstc123 does this happen every few hours, or has it only happened once so far? Can you try upgrading to 0.12.2 and see how it goes?
>
> It happens every few hours, but after we set `classloader.check-leaked-classloader = false` it stopped happening.

Thanks. It seems there is a classloader leak somewhere. Did you use a MOR table with async compaction enabled?

Dwrite commented 2 weeks ago

You can set `classloader.check-leaked-classloader: "false"` in `flink-conf.yaml`.
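For reference, the workaround mentioned in this thread is a standard Flink configuration key. In `flink-conf.yaml` it would look like this (note that it only suppresses the safety-net check; it does not remove an underlying leak):

```yaml
# flink-conf.yaml: disable Flink's leaked-classloader safety net.
# This hides the IllegalStateException but does not fix the leak itself.
classloader.check-leaked-classloader: false
```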