apache / linkis

Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
https://linkis.apache.org/
Apache License 2.0

linkis-cg-entrance's FileSystem Close Exception #697

Closed — geosmart closed this issue 3 years ago

geosmart commented 3 years ago

When running com.webank.wedatasphere.linkis.ujes.client.LinkisClientTest, an exception is thrown at the line String resultSet = jobInfo.getResultSetList(client)[0]: linkis-cg-entrance raises a FileSystem exception.

(Debugger screenshot attached in the original issue.)

Entrance error log:

2021-03-30 16:48:30.002 [WARN ] [linkisClient-Test_geosmart_ops_sparkConsumerThread] c.w.w.l.e.l.CacheLogManager (55) [apply] - 
write log for job linkisClient-Test_geosmart_ops_spark_6 failed java.io.IOException: Failed on local exception: 
java.nio.channels.ClosedByInterruptException; Host Details : 
local host is: "host1"; 
destination host is: "host2":8020; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) ~[hadoop-common-2.6.5.jar:?]
        at org.apache.hadoop.ipc.Client.call(Client.java:1474) ~[hadoop-common-2.6.5.jar:?]
        at org.apache.hadoop.ipc.Client.call(Client.java:1401) ~[hadoop-common-2.6.5.jar:?]
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) ~[hadoop-common-2.6.5.jar:?]
        at com.sun.proxy.$Proxy232.getFileInfo(Unknown Source) ~[?:?]
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752) ~[hadoop-hdfs-2.6.5.jar:?]
        at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_275]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_275]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) ~[hadoop-common-2.6.5.jar:?]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) ~[hadoop-common-2.6.5.jar:?]
        at com.sun.proxy.$Proxy233.getFileInfo(Unknown Source) ~[?:?]
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1977) ~[hadoop-hdfs-2.6.5.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118) ~[hadoop-hdfs-2.6.5.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114) ~[hadoop-hdfs-2.6.5.jar:?]
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-2.6.5.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114) ~[hadoop-hdfs-2.6.5.jar:?]
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400) ~[hadoop-common-2.6.5.jar:?]
        at com.webank.wedatasphere.linkis.storage.fs.impl.HDFSFileSystem.exists(HDFSFileSystem.java:278) ~[linkis-storage-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.storage.utils.FileSystemUtils$$anonfun$createNewFile$4.apply(FileSystemUtils.scala:68) ~[linkis-storage-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.common.utils.Utils$.tryFinally(Utils.scala:62) ~[linkis-common-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.storage.utils.FileSystemUtils$.createNewFile(FileSystemUtils.scala:80) ~[linkis-storage-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.entrance.log.AbstractLogWriter.<init>(LogWriter.scala:83) ~[linkis-entrance-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.entrance.log.CacheLogWriter.<init>(CacheLogWriter.scala:31) ~[linkis-entrance-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.entrance.log.CacheLogManager.createLogWriter(CacheLogManager.scala:62) ~[linkis-entrance-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.entrance.log.LogManager$$anonfun$onLogUpdate$1.apply$mcV$sp(LogManager.scala:45) ~[linkis-entrance-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.entrance.log.LogManager$$anonfun$onLogUpdate$1.apply(LogManager.scala:42) ~[linkis-entrance-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.entrance.log.LogManager$$anonfun$onLogUpdate$1.apply(LogManager.scala:42) ~[linkis-entrance-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.common.utils.Utils$.tryCatch(Utils.scala:48) [linkis-common-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.entrance.log.LogManager.onLogUpdate(LogManager.scala:54) [linkis-entrance-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.entrance.execute.EntranceJob$$anonfun$afterStateChanged$6.apply(EntranceJob.scala:141) [linkis-entrance-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.entrance.execute.EntranceJob$$anonfun$afterStateChanged$6.apply(EntranceJob.scala:141) [linkis-entrance-1.0.0-RC1.jar:?]
        at scala.Option.foreach(Option.scala:257) [scala-library-2.11.12.jar:?]
        at com.webank.wedatasphere.linkis.entrance.execute.EntranceJob.afterStateChanged(EntranceJob.scala:141) [linkis-entrance-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.scheduler.queue.SchedulerEvent$class.transition(SchedulerEvent.scala:82) [linkis-scheduler-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.scheduler.queue.Job.transition(Job.scala:38) [linkis-scheduler-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.scheduler.queue.Job$$anonfun$transitionCompleted$3.apply$mcV$sp(Job.scala:215) [linkis-scheduler-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.scheduler.queue.Job$$anonfun$transitionCompleted$3.apply(Job.scala:215) [linkis-scheduler-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.scheduler.queue.Job$$anonfun$transitionCompleted$3.apply(Job.scala:215) [linkis-scheduler-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.common.utils.Utils$.tryCatch(Utils.scala:48) [linkis-common-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.common.utils.Utils$.tryAndWarnMsg(Utils.scala:88) [linkis-common-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.scheduler.queue.Job.transitionCompleted(Job.scala:215) [linkis-scheduler-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.entrance.execute.EntranceJob.transitionCompleted(EntranceJob.scala:164) [linkis-entrance-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.scheduler.queue.Job.onFailure(Job.scala:117) [linkis-scheduler-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.entrance.execute.EntranceJob.onFailure(EntranceJob.scala:154) [linkis-entrance-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.scheduler.queue.fifoqueue.FIFOUserConsumer$$anonfun$loop$3$$anonfun$apply$10.apply(FIFOUserConsumer.scala:150) [linkis-scheduler-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.scheduler.queue.fifoqueue.FIFOUserConsumer$$anonfun$loop$3$$anonfun$apply$10.apply(FIFOUserConsumer.scala:144) [linkis-scheduler-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.common.utils.Utils$.tryCatch(Utils.scala:54) [linkis-common-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.scheduler.queue.fifoqueue.FIFOUserConsumer$$anonfun$loop$3.apply(FIFOUserConsumer.scala:144) [linkis-scheduler-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.scheduler.queue.fifoqueue.FIFOUserConsumer$$anonfun$loop$3.apply(FIFOUserConsumer.scala:117) [linkis-scheduler-1.0.0-RC1.jar:?]
        at scala.Option.foreach(Option.scala:257) [scala-library-2.11.12.jar:?]
        at com.webank.wedatasphere.linkis.scheduler.queue.fifoqueue.FIFOUserConsumer.loop(FIFOUserConsumer.scala:117) [linkis-scheduler-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.scheduler.queue.fifoqueue.FIFOUserConsumer$$anonfun$run$1.apply$mcV$sp(FIFOUserConsumer.scala:82) [linkis-scheduler-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.scheduler.queue.fifoqueue.FIFOUserConsumer$$anonfun$run$1.apply(FIFOUserConsumer.scala:82) [linkis-scheduler-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.scheduler.queue.fifoqueue.FIFOUserConsumer$$anonfun$run$1.apply(FIFOUserConsumer.scala:82) [linkis-scheduler-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.common.utils.Utils$.tryCatch(Utils.scala:48) [linkis-common-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.common.utils.Utils$.tryAndError(Utils.scala:102) [linkis-common-1.0.0-RC1.jar:?]
        at com.webank.wedatasphere.linkis.scheduler.queue.fifoqueue.FIFOUserConsumer.run(FIFOUserConsumer.scala:82) [linkis-scheduler-1.0.0-RC1.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_275]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_275]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
Caused by: java.nio.channels.ClosedByInterruptException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) ~[?:1.8.0_275]
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:658) ~[?:1.8.0_275]
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192) ~[hadoop-common-2.6.5.jar:?]
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530) ~[hadoop-common-2.6.5.jar:?]
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494) ~[hadoop-common-2.6.5.jar:?]
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609) ~[hadoop-common-2.6.5.jar:?]
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707) ~[hadoop-common-2.6.5.jar:?]
        at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370) ~[hadoop-common-2.6.5.jar:?]
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1523) ~[hadoop-common-2.6.5.jar:?]
        at org.apache.hadoop.ipc.Client.call(Client.java:1440) ~[hadoop-common-2.6.5.jar:?]
        ... 61 more

2021-03-30 16:24:29.691 [INFO ] [linkisClient-Test_geosmart_ops_sparkConsumerThread] c.w.w.l.e.j.EntranceExecutionJob (243) [close] - job:linkisClient-Test_geosmart_ops_spark_4 is closing

How can this be fixed? Also, I'd like to understand the lifecycle: when is the HDFSFileSystem opened, and when is it closed?
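For context on why the entrance service's FileSystem dies like this: Hadoop's FileSystem.get(conf) caches instances per (scheme, authority, user), so every thread in the JVM shares one instance, and a close() triggered in any thread — for example by Hadoop's interrupt handling, which surfaces as ClosedByInterruptException — leaves all other callers holding a dead handle. The sketch below is a simplified toy model of that cache (ToyFileSystem and ToyFileSystemCache are illustrative names, not the real Hadoop classes):

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for org.apache.hadoop.fs.FileSystem: usable until closed.
class ToyFileSystem {
    private boolean closed = false;

    boolean exists(String path) {
        if (closed) {
            throw new IllegalStateException("Filesystem closed");
        }
        return true; // pretend the path exists
    }

    void close() { closed = true; }
}

// Toy stand-in for FileSystem's internal cache: one shared instance per URI.
class ToyFileSystemCache {
    private static final Map<String, ToyFileSystem> CACHE = new HashMap<>();

    static synchronized ToyFileSystem get(String uri) {
        return CACHE.computeIfAbsent(uri, k -> new ToyFileSystem());
    }
}

class CacheDemo {
    public static void main(String[] args) {
        ToyFileSystem fsA = ToyFileSystemCache.get("hdfs://host2:8020");
        ToyFileSystem fsB = ToyFileSystemCache.get("hdfs://host2:8020");
        System.out.println(fsA == fsB);         // true: same shared instance

        fsA.close();                            // e.g. an interrupted thread closes it
        try {
            fsB.exists("/tmp/log");             // every other user is now broken
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // "Filesystem closed"
        }
    }
}
```

This matches the log above: once some interrupted job thread closed the cached instance, CacheLogManager's later FileSystem.exists call failed, even though that thread never called close itself.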

geosmart commented 3 years ago

Manually disabling the FileSystem cache solves this:

        hadoopConf.setBoolean("fs.hdfs.impl.disable.cache", true);
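To illustrate what that setting changes: with the cache disabled, each get() hands back a fresh FileSystem, so a close() forced in one interrupted thread no longer poisons other callers (the real Hadoop alternative with the same effect per call is FileSystem.newInstance(conf)). The toy model below extends the cache sketch; ToyFs and ToyFsFactory are illustrative names, not Hadoop classes:

```java
import java.util.HashMap;
import java.util.Map;

// Toy filesystem handle: usable until closed.
class ToyFs {
    private boolean closed = false;

    boolean exists(String path) {
        if (closed) throw new IllegalStateException("Filesystem closed");
        return true; // pretend the path exists
    }

    void close() { closed = true; }
}

// Toy factory modelling the effect of fs.hdfs.impl.disable.cache.
class ToyFsFactory {
    private static final Map<String, ToyFs> CACHE = new HashMap<>();

    static synchronized ToyFs get(String uri, boolean disableCache) {
        if (disableCache) {
            return new ToyFs();  // fresh, private instance per caller
        }
        return CACHE.computeIfAbsent(uri, k -> new ToyFs()); // shared instance
    }
}

class DisableCacheDemo {
    public static void main(String[] args) {
        ToyFs a = ToyFsFactory.get("hdfs://host2:8020", true);
        ToyFs b = ToyFsFactory.get("hdfs://host2:8020", true);
        System.out.println(a == b);               // false: independent instances

        a.close();                                // only this caller's handle dies
        System.out.println(b.exists("/tmp/log")); // true: b is unaffected
    }
}
```

The trade-off is that each uncached instance carries its own RPC connections, so callers should close the instances they create once done.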