Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

Offsets do not match [received: %d, expected: %d]. #9789

Closed JySongWithZhangCe closed 5 years ago

JySongWithZhangCe commented 5 years ago

Alluxio Version: 1.7.0

Describe the bug: I wonder what causes request.getOffset() != mContext.getPosToQueue(). The Alluxio source code is as described below: /**
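For readers unfamiliar with this kind of check, here is a minimal sketch of sequential-offset validation on the receiving side of a streamed write. It is not Alluxio's actual handler: the class name OffsetCheckSketch and its methods are hypothetical, and only the comparison of a received offset against an expected position, plus the error message format, are taken from this issue.

```java
/** Hypothetical sketch of a sequential-write offset check; not Alluxio's real code. */
final class OffsetCheckSketch {
    // Next byte offset the receiver expects; plays the role of getPosToQueue() (assumption).
    private long posToQueue = 0;

    long expected() {
        return posToQueue;
    }

    /** Accepts a packet only if it starts exactly at the expected position. */
    void accept(long offset, int length) {
        if (offset != posToQueue) {
            throw new IllegalArgumentException(String.format(
                "Offsets do not match [received: %d, expected: %d].", offset, posToQueue));
        }
        // Only an accepted packet advances the expected position.
        posToQueue += length;
    }

    public static void main(String[] args) {
        OffsetCheckSketch receiver = new OffsetCheckSketch();
        receiver.accept(0, 65536);
        receiver.accept(65536, 65536);
        try {
            // This packet skips ahead of the expected position, so it is rejected.
            receiver.accept(262144, 65536);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model, any packet that is lost, replayed, or delivered on a stream whose receiver state was reset produces the mismatch, and the expected position stays fixed while later packets keep arriving with larger offsets.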

madanadit commented 5 years ago

Hi @JySongWithZhangCe, can you post more details please on how to reproduce the issue? Please include the steps and detailed error message on both the worker and client sides.

You could also turn on debug logging, which would provide more information to help debug the issue. Thank you.

JySongWithZhangCe commented 5 years ago

Hi, thanks for your help!

The pipeline: Sqoop -> HDFS -> Alluxio -> Spark -> Alluxio

This issue is hard to reproduce and occurs randomly. It tends to occur under these conditions:

1. Users change data in HDFS without checking consistency, then read it through Alluxio.
2. Memory usage usually exceeds 95%.

I can't change the log level because Alluxio is running in production. Here is some information about the issue. The first part is the Spark application abort:

Job aborted due to stage failure: Task 106 in stage 11.0 failed 4 times, most recent failure: Lost task 106.3 in stage 11.0 (TID 127, dsszbyz-etl-node61, executor 4-656faa52-2757-4561-a465-d905ac44b136): alluxio.exception.status.DeadlineExceededException: Timeout to read 31075128573953 from [id: 0x0a42e135, L:/xxx/xxx.xxx.xxx.xxx:52195 - R:xxx/xxx.xxx.xxx.xxx:29999].
    at alluxio.client.block.stream.NettyPacketReader.readPacket(NettyPacketReader.java:155)
    at alluxio.client.block.stream.BlockInStream.readPacket(BlockInStream.java:372)
    at alluxio.client.block.stream.BlockInStream.read(BlockInStream.java:250)
    at alluxio.client.file.FileInStream.read(FileInStream.java:131)
    at alluxio.hadoop.HdfsFileInputStream.read(HdfsFileInputStream.java:118)
    at java.io.DataInputStream.readFully(DataInputStream.java:206)
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:779)
    at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:511)
    at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:270)
    at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:225)
    at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137)
    at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:184)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
    at org.apache.spark.sql.execution.columnar.InMemoryRelation$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:132)
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:363)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1044)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1019)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:959)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1019)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:722)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:340)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:291)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:329)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:293)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:329)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:293)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:329)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:293)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:396)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.lang.Thread.run(Thread.java:795)

The second part is the Alluxio worker error log:

2019-08-26 08:11:17,917 ERROR AbstractWriteHandler - Exception caught in AbstractWriteHandler for channel [id: 0xaaae2d97, L:/xxx.xxx.xxx.xxx:29999 - R:/xxx.xxx.xxx.xxx:53542]: alluxio.exception.status.InvalidArgumentException: Offsets do not match [received: 2259222528, expected: 135790592].
    at alluxio.worker.netty.AbstractWriteHandler.validateWriteRequest(AbstractWriteHandler.java:214)
    at alluxio.worker.netty.AbstractWriteHandler.channelRead(AbstractWriteHandler.java:149)
    at alluxio.worker.netty.UfsFileWriteHandler.channelRead(UfsFileWriteHandler.java:44)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at alluxio.worker.netty.AsyncCacheHandler.channelRead(AsyncCacheHandler.java:50)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at alluxio.worker.netty.ShortCircuitBlockWriteHandler.channelRead(ShortCircuitBlockWriteHandler.java:78)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at alluxio.worker.netty.ShortCircuitBlockReadHandler.channelRead(ShortCircuitBlockReadHandler.java:79)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at alluxio.worker.netty.AbstractWriteHandler.channelRead(AbstractWriteHandler.java:119)
    at alluxio.worker.netty.BlockWriteHandler.channelRead(BlockWriteHandler.java:38)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at alluxio.worker.netty.AbstractReadHandler.channelRead(AbstractReadHandler.java:105)
    at alluxio.worker.netty.BlockReadHandler.channelRead(BlockReadHandler.java:53)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at alluxio.worker.netty.HeartbeatHandler.channelRead(HeartbeatHandler.java:34)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:299)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:415)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1302)
2019-08-26 08:11:17,917 WARN AbstractClient - RPC failed with org.apache.thrift.transport.TTransportException. Retrying.
2019-08-26 08:11:17,918 ERROR AbstractWriteHandler - Exception caught in AbstractWriteHandler for channel [id: 0xaaae2d97, L:/xxx.xxx.xxx.xxx:29999 - R:/xxx.xxx.xxx.xxx:53542]: alluxio.exception.status.InvalidArgumentException: Offsets do not match [received: 2259288064, expected: 135790592].
    at alluxio.worker.netty.AbstractWriteHandler.validateWriteRequest(AbstractWriteHandler.java:214)
    at alluxio.worker.netty.AbstractWriteHandler.channelRead(AbstractWriteHandler.java:149)
    at alluxio.worker.netty.UfsFileWriteHandler.channelRead(UfsFileWriteHandler.java:44)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at alluxio.worker.netty.AsyncCacheHandler.channelRead(AsyncCacheHandler.java:50)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at alluxio.worker.netty.ShortCircuitBlockWriteHandler.channelRead(ShortCircuitBlockWriteHandler.java:78)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at alluxio.worker.netty.ShortCircuitBlockReadHandler.channelRead(ShortCircuitBlockReadHandler.java:79)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at alluxio.worker.netty.AbstractWriteHandler.channelRead(AbstractWriteHandler.java:119)
    at alluxio.worker.netty.BlockWriteHandler.channelRead(BlockWriteHandler.java:38)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at alluxio.worker.netty.AbstractReadHandler.channelRead(AbstractReadHandler.java:105)
    at alluxio.worker.netty.BlockReadHandler.channelRead(BlockReadHandler.java:53)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at alluxio.worker.netty.HeartbeatHandler.channelRead(HeartbeatHandler.java:34)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
2019-08-26 09:38:03,202 ERROR AbstractReadHandler - Failed to run PacketReader. io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 5100273664, max: 5100273664)
    at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:535)
    at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:489)
    at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:766)
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:742)
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:226)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:146)
    at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:333)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:183)
    at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:119)
    at alluxio.worker.netty.BlockReadHandler$BlockPacketReader.getDataBuffer(BlockReadHandler.java:112)
    at alluxio.worker.netty.BlockReadHandler$BlockPacketReader.getDataBuffer(BlockReadHandler.java:70)
    at alluxio.worker.netty.AbstractReadHandler$PacketReader.runInternal(AbstractReadHandler.java:362)
    at alluxio.worker.netty.AbstractReadHandler$PacketReader.run(AbstractReadHandler.java:329)
    at alluxio.worker.netty.BlockReadHandler$BlockPacketReader.run(BlockReadHandler.java:70)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
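
A few quantities can be read directly off the numbers in the worker log above; the sketch below only does the arithmetic. The class name LogArithmetic is hypothetical. Reading the 64 KiB step as the size of one client write packet is an assumption, not something the log states.

```java
/** Arithmetic on the values quoted in the worker log; class and method names are hypothetical. */
final class LogArithmetic {
    static final long RECEIVED_1 = 2259222528L;   // first "received" offset (08:11:17,917)
    static final long RECEIVED_2 = 2259288064L;   // second "received" offset (08:11:17,918)
    static final long EXPECTED = 135790592L;      // "expected" offset, identical in both errors
    static final long ALLOC = 16777216L;          // failed direct-memory allocation size
    static final long DIRECT_MAX = 5100273664L;   // direct-memory limit; "used" equals this value

    /** Distance between the two rejected offsets. */
    static long step() {
        return RECEIVED_2 - RECEIVED_1;
    }

    /** How far ahead of the expected position the first rejected packet was. */
    static long gap() {
        return RECEIVED_1 - EXPECTED;
    }

    /** Number of allocation-sized chunks that fit in the direct-memory limit. */
    static long chunksInPool() {
        return DIRECT_MAX / ALLOC;
    }

    public static void main(String[] args) {
        System.out.println("step between rejected offsets: " + step() + " bytes"); // 65536 (64 KiB)
        System.out.println("gap to expected offset: " + gap() + " bytes");
        System.out.println("gap in 64 KiB steps: " + gap() / step());
        System.out.println("16 MiB chunks in direct-memory limit: " + chunksInPool()); // 304
    }
}
```

The expected offset is the same in both errors while the received offset grows by exactly 64 KiB, which is consistent with the sender continuing to stream while the worker's expected position stays put. The direct-memory limit is an exact multiple of the 16 MiB allocation size and was fully used when the allocation failed about 87 minutes later.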

JySongWithZhangCe commented 5 years ago

I also wonder about the design reasoning behind this check, since Alluxio already seems to foresee this scenario. Thanks :)