batch size too large #60

Closed · alexander-branevskiy closed this issue 8 years ago

alexander-branevskiy commented 8 years ago

Hi guys! Working with your project, I ran into a problem: Cassandra throws an exception when trying to handle a batch that is too large. How can I configure this (maybe decrease the batch size, or something else)? I haven't found any examples, and spending a lot of time with your source code got me nowhere. I worked around it by changing the Cassandra config (the batch_size_fail_threshold_in_kb parameter), but that is not a good solution for me. Any ideas?
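
For reference, the server-side workaround described here is a single setting in cassandra.yaml; later in this thread the default is confirmed to be 50 KB. The raised value below is purely illustrative, not a recommendation:

```
# cassandra.yaml -- per-batch size ceiling, in KB (default: 50)
batch_size_fail_threshold_in_kb: 500
```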

velvia commented 8 years ago

Hi Alexander,

What is the specific error that you see? I'm about to merge a very big PR; you might want to try the velvia/multiple-keys-refactor branch (be sure to re-read the README first, as the data model has been enhanced). Among the changes are a new throttling mechanism for writes that should work much better, the ability to configure the read and connect network timeouts, and the ability to change the number of segments batch-written at one time.

-Evan

alexander-branevskiy commented 8 years ago

Hello velvia! Thank you for the fast response. If I understand correctly, you use phantom to work with Cassandra, and the problem may be on its side. The error appears when I invoke .saveAsFiloDataset. Here is the full stack trace:

```
ERROR phantom: Batch too large
ERROR DatasetCoordinatorActor: Error in reprojection task (test1/0)
filodb.core.StorageEngineException: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
    at filodb.cassandra.Util$ResultSetToResponse$$anonfun$toResponse$1.applyOrElse(Util.scala:19)
    at filodb.cassandra.Util$ResultSetToResponse$$anonfun$toResponse$1.applyOrElse(Util.scala:18)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
    at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
    at scala.util.Try$.apply(Try.scala:161)
    at scala.util.Failure.recover(Try.scala:185)
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:124)
    at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:180)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:186)
    at com.datastax.driver.core.RequestHandler.access$2300(RequestHandler.java:44)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:754)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:576)
    at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1007)
    at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:930)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
```

velvia commented 8 years ago

Right now, we write all the chunks in a single segment at once, so most likely your segment size is quite big. Would you know how much data is in a segment? (Run `filo-cli --command analyze --dataset <your-dataset>` for me and dump the output.)

I'll add a setting to make the batch size configurable.

alexander-branevskiy commented 8 years ago

Hi, here is the output:

```
numSegments: 1  numPartitions: 1
===== # Rows in a segment =====
Min: 201  Max: 201  Average: 201.0
 (1) | 00000100: 00000001 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
===== # Chunks in a segment =====
Min: 21  Max: 21  Average: 21.0
 (1) | 00000010: 00000001 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
===== # Segments in a partition =====
Min: 1  Max: 1  Average: 1.0
 (1) | 00000000: 00000001 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
```

I used your dataset GDELT-1979-1984-100000.csv (I just tried to write it into Cassandra). Note that this output may be wrong because of the exception.
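
For context, the write path being exercised here is FiloDB's Spark API via saveAsFiloDataset (mentioned above). A rough, hypothetical sketch of the ingestion; the CSV-loading options and the exact saveAsFiloDataset parameters are illustrative and version-dependent, so check the README of the branch you are on:

```
// Hypothetical sketch only -- API details vary by FiloDB version.
import filodb.spark._   // brings saveAsFiloDataset into scope

// Load the sample GDELT CSV with spark-csv (options are illustrative).
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("GDELT-1979-1984-100000.csv")

// This write path is what ends up batching chunk writes to Cassandra.
sqlContext.saveAsFiloDataset(df, "gdelt")
```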

velvia commented 8 years ago

Huh, that’s really interesting. What version of Cassandra are you running? You must have a really small batch size configured. I’m running 2.1.6 locally with default settings and have never run into this.

alexander-branevskiy commented 8 years ago

I use Cassandra 2.2.4 with the default max batch size (50 KB). I could increase it, but I don't know what that would do to performance.

velvia commented 8 years ago

Ah, OK. 50 KB is really small for FiloDB, because we write big binary blobs, so it would definitely need to be increased. From what I read, the size is in KB.

alexander-branevskiy commented 8 years ago

Thank you. And one last question: is it possible to build your project with Scala 2.11.7?

velvia commented 8 years ago

Sure, if that would help. I'll do that before releasing the next version, or at least make it possible to build it easily yourself. Spark 1.x is still on Scala 2.10, so 2.10 has to remain an option for people.
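
A minimal sketch of what cross-building could look like in build.sbt (the version numbers and settings below are illustrative, not FiloDB's actual build definition):

```
// Hypothetical build.sbt fragment -- cross-compile for both Scala lines.
scalaVersion := "2.10.6"
crossScalaVersions := Seq("2.10.6", "2.11.7")
// Then `sbt +compile` / `sbt +publishLocal` runs against each version.
```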

What is your use case, out of curiosity?

velvia commented 8 years ago

Added an option, columnstore.chunk-batch-size, to control the number of statements per unlogged batch. This is in a branch and will be merged to master soon, along with many other improvements.
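
A hedged example of how such a setting might look, assuming Typesafe Config (HOCON) syntax; the enclosing namespace and the value shown are illustrative:

```
# application.conf (HOCON) -- exact nesting may differ by version
columnstore {
  # Statements per unlogged batch; lower this to stay under Cassandra's
  # batch_size_fail_threshold_in_kb.
  chunk-batch-size = 16
}
```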

velvia commented 8 years ago

@alexander-branevskiy this should be resolved now that PR #30 has been merged.