apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

BUG: thetaSketch does not work with larger sizes in complex queries such as groupBy #2224

Closed: hamlet-lee closed this issue 8 years ago

hamlet-lee commented 8 years ago

I tried to increase the size for thetaSketch but failed.
I have provided a test case at https://github.com/hamlet-lee/druid/commit/f1dc76c43b4a54ef2b97e3771b3a752a2fb24cb6 . It simply multiplies the default size by 2.
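
For context, the default thetaSketch size is 16384 entries, so doubling it gives 32768. A minimal sketch of such an aggregator (the field names here are hypothetical; the actual change is in the linked commit):

{
  "type" : "thetaSketch",
  "name" : "unique_ids",
  "fieldName" : "user_id",
  "size" : 32768
}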

hamlet-lee commented 8 years ago

Error message:

Running io.druid.query.aggregation.datasketches.theta.SketchAggregationTest
Tests run: 8, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 20.555 sec <<< FAILURE! - in io.druid.query.aggregation.datasketches.theta.SketchAggregationTest
testSketchDataIngestAndQuery2(io.druid.query.aggregation.datasketches.theta.SketchAggregationTest)  Time elapsed: 4.123 sec  <<< ERROR!
com.metamx.common.ISE: Not enough memory to process even a single item.  Required [1,048,640] memory, but only have[1,048,576]
    at io.druid.query.groupby.GroupByQueryEngine$RowIterator.next(GroupByQueryEngine.java:378)
    at io.druid.query.groupby.GroupByQueryEngine$RowIterator.next(GroupByQueryEngine.java:286)
    at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:104)
    at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:81)
    at com.metamx.common.guava.ConcatSequence.makeYielder(ConcatSequence.java:93)
    at com.metamx.common.guava.ConcatSequence.toYielder(ConcatSequence.java:72)
    at io.druid.query.aggregation.AggregationTestHelper$6.run(AggregationTestHelper.java:374)
    at io.druid.query.ConcatQueryRunner$1.apply(ConcatQueryRunner.java:51)
    at io.druid.query.ConcatQueryRunner$1.apply(ConcatQueryRunner.java:47)
    at com.metamx.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:39)
    at com.metamx.common.guava.YieldingAccumulators$1.accumulate(YieldingAccumulators.java:32)
    at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:104)
    at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:81)
    at com.metamx.common.guava.BaseSequence.accumulate(BaseSequence.java:67)
    at com.metamx.common.guava.MappedSequence.accumulate(MappedSequence.java:40)
    at com.metamx.common.guava.ConcatSequence.accumulate(ConcatSequence.java:40)
    at io.druid.query.groupby.GroupByQueryQueryToolChest.makeIncrementalIndex(GroupByQueryQueryToolChest.java:288)
    at io.druid.query.groupby.GroupByQueryQueryToolChest.mergeGroupByResults(GroupByQueryQueryToolChest.java:232)
    at io.druid.query.groupby.GroupByQueryQueryToolChest.access$000(GroupByQueryQueryToolChest.java:87)
    at io.druid.query.groupby.GroupByQueryQueryToolChest$3.run(GroupByQueryQueryToolChest.java:136)
    at io.druid.query.FinalizeResultsQueryRunner.run(FinalizeResultsQueryRunner.java:102)
    at io.druid.query.aggregation.AggregationTestHelper.runQueryOnSegmentsObjs(AggregationTestHelper.java:362)
    at io.druid.query.aggregation.AggregationTestHelper.runQueryOnSegments(AggregationTestHelper.java:317)
    at io.druid.query.aggregation.AggregationTestHelper.runQueryOnSegments(AggregationTestHelper.java:294)
    at io.druid.query.aggregation.AggregationTestHelper.createIndexAndRunQueryOnSegment(AggregationTestHelper.java:153)
    at io.druid.query.aggregation.datasketches.theta.SketchAggregationTest.testSketchDataIngestAndQuery2(SketchAggregationTest.java:128)

himanshug commented 8 years ago

A unit test is not a good place to test for scalability. It fails because, for this unit test, druid.processing.buffer.sizeBytes is set to a very small value that becomes insufficient when you double the sketch size.

If this is failing in a real cluster, then please increase the value of druid.processing.buffer.sizeBytes.

himanshug commented 8 years ago

You can make your test pass by changing https://github.com/druid-io/druid/blob/master/processing/src/test/java/io/druid/query/aggregation/AggregationTestHelper.java#L114 to 5 * 1024 * 1024.

hamlet-lee commented 8 years ago

When querying on a real cluster with a large size (the default multiplied by 256), I got the exception below

(in the historical node's log)

com.fasterxml.jackson.databind.JsonMappingException: Input sketch too large for allocated memory. (through reference chain: io.druid.query.Result["result"]->io.druid.query.BySegmentResultValueClass["results"]->com.google.common.collect.TransformingRandomAccessList[0])
    at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:210) ~[jackson-databind-2.4.6.jar:2.4.6]
    at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:189) ~[jackson-databind-2.4.6.jar:2.4.6]
    at com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:213) ~[jackson-databind-2.4.6.jar:2.4.6]
    at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serializeContents(IndexedListSerializer.java:105) ~[jackson-databind-2.4.6.jar:2.4.6]
    at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serializeContents(IndexedListSerializer.java:21) ~[jackson-databind-2.4.6.jar:2.4.6]
    at com.fasterxml.jackson.databind.ser.std.AsArraySerializerBase.serialize(AsArraySerializerBase.java:183) ~[jackson-databind-2.4.6.jar:2.4.6]
    at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:505) ~[jackson-databind-2.4.6.jar:2.4.6]
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:639) ~[jackson-databind-2.4.6.jar:2.4.6]
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:152) ~[jackson-databind-2.4.6.jar:2.4.6]
    at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:505) ~[jackson-databind-2.4.6.jar:2.4.6]
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:639) ~[jackson-databind-2.4.6.jar:2.4.6]
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:152) ~[jackson-databind-2.4.6.jar:2.4.6]
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128) ~[jackson-databind-2.4.6.jar:2.4.6]
    at com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:1902) ~[jackson-databind-2.4.6.jar:2.4.6]
    at com.fasterxml.jackson.core.base.GeneratorBase.writeObject(GeneratorBase.java:280) ~[jackson-core-2.4.6.jar:2.4.6]
    at io.druid.jackson.DruidDefaultSerializersModule$5.serialize(DruidDefaultSerializersModule.java:137) ~[druid-processing-0.8.3-rc2.jar:0.8.3-rc2]
    at io.druid.jackson.DruidDefaultSerializersModule$5.serialize(DruidDefaultSerializersModule.java:128) ~[druid-processing-0.8.3-rc2.jar:0.8.3-rc2]
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128) ~[jackson-databind-2.4.6.jar:2.4.6]
    at com.fasterxml.jackson.databind.ObjectWriter._configAndWriteValue(ObjectWriter.java:800) ~[jackson-databind-2.4.6.jar:2.4.6]
    at com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:642) ~[jackson-databind-2.4.6.jar:2.4.6]
    at io.druid.server.QueryResource$2.write(QueryResource.java:185) ~[druid-server-0.8.3-rc2.jar:0.8.3-rc2]
    at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:71) ~[jersey-core-1.19.jar:1.19]
    at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:57) ~[jersey-core-1.19.jar:1.19]
    at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:302) ~[jersey-server-1.19.jar:1.19]
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1510) ~[jersey-server-1.19.jar:1.19]
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) ~[jersey-server-1.19.jar:1.19]
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) ~[jersey-server-1.19.jar:1.19]
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) ~[jersey-servlet-1.19.jar:1.19]
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) ~[jersey-servlet-1.19.jar:1.19]
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) ~[jersey-servlet-1.19.jar:1.19]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api-3.1.0.jar:3.1.0]
    at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:278) ~[guice-servlet-4.0-beta.jar:?]
    at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:268) ~[guice-servlet-4.0-beta.jar:?]
    at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:180) ~[guice-servlet-4.0-beta.jar:?]
    at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:93) ~[guice-servlet-4.0-beta.jar:?]
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120) ~[guice-servlet-4.0-beta.jar:?]
    at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:132) ~[guice-servlet-4.0-beta.jar:?]
    at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:129) ~[guice-servlet-4.0-beta.jar:?]
    at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:206) ~[guice-servlet-4.0-beta.jar:?]
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:129) ~[guice-servlet-4.0-beta.jar:?]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) ~[jetty-servlet-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) ~[jetty-servlets-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:364) ~[jetty-servlets-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) ~[jetty-servlet-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) [jetty-servlet-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) [jetty-servlet-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.server.Server.handle(Server.java:497) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) [jetty-io-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:620) [jetty-util-9.2.5.v20141112.jar:9.2.5.v20141112]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:540) [jetty-util-9.2.5.v20141112.jar:9.2.5.v20141112]
    at java.lang.Thread.run(Thread.java:745) [?:1.7.0_79]
Caused by: java.lang.IllegalArgumentException: Input sketch too large for allocated memory.
    at com.yahoo.sketches.theta.SetOperation.computeMinLgArrLongsFromCount(SetOperation.java:157) ~[?:?]
    at com.yahoo.sketches.theta.HeapIntersection.update(HeapIntersection.java:141) ~[?:?]
    at io.druid.query.aggregation.datasketches.theta.SketchOperations.sketchSetOperation(SketchOperations.java:100) ~[?:?]
    at io.druid.query.aggregation.datasketches.theta.SketchSetPostAggregator.compute(SketchSetPostAggregator.java:89) ~[?:?]
    at io.druid.query.aggregation.datasketches.theta.SketchEstimatePostAggregator.compute(SketchEstimatePostAggregator.java:66) ~[?:?]
    at io.druid.query.timeseries.TimeseriesQueryQueryToolChest$5.apply(TimeseriesQueryQueryToolChest.java:271) ~[druid-processing-0.8.3-rc2.jar:0.8.3-rc2]
    at io.druid.query.timeseries.TimeseriesQueryQueryToolChest$5.apply(TimeseriesQueryQueryToolChest.java:259) ~[druid-processing-0.8.3-rc2.jar:0.8.3-rc2]
    at com.google.common.collect.Lists$TransformingRandomAccessList.get(Lists.java:573) ~[guava-16.0.1.jar:?]
    at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serializeContents(IndexedListSerializer.java:84) ~[jackson-databind-2.4.6.jar:2.4.6]
    ... 56 more

Setting for the historical node:

druid.processing.buffer.sizeBytes=1000000000

Is this error related to this setting?

himanshug commented 8 years ago

Are you getting the error at query time or at ingestion time? If it failed at query time, did you specify the same "size" attribute in the thetaSketch aggregators and post aggregators as at ingestion time?

himanshug commented 8 years ago

Also, try keeping druid.processing.buffer.sizeBytes a power of 2 for potentially better performance due to alignment.
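
For example, the closest power of 2 to the current setting of 1000000000 would be 2^30:

druid.processing.buffer.sizeBytes=1073741824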

hamlet-lee commented 8 years ago

Ingestion size = query size = 4194304. This error happens at query time; the exception is from the historical node's log.

himanshug commented 8 years ago

Can you paste your indexing JSON and query JSON files?

himanshug commented 8 years ago

My guess is that, at query time, you are not specifying size=4194304 in your sketch post aggregators.

hamlet-lee commented 8 years ago

Ingest spec:

{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "demo",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "timestampSpec" : {
            "column" : "ts",
            "format" : "millis"
          },
          "dimensionsSpec" : {
            "dimensions": [],
                           "dimensionExclusions" : [],
                        "spatialDimensions" : []
          }
        }
      },
      "metricsSpec" : [
        {
          "type" : "count",
          "name" : "cnt"
        },
    {
      "type" : "hyperUnique",
      "name" : "_commonId_sessionId_hyper_unique",
      "fieldName": "_commonId_sessionId_Dup"
    },
    {
      "type" : "hyperUnique",
      "name" : "_commonId_hyper_unique",
      "fieldName": "_commonId_Dup"
    },
    {
      "type": "thetaSketch",
      "name": "_commonId_sketch",
      "fieldName": "_commonId_Dup"
    },
    {
      "type": "thetaSketch",
      "name": "_commonId_sketch_x256",
      "fieldName": "_commonId_Dup",
      "size": 4194304
    },
    {
      "type" : "longSum",
      "name" : "_commonId_isSessionStart_cnt",
      "fieldName": "_commonId_isSessionStart_Dup"
    },
    {
      "type" : "longSum",
      "name" : "_commonId_sessionActionCountOne",
      "fieldName": "_commonId_sessionActionCountOne"
    },
    {
      "type" : "longSum",
      "name" : "_commonId_sessionStaySecsCnt",
      "fieldName": "_commonId_sessionStaySecs"
    },
    {
      "type" : "longSum",
      "name" : "_commonId_sessionActionCount",
      "fieldName": "_commonId_sessionActionCount"
    },
    {
      "type" : "longSum",
      "name" : "_commonId_isDayUidStart_cnt",
      "fieldName": "_commonId_isDayUidStart_Dup"
    },
    {
      "type" : "approxHistogramFold",
      "name" : "_commonId_sessionStaySecsHistFold",
      "fieldName": "_commonId_sessionStaySecs",
      "resolution": 100,
      "numBuckets": 10
    },
    {
      "type" : "hyperUnique",
      "name" : "userid_sessionId_hyper_unique",
      "fieldName": "userid_sessionId_Dup"
    },
    {
      "type" : "hyperUnique",
      "name" : "userid_hyper_unique",
      "fieldName": "userid_Dup"
    },
    {
      "type": "thetaSketch",
      "name": "userid_sketch",
      "fieldName": "userid_Dup"
    },
    {
      "type": "thetaSketch",
      "name": "userid_sketch_x256",
      "fieldName": "userid_Dup",
      "size": 4194304
    },
    {
      "type" : "longSum",
      "name" : "userid_isSessionStart_cnt",
      "fieldName": "userid_isSessionStart_Dup"
    },
    {
      "type" : "longSum",
      "name" : "userid_sessionActionCountOne",
      "fieldName": "userid_sessionActionCountOne"
    },
    {
      "type" : "longSum",
      "name" : "userid_sessionStaySecsCnt",
      "fieldName": "userid_sessionStaySecs"
    },
    {
      "type" : "longSum",
      "name" : "userid_sessionActionCount",
      "fieldName": "userid_sessionActionCount"
    },
    {
      "type" : "longSum",
      "name" : "userid_isDayUidStart_cnt",
      "fieldName": "userid_isDayUidStart_Dup"
    },
    {
      "type" : "approxHistogramFold",
      "name" : "userid_sessionStaySecsHistFold",
      "fieldName": "userid_sessionStaySecs",
      "resolution": 100,
      "numBuckets": 10
    },
    {
      "type" : "hyperUnique",
      "name" : "username_sessionId_hyper_unique",
      "fieldName": "username_sessionId_Dup"
    },
    {
      "type" : "hyperUnique",
      "name" : "username_hyper_unique",
      "fieldName": "username_Dup"
    },
    {
      "type": "thetaSketch",
      "name": "username_sketch",
      "fieldName": "username_Dup"
    },
    {
      "type": "thetaSketch",
      "name": "username_sketch_x256",
      "fieldName": "username_Dup",
      "size": 4194304
    },
    {
      "type" : "longSum",
      "name" : "username_isSessionStart_cnt",
      "fieldName": "username_isSessionStart_Dup"
    },
    {
      "type" : "longSum",
      "name" : "username_sessionActionCountOne",
      "fieldName": "username_sessionActionCountOne"
    },
    {
      "type" : "longSum",
      "name" : "username_sessionStaySecsCnt",
      "fieldName": "username_sessionStaySecs"
    },
    {
      "type" : "longSum",
      "name" : "username_sessionActionCount",
      "fieldName": "username_sessionActionCount"
    },
    {
      "type" : "longSum",
      "name" : "username_isDayUidStart_cnt",
      "fieldName": "username_isDayUidStart_Dup"
    },
    {
      "type" : "approxHistogramFold",
      "name" : "username_sessionStaySecsHistFold",
      "fieldName": "username_sessionStaySecs",
      "resolution": 100,
      "numBuckets": 10
    }
  ],
     "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "DAY",
        "queryGranularity" : "NONE",
        "intervals" : [ "2015-12-28T00:00:00.000+08:00/2015-12-29T00:00:00.000+08:00" ]
      }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "hdfs://myhost:9002/exp/demo-kv-json/20151228"
      }
    },
    "tuningConfig" : {
      "type": "hadoop",
      "ignoreInvalidRows": true,
      "combineText": true,
      "persistInHeap": false,
      "indexSpec" : {
        "bitmap" : {
          "type" : "concise"
        }
      },
      "rowFlushBoundary": 40000,
      "partitionsSpec": {
        "type": "hashed",
        "numShards": 4
      },
      "jobProperties":{
        "mapreduce.input.fileinputformat.split.minsize": "10240000",
        "mapreduce.input.fileinputformat.split.maxsize": "10240000",
         "mapreduce.map.memory.mb": "4000",
        "mapreduce.task.io.sort.mb":"2047",
        "mapreduce.task.io.sort.factor":"100",
        "mapreduce.map.output.compress":"true",
        "mapreduce.map.java.opts": "-XX:+UseParallelOldGC -XX:ParallelGCThreads=4 -XX:-OmitStackTraceInFastThrow -server -Xmn400m -Xms3200m -Xmx3200m -Duser.timezone=Asia/Shanghai -Dfile.encoding=UTF-8 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:PermSize=256m -XX:MaxPermSize=256m -XX:-UseGCOverheadLimit",
        "mapreduce.reduce.memory.mb": "8000",
        "mapreduce.reduce.java.opts": "-XX:+UseParallelOldGC -XX:ParallelGCThreads=4 -XX:-OmitStackTraceInFastThrow -server -Xmn1000m -XX:MaxPermSize=256m -Xmx5000m -Xms5000m -Duser.timezone=Asia/Shanghai -Dfile.encoding=UTF-8 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:PermSize=256m -XX:MaxPermSize=256m -XX:-UseGCOverheadLimit"
      }
    }
  },
  "context": {
    "druid.indexer.runner.javaOpts": "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:MaxPermSize=256m -XX:PermSize=256m -Xms1024m -Xmx1024m"
  }
}

Query:

{
    "queryType" : "timeseries",
    "dataSource" : "demo",
    "granularity" : "all",
    "filter" : {
        "type" : "and",
        "fields" : [{
                "type" : "selector",
                "dimension" : "visitKeyDesc",
                "value" : "网站页面"
            }, {
                "type" : "not",
                "field" : {
                    "type" : "selector",
                    "dimension" : "userid",
                    "value" : null
                }
            }
        ]
    },
    "aggregations" : [{
            "type" : "filtered",
            "filter" : {
                "type" : "selector",
                "dimension" : "intDay",
                "value" : 16798
            },
            "aggregator" : {
                "type" : "thetaSketch",
                "name" : "zeroDayUniqueUser",
                "size" : 4194304,
                "fieldName" : "userid_sketch_x256"
            }
        }, {
            "type" : "filtered",
            "filter" : {
                "type" : "selector",
                "dimension" : "intDay",
                "value" : 16799
            },
            "aggregator" : {
                "type" : "thetaSketch",
                "name" : "onDayUniqueUser",
                "size" : 4194304,
                "fieldName" : "userid_sketch_x256"
            }
        }
    ],
    "postAggregations" : [{
            "type" : "thetaSketchEstimate",
            "name" : "intersect_unique_users",
            "field" : {
                "type" : "thetaSketchSetOp",
                "name" : "intersect_unique_users_sketch",
                "func" : "INTERSECT",
                "fields" : [{
                        "type" : "fieldAccess",
                        "fieldName" : "zeroDayUniqueUser"
                    }, {
                        "type" : "fieldAccess",
                        "fieldName" : "onDayUniqueUser"
                    }
                ]
            }
        }
    ],
    "intervals" : ["2015-12-29T00:00:00+08:00/2015-12-30T00:00:00+08:00", "2015-12-30T00:00:00+08:00/2015-12-31T00:00:00+08:00"]
}

himanshug commented 8 years ago

You would need to specify the "size" : 4194304 attribute on the thetaSketchSetOp post aggregator as well.

I missed it in the documentation; I will update that.
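
For reference, the fix applied to the query above would look like this (only the "size" attribute on the thetaSketchSetOp is new):

    "postAggregations" : [{
            "type" : "thetaSketchEstimate",
            "name" : "intersect_unique_users",
            "field" : {
                "type" : "thetaSketchSetOp",
                "name" : "intersect_unique_users_sketch",
                "func" : "INTERSECT",
                "size" : 4194304,
                "fields" : [{
                        "type" : "fieldAccess",
                        "fieldName" : "zeroDayUniqueUser"
                    }, {
                        "type" : "fieldAccess",
                        "fieldName" : "onDayUniqueUser"
                    }
                ]
            }
        }
    ]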

hamlet-lee commented 8 years ago

Yeah, it works now! Thanks!

hamlet-lee commented 8 years ago

Maybe we should also improve the error message?

himanshug commented 8 years ago

I will look into the error message, but that is kind of harder in this case.

himanshug commented 8 years ago

Fixed by #2232.