apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

BUG: mistake in thetaSketch's doc #2520

Closed: hamlet-lee closed this issue 8 years ago

hamlet-lee commented 8 years ago

The thetaSketch doc says:

Sketch Estimator

{
  "type"  : "thetaSketchEstimate",
  "name": <output name>,
  "fieldName"  : <the name field value of the thetaSketch aggregator>
}

But when I issue a query like the one below, it does not work:

{
    "queryType": "timeseries",
    "dataSource": "myds",
    "granularity": "all",
    "aggregations": [{
        "type": "filtered",
        "filter": {
            "type": "selector",
            "dimension": "_LogAction",
            "value": "AddNote"
        },
        "aggregator": {
            "type": "thetaSketch",
            "name": "set_AddNote",
            "fieldName": "uid_sketch_x256",
            "size": 4194304
        }
    }],
    "postAggregations": [{
        "type": "thetaSketchEstimate",
        "name": "estimate_set_AddFile",
        "fieldName": "set_AddNote"
    }],

    "intervals": [
        "2016-02-18T00:00:00+08:00/2016-02-19T00:00:00+08:00"
    ]
}

It reports:

Instantiation of [simple type, class io.druid.query.aggregation.datasketches.theta.SketchEstimatePostAggregator] value failed: field is null (through reference chain: java.util.ArrayList[0])

When I issue the query below instead, it works:

{
    "queryType": "timeseries",
    "dataSource": "myds",
    "granularity": "all",
    "aggregations": [{
        "type": "filtered",
        "filter": {
            "type": "selector",
            "dimension": "_LogAction",
            "value": "AddNote"
        },
        "aggregator": {
            "type": "thetaSketch",
            "name": "set_AddNote",
            "fieldName": "uid_sketch_x256",
            "size": 4194304
        }
    }],
    "postAggregations": [{
        "type": "thetaSketchEstimate",
        "name": "estimate_set_AddFile",
        "field": {
            "type": "fieldAccess",
            "fieldName": "set_AddNote"
        }
    }],

    "intervals": [
        "2016-02-18T00:00:00+08:00/2016-02-19T00:00:00+08:00"
    ]
}

So should the documented syntax be the following?

{
  "type"  : "thetaSketchEstimate",
  "name": <output name>,
  "field"  : <thetaSketch aggregate result>
}

Here, "thetaSketch aggregate result" can be either

{
 "type": "fieldAccess",
 "fieldName": ...
}

or

{
 "type": "thetaSketchSetOp",
 ...
}
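
For illustration, the nested set-operation form might look like the sketch below. It assumes a second filtered thetaSketch aggregator named "set_AddFile" (hypothetical, it does not appear in the queries above) and the "func" and "fields" properties of thetaSketchSetOp as I understand them from the datasketches extension doc; the output names are also made up.

```json
{
  "type": "thetaSketchEstimate",
  "name": "estimate_AddNote_and_AddFile",
  "field": {
    "type": "thetaSketchSetOp",
    "name": "sketch_AddNote_and_AddFile",
    "func": "INTERSECT",
    "fields": [
      { "type": "fieldAccess", "fieldName": "set_AddNote" },
      { "type": "fieldAccess", "fieldName": "set_AddFile" }
    ]
  }
}
```

If this is right, the estimate would count the uids present in both filtered sets, and "field" again takes a post aggregator object rather than a bare aggregator name.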

b-slim commented 8 years ago

Here is the fix for the post aggregator: https://github.com/druid-io/druid/pull/2526