apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Slow queries on near-real-time data #9918

Open cxlRay opened 4 years ago

cxlRay commented 4 years ago

When sending batch queries to real-time (Peon) tasks, performance is poor: TPS is only 80, response time is more than one second, max response time can reach 30 seconds, and the 99th-percentile response time is 15 seconds. What confuses me is that the data in the real-time processes is in memory, so why is the query response time so long?

Affected Version

druid-0.16.1-incubating


yuanlihan commented 4 years ago

Hi @cxlRay, you can try minimising intermediatePersistPeriod (PT10M by default); the persisted data can then benefit from the "vectorize": "true" query option. Hope this helps to some extent.
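As a sketch, the persist period lives in the ingestion spec's tuningConfig. The fragment below assumes a Kafka indexing service supervisor; the specific values are illustrative, not taken from the reporter's actual config:

```json
{
  "type": "kafka",
  "tuningConfig": {
    "type": "kafka",
    "intermediatePersistPeriod": "PT2M",
    "maxRowsInMemory": 150000
  }
}
```

A shorter period means more frequent (and smaller) intermediate persists, which is the trade-off raised later in this thread.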

exherb commented 4 years ago

same issue here.

cxlRay commented 4 years ago

@yuanlihan thanks for your answer, but minimising intermediatePersistPeriod will create many more small files

yuanlihan commented 4 years ago

> @yuanlihan thanks for your answer, but minimising intermediatePersistPeriod will create many more small files

@cxlRay that's true. But the temporary persist files will be cleaned up when the hourly task finishes, and the persisted incremental files/indexes (with extra indexes to speed up query processing) are more efficient to query than the in-memory incremental index. As far as I know, when a query scans the latest in-memory incremental index (for example, the last 10 minutes of data), Druid processes the in-memory fact table held by the incremental index row by row. I also tried minimising maxRowsInMemory to reduce the number of in-memory rows, but that introduces overhead from frequent persisting.

exherb commented 4 years ago

timeseries queries are very slow with a filter (10-20s).

cxlRay commented 4 years ago

@exherb can you say more about your issue? If your cluster runs on SSDs, it will perform better.

exherb commented 4 years ago

> @exherb can you say more about your issue? If your cluster runs on SSDs, it will perform better.

cxlRay commented 4 years ago

As yuanlihan said, minimise intermediatePersistPeriod or maxRowsInMemory, and add the query parameter vectorize.

exherb commented 4 years ago

> As yuanlihan said, minimise intermediatePersistPeriod or maxRowsInMemory, and add the query parameter vectorize.

intermediatePersistPeriod: P10M
maxRowsInMemory: 1000000

context: { vectorize: true }

still slow.

navis commented 4 years ago

Rows in a no-rollup incremental index are stored as

`ConcurrentSkipListMap<Long, ConcurrentLinkedDeque<IncrementalIndexRow>>`

which does not seem easy to make fast, imho.

Anyway, can you try a coarser granularity, like 15 minutes or something?
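To illustrate why this layout is slow to filter, here is a hypothetical, much-simplified Java sketch of the same shape (`Row` is a stand-in for `IncrementalIndexRow`; none of this is Druid's actual code). A filtered scan has to visit every row in every deque one at a time, with no bitmap index or vectorization to help:

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListScan {
    // Hypothetical stand-in for IncrementalIndexRow: a timestamp plus one dimension value.
    static final class Row {
        final long timestamp;
        final String dim;
        Row(long timestamp, String dim) { this.timestamp = timestamp; this.dim = dim; }
    }

    // Build a toy "fact table": rows grouped into one deque per truncated timestamp,
    // alternating dimension values "a" and "b".
    static ConcurrentSkipListMap<Long, ConcurrentLinkedDeque<Row>> buildIndex(int numRows) {
        ConcurrentSkipListMap<Long, ConcurrentLinkedDeque<Row>> facts = new ConcurrentSkipListMap<>();
        for (long t = 0; t < numRows; t++) {
            facts.computeIfAbsent(t / 10, k -> new ConcurrentLinkedDeque<>())
                 .add(new Row(t, t % 2 == 0 ? "a" : "b"));
        }
        return facts;
    }

    // Count rows matching dim == "a" by walking the whole structure row by row,
    // the way a filtered scan over the in-memory index has to.
    static long countMatches(ConcurrentSkipListMap<Long, ConcurrentLinkedDeque<Row>> facts) {
        long matches = 0;
        for (ConcurrentLinkedDeque<Row> bucket : facts.values()) { // one bucket per truncated timestamp
            for (Row r : bucket) {                                 // pointer-chasing, one row at a time
                if ("a".equals(r.dim)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(countMatches(buildIndex(1000))); // 500: the even-timestamped half
    }
}
```

Persisted segments avoid this per-row walk because they carry dictionary-encoded columns and bitmap indexes, which is why the earlier suggestion of persisting sooner can help.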

exherb commented 4 years ago

> Rows in a no-rollup incremental index are stored as
>
> `ConcurrentSkipListMap<Long, ConcurrentLinkedDeque<IncrementalIndexRow>>`
>
> which does not seem easy to make fast, imho.
>
> Anyway, can you try a coarser granularity, like 15 minutes or something?

We enabled rollup with 1-minute query granularity, and segment granularity is hourly. Are you suggesting changing the segment granularity to 15 minutes?

cxlRay commented 4 years ago

@exherb how do you know the slow part is the Peon query? The query path is client -> Broker -> indexing service (Peon). Did you find this via monitoring metrics or by analyzing the code?

exherb commented 4 years ago

> @exherb how do you know the slow part is the Peon query? The query path is client -> Broker -> indexing service (Peon). Did you find this via monitoring metrics or by analyzing the code?

By Druid metrics: `query/segment/time` / `query/time` (`datasource~=.+middlemanager.+`)

navis commented 4 years ago

@exherb I mean the granularity of the timeseries query.
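For reference, a coarser-granularity timeseries query might look like the sketch below. The datasource, interval, filter, and aggregator here are made up for illustration; only the `PT15M` period and the `vectorize` context flag reflect the suggestions in this thread:

```json
{
  "queryType": "timeseries",
  "dataSource": "example",
  "intervals": ["2020-05-26T00:00:00Z/2020-05-26T01:00:00Z"],
  "granularity": { "type": "period", "period": "PT15M" },
  "filter": { "type": "selector", "dimension": "dim", "value": "a" },
  "aggregations": [{ "type": "count", "name": "rows" }],
  "context": { "vectorize": "true" }
}
```

A coarser query granularity produces fewer output buckets per segment scanned, which can reduce per-query overhead on the real-time tasks.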

louisliu318 commented 4 years ago

same issue.