Closed flyingandrunning closed 8 years ago
You can eliminate duplicates using aggregations.
With a terms aggregation, the results are grouped by one field.
Try:
SELECT userId FROM logstash-api-access-2016.08.17 where ip = '112.65.191.239' group by userId
@shi-yuan
{
"from": 0,
"size": 200,
"_source": {
"includes": [
"name"
],
"excludes": []
}
}
Distinct doesn't have a physical execution plan.
Hi @wanglifengwf, I am not sure it supports distinct.
I am sure it supports count(distinct(userId)).
You can try this:
SELECT userId FROM logstash-api-access-2016.08.17 where ip = '112.65.191.239' group by userId
Yeah, distinct is not supported (there is no such feature in Elasticsearch). Count distinct is supported (via the cardinality feature in Elasticsearch). In the next version I'll raise an error when distinct is used on its own.
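For reference, here is a minimal Python sketch of the kind of request body a count(distinct(userId)) could correspond to, using the cardinality aggregation. The helper name `distinct_count_body` and the aggregation label `distinct_count` are made up for illustration, and this is not necessarily the exact body the plugin generates. Note that cardinality counts are approximate for high-cardinality fields.

```python
import json


def distinct_count_body(field, precision_threshold=3000):
    """Build a search body that counts distinct values of `field`.

    Hypothetical helper for illustration; the aggregation label
    "distinct_count" is an arbitrary name chosen here.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            "distinct_count": {
                "cardinality": {
                    "field": field,
                    # Below this many distinct values the count is
                    # expected to be close to exact.
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(distinct_count_body("userId"), indent=2))
```

The body can be sent to the index's `_search` endpoint with any HTTP client.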
@eliranmoyal @ansjsun I'm sorry, I forgot that ES does not support distinct. Thanks!
ES supports finding distinct counts.
To find distinct values, use a terms aggregation. Pay attention to how the field you want distinct values for is analyzed: make sure it is not tokenized at index time, otherwise each entry in the aggregation will be a single term from the field content rather than the whole value.
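As a rough sketch of the terms aggregation approach, the Python below builds a search body that returns the distinct values of one field for a filtered set of documents. The helper name `distinct_values_body`, the aggregation label `distinct_values`, and the bucket limit of 200 are assumptions for illustration. It also assumes the field is indexed untokenized (a `not_analyzed` string in older Elasticsearch versions, `keyword` in 5.x and later).

```python
import json


def distinct_values_body(field, filter_field, filter_value, max_buckets=200):
    """Build a search body listing distinct values of `field`.

    Hypothetical helper for illustration. Each bucket key in the
    response is one distinct value, provided the field is not tokenized.
    """
    return {
        "size": 0,  # skip the hits, only the buckets matter
        "query": {"term": {filter_field: filter_value}},
        "aggs": {
            "distinct_values": {
                # `size` caps how many distinct values are returned.
                "terms": {"field": field, "size": max_buckets}
            }
        },
    }


if __name__ == "__main__":
    body = distinct_values_body("userId", "ip", "112.65.191.239")
    print(json.dumps(body, indent=2))
```

The distinct values are then read from `aggregations.distinct_values.buckets[*].key` in the response.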
@eliranmoyal @ansjsun group by and distinct can sometimes have the same effect. If distinct is used and no group by clause is found in the SQL, we can automatically add one for the user.
SELECT distinct(userId) FROM logstash-api-access-2016.08.17 where ip = '112.65.191.239'
will be translated to:
SELECT userId FROM logstash-api-access-2016.08.17 where ip = '112.65.191.239' group by userId
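A minimal sketch of that rewrite, in Python for illustration only (the plugin itself would do this on its parsed query, not with a regex). The function name `rewrite_distinct` is hypothetical, and the sketch only handles the simple single-field form shown above: it does not cope with ORDER BY or LIMIT clauses, or with the words "group by" appearing inside a string literal.

```python
import re

# Matches: SELECT distinct(<field>) FROM ... (case-insensitive)
_DISTINCT = re.compile(
    r"^\s*select\s+distinct\s*\(\s*([\w.]+)\s*\)\s+(from\s+.*?)\s*$",
    re.IGNORECASE | re.DOTALL,
)
_GROUP_BY = re.compile(r"\bgroup\s+by\b", re.IGNORECASE)


def rewrite_distinct(sql):
    """Turn `SELECT distinct(f) FROM ...` into `SELECT f FROM ... group by f`.

    Returns the input unchanged when it is not a simple distinct query
    or when it already contains a GROUP BY clause.
    """
    match = _DISTINCT.match(sql)
    if match is None or _GROUP_BY.search(sql):
        return sql
    field, rest = match.groups()
    return f"SELECT {field} {rest} group by {field}"


if __name__ == "__main__":
    query = (
        "SELECT distinct(userId) FROM logstash-api-access-2016.08.17 "
        "where ip = '112.65.191.239'"
    )
    print(rewrite_distinct(query))
```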
@allwefantasy
Or this:
select topHits('size'=1) FROM logstash-api-access-2016.08.17 where ip = '112.65.191.239' group by userId
SELECT distinct(userId) FROM logstash-api-access-2016.08.17 where ip = '112.65.191.239'
Some of the data is repeated.