RediSearch / redisearch-py

RediSearch python client
https://redisearch.io
BSD 2-Clause "Simplified" License

Aggregations API #21

Open dvirsky opened 6 years ago

dvirsky commented 6 years ago

This is a proposed API for aggregations.

The aggregation pipeline is well suited to a builder/fluent-style API. It consists of the following elements, which can repeat and transform the pipeline:

  1. The base filter query (non repeating)
  2. Load - load properties from the document (if they are not in the sortables)
  3. Group by (with its reducers)
  4. Sort by
  5. Apply expression on values
  6. Limit

These can be chained repeatedly to transform the pipeline. So here's what I have in mind:

# query can be a string or a structured query object
# (`as` is a reserved word in Python, so aliases use .alias() and keyword arguments)
req = (
    AggregateRequest(query)
    .load('@foo', '@bar')
    .group_by(
        ('@foo', '@bar'),
        # reducers
        count().alias('total'),
        count_distinct('@bar').alias('num_bars'),
        # alternative proposal: name the reducer with a keyword argument
        # num_bars=count_distinct('@bar'),
    )
    .apply(sqr="sqrt(@foo/@num_bars)")
    .sort_by(Sort.desc('@sqr'), Sort.asc('@other'), max_results=100)
    .limit(0, 10)
)

resp = client.aggregate(req)
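
To make the reducer part of the proposal concrete, here is a minimal sketch of how aliased reducers might serialize into GROUPBY ... REDUCE ... AS arguments. The Reducer class and the count, count_distinct and group_by_args helpers are hypothetical names used for illustration only; they are not the client's implementation.

```python
from typing import List, Optional


class Reducer:
    """Hypothetical stand-in for one REDUCE clause of a GROUPBY step."""

    def __init__(self, name: str, *args: str) -> None:
        self.name = name
        self.args = list(args)
        self.alias_name: Optional[str] = None

    def alias(self, alias_name: str) -> "Reducer":
        # `as` is a reserved word in Python, so the method needs another name.
        self.alias_name = alias_name
        return self  # returning self keeps the call chainable

    def build_args(self) -> List[str]:
        out = ["REDUCE", self.name, str(len(self.args))] + self.args
        if self.alias_name is not None:
            out += ["AS", self.alias_name]
        return out


def count() -> Reducer:
    return Reducer("COUNT")


def count_distinct(field: str) -> Reducer:
    return Reducer("COUNT_DISTINCT", field)


def group_by_args(fields: List[str], *reducers: Reducer) -> List[str]:
    """Serialize one GROUPBY step followed by its reducers."""
    out = ["GROUPBY", str(len(fields))] + list(fields)
    for reducer in reducers:
        out += reducer.build_args()
    return out


args = group_by_args(
    ["@foo", "@bar"],
    count().alias("total"),
    count_distinct("@bar").alias("num_bars"),
)
print(args)
```

Keeping the alias on the reducer object means the GROUPBY step only has to concatenate whatever each reducer emits.
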
filipecosta90 commented 4 years ago

@mnunberg, filter expressions are still not supported in the Python client, correct? Here is an example from the documentation:

FT.AGGREGATE 
  ...
  FILTER "@name=='foo' && @age < 20"
  ...
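
Until the client supports this, one possible workaround is to build the raw command arguments by hand. The sketch below assumes a hypothetical helper name, a placeholder index name ("idx") and a match-all query ("*"); none of these come from the documentation example above.

```python
from typing import List


def aggregate_args_with_filter(index: str, query: str, filter_expr: str) -> List[str]:
    """Hypothetical helper: raw FT.AGGREGATE arguments with a single FILTER step."""
    return ["FT.AGGREGATE", index, query, "FILTER", filter_expr]


# "idx" and "*" are placeholders for a real index name and base query.
cmd = aggregate_args_with_filter("idx", "*", "@name=='foo' && @age < 20")
print(cmd)

# With redis-py, a list like this could be sent through the generic
# execute_command(*cmd) method of a connection (not exercised here).
```
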
stryt2 commented 4 years ago

Hello,

The APPLY {expr} AS {name} entry under the detailed parameter descriptions on the official RediSearch page says that

APPLY ... can be referenced by further APPLY / SORTBY / GROUPBY / REDUCE operations down the pipeline.

However, in the current build_args method of AggregateRequest, the GROUPBY keyword and its fields always come before the APPLY keyword and its fields. For example:

import redisearch

aggregate_request = redisearch.aggregation.AggregateRequest()
# Call 1
aggregate_request.apply(foo="@bar / 2").group_by("@foo", redisearch.reducers.count())
# or Call 2
# aggregate_request.group_by("@foo", redisearch.reducers.count()).apply(foo="@bar / 2")

print(aggregate_request.build_args())

Both Call 1 and Call 2 produce the same arguments, regardless of the order in which the methods are chained:

['*', 'GROUPBY', '1', '@foo', 'REDUCE', 'COUNT', '0', 'APPLY', '@bar / 2', 'AS', 'foo']

However, shouldn't the expected result of Call 1 be

['*', 'APPLY', '@bar / 2', 'AS', 'foo', 'GROUPBY', '1', '@foo', 'REDUCE', 'COUNT', '0']

i.e. shouldn't the order of the keywords depend on the order of the calls?

This matters because Call 2 results in an error (if the field foo does not already exist) saying

No such property foo

whereas Call 1 should not.
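
To illustrate the behaviour I would expect, here is a minimal sketch of a builder that records each step in call order. OrderedAggregateRequest and the COUNT constant are hypothetical names made up for this example; this is not the actual redisearch-py class.

```python
from typing import List


class OrderedAggregateRequest:
    """Hypothetical builder that emits pipeline steps in the order they were called."""

    def __init__(self, query: str = "*") -> None:
        self._query = query
        self._steps: List[List[str]] = []

    def apply(self, **expressions: str) -> "OrderedAggregateRequest":
        # Each keyword argument becomes one APPLY <expr> AS <alias> step.
        for alias, expr in expressions.items():
            self._steps.append(["APPLY", expr, "AS", alias])
        return self

    def group_by(self, field: str, *reducers: List[str]) -> "OrderedAggregateRequest":
        # Simplified to a single field; reducers are pre-built argument lists.
        step = ["GROUPBY", "1", field]
        for reducer in reducers:
            step += reducer
        self._steps.append(step)
        return self

    def build_args(self) -> List[str]:
        args = [self._query]
        for step in self._steps:
            args += step
        return args


COUNT = ["REDUCE", "COUNT", "0"]  # stand-in for redisearch.reducers.count()

call_1 = OrderedAggregateRequest().apply(foo="@bar / 2").group_by("@foo", COUNT)
call_2 = OrderedAggregateRequest().group_by("@foo", COUNT).apply(foo="@bar / 2")

print(call_1.build_args())
print(call_2.build_args())
```

With a single ordered list of steps, Call 1 places APPLY before GROUPBY and Call 2 places it after, so only Call 2 would refer to foo before it is defined.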

Any help to get around this issue is greatly appreciated. Thanks.

filipecosta90 commented 4 years ago

Hi there @stryt2, we were discussing the same error internally, as we are revising this client and extending the design to the redisearch-go client (trying to give it the same look and feel as this one), and we found exactly the same problem. We should have a PR to correct this very soon.

stryt2 commented 4 years ago

Glad to hear that. Thanks very much.