cedadev / search-futures

Future Search Architecture
BSD 2-Clause "Simplified" License

How to aggregate non-string properties #146

rhysrevans3 closed 2 years ago

rhysrevans3 commented 2 years ago

How should the Elasticsearch aggregator behave?

Currently

Alternative solutions

Hard coded (current):

    - name: elasticsearch_aggregator
      inputs:
        url: elasticsearch.com
        index: item
        terms:
          - table_id
          - source_id
          - experiment_id
          - permitted_use
          - license
          - bbox
          - size

Specified in config:

    - name: elasticsearch_aggregator
      inputs:
        url: elasticsearch.com
        index: item
        spatial:
          - bbox
        list:
          - table_id
          - source_id
          - experiment_id
          - permitted_use
          - license
        sum:
          - size
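
To make the config-driven option concrete, here is a minimal Python sketch of how the `spatial`, `list` and `sum` keys could be translated into the `aggs` section of an Elasticsearch search body. The function name `build_aggregations` is hypothetical and not taken from the project. The mapping of `list` to a `terms` aggregation, `sum` to a `sum` aggregation and `spatial` to a `geo_bounds` aggregation is an assumption, and `geo_bounds` in particular only works if `bbox` is mapped as a geo field in the index.

```python
def build_aggregations(inputs: dict) -> dict:
    """Build the `aggs` section of an Elasticsearch search body.

    `inputs` is the `inputs` mapping from the rule config. Only the
    `list`, `sum` and `spatial` keys are read; missing keys are skipped.
    """
    aggs = {}

    # `list` fields: collect the distinct values of each field.
    for field in inputs.get("list", []):
        aggs[field] = {"terms": {"field": field}}

    # `sum` fields: total a numeric field over all matching documents.
    for field in inputs.get("sum", []):
        aggs[field] = {"sum": {"field": field}}

    # `spatial` fields: bounding box that contains every value.
    # Assumption: the field is mapped as a geo type in the index.
    for field in inputs.get("spatial", []):
        aggs[field] = {"geo_bounds": {"field": field}}

    return aggs


if __name__ == "__main__":
    example_inputs = {
        "url": "elasticsearch.com",
        "index": "item",
        "spatial": ["bbox"],
        "list": ["table_id", "source_id"],
        "sum": ["size"],
    }
    # All three aggregation types end up in a single request body.
    print({"size": 0, "aggs": build_aggregations(example_inputs)})
```

With this layout the aggregation type is chosen by the config key a field is listed under, so supporting a new type means adding one more key and one more loop. Note that a `terms` aggregation returns only the top buckets by default, so a real implementation would need to decide how many values to keep.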

Different extractors for different types:

    - name: elasticsearch_term_aggregator
      inputs:
        url: elasticsearch.com
        index: item
        terms:
          - table_id
          - source_id
          - experiment_id
          - permitted_use
          - license
    - name: elasticsearch_spatial_aggregator
      inputs:
        url: elasticsearch.com
        index: item
        terms:
          - bbox
    - name: elasticsearch_number_aggregator
      inputs:
        url: elasticsearch.com
        index: item
        terms:
          - size
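
For comparison, the separate-extractor option could be sketched in Python as one small class per type, looked up by the `name` used in the config. All class and function names below are hypothetical, and the choice of `terms`, `geo_bounds` and `sum` as the underlying Elasticsearch aggregations is an assumption rather than the project's actual implementation.

```python
class BaseAggregator:
    """Shared behaviour: one aggregation of `agg_type` per configured term."""

    agg_type = ""  # Elasticsearch aggregation type, set by each subclass

    def __init__(self, inputs: dict):
        self.index = inputs["index"]
        self.terms = inputs["terms"]

    def body(self) -> dict:
        # "size": 0 asks Elasticsearch for aggregations only, no hits.
        return {
            "size": 0,
            "aggs": {t: {self.agg_type: {"field": t}} for t in self.terms},
        }


class TermAggregator(BaseAggregator):
    agg_type = "terms"


class SpatialAggregator(BaseAggregator):
    agg_type = "geo_bounds"  # assumes the field is mapped as a geo type


class NumberAggregator(BaseAggregator):
    agg_type = "sum"


# Maps the `name` from the config to the class that handles it.
AGGREGATORS = {
    "elasticsearch_term_aggregator": TermAggregator,
    "elasticsearch_spatial_aggregator": SpatialAggregator,
    "elasticsearch_number_aggregator": NumberAggregator,
}


def build_bodies(rules: list) -> list:
    """Return one search body per configured aggregator, in config order."""
    return [AGGREGATORS[rule["name"]](rule["inputs"]).body() for rule in rules]
```

In this sketch each extractor produces its own request body, so the three-entry config above would result in three searches, whereas the single config-driven aggregator could send everything in one.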

rhysrevans3 commented 2 years ago

@agstephens @Mahir-Sparkess do either of you have any strong opinions on this?

Mahir-Sparkess commented 2 years ago

That looks really good. Since the spatial/temporal aggregation already uses a separate call anyway, splitting the Elasticsearch aggregator into multiple extraction methods won't increase the number of calls made. The number aggregator also has to be called differently from the term aggregator, so that all makes sense. 👍

agstephens commented 2 years ago

@rhysrevans3: I don't have a strong opinion, but the middle option is the quickest and easiest to understand when reading or writing the rules.