rhysrevans3 commented 2 years ago

How should the Elasticsearch aggregator behave?

Currently

Aggregates properties into lists.
Spatial and temporal properties are hard coded to use their own aggregators.
File information (size, extension, filename, etc.) isn't aggregated.
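
For string-like properties, "aggregating into a list" maps naturally onto an Elasticsearch `terms` aggregation, whose bucket keys become the list. Below is a minimal sketch of that idea. It assumes the fields are keyword-mapped and the response has already been fetched as a dict. `build_terms_request`, `buckets_to_lists` and the sample values are hypothetical and don't come from the project's code.

```python
from typing import Any


def build_terms_request(terms: list[str], max_values: int = 100) -> dict[str, Any]:
    """Build a request body with one `terms` aggregation per property.

    Top-level `size: 0` skips returning the matching documents. The inner
    `size` caps the number of distinct values per property (Elasticsearch
    defaults to 10 if it is not set).
    """
    return {
        "size": 0,
        "aggs": {
            term: {"terms": {"field": term, "size": max_values}} for term in terms
        },
    }


def buckets_to_lists(response: dict[str, Any]) -> dict[str, list[Any]]:
    """Collapse each terms aggregation's buckets into a plain list of values."""
    return {
        name: [bucket["key"] for bucket in agg["buckets"]]
        for name, agg in response["aggregations"].items()
    }


# Hand-written response shaped like an Elasticsearch terms aggregation result.
example_response = {
    "aggregations": {
        "source_id": {
            "buckets": [
                {"key": "source_a", "doc_count": 12},
                {"key": "source_b", "doc_count": 3},
            ]
        },
        "license": {"buckets": [{"key": "CC-BY-4.0", "doc_count": 15}]},
    }
}

print(buckets_to_lists(example_response))
```

This works for strings. For `size` or `bbox`, though, a list of distinct values is rarely useful, and that is the gap the options below try to address.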

Alternative solutions

Hard coded (current):

Data scientists have to use the correct term names (min_lon, min_lat, etc.).
Code is less configurable: it will have to be updated if a term name changes or a new term is added.

    - name: elasticsearch_aggregator
      inputs:
        url: elasticsearch.com
        index: item
        terms:
          - table_id
          - source_id
          - experiment_id
          - permitted_use
          - license
          - bbox
          - size
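
Roughly, the hard coded option means the aggregator looks at each term's name to decide which aggregation to run. Here is a minimal sketch under some assumptions. `HARD_CODED` and `build_hard_coded_request` are hypothetical. Field names are assumed to match the term names. `bbox` is assumed to be indexed as a geo field that Elasticsearch's `geo_bounds` aggregation supports.

```python
from typing import Any

# Hypothetical hard-coded table: terms named here get a special aggregation;
# every other term falls back to a terms (list) aggregation.
HARD_CODED = {
    "bbox": "geo_bounds",
    "size": "sum",
}


def build_hard_coded_request(terms: list[str]) -> dict[str, Any]:
    """Pick the aggregation type for each term purely from its name."""
    aggs = {}
    for term in terms:
        agg_type = HARD_CODED.get(term, "terms")
        aggs[term] = {agg_type: {"field": term}}
    return {"size": 0, "aggs": aggs}


request = build_hard_coded_request(["table_id", "license", "bbox", "size"])
print(request["aggs"]["bbox"])
print(request["aggs"]["size"])
```

Renaming `bbox` or adding another numeric property means editing `HARD_CODED` itself. That is the configurability cost listed above.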

Specified in config:

More complex config.
Could remove the ability to use exclude.

    - name: elasticsearch_aggregator
      inputs:
        url: elasticsearch.com
        index: item
        spatial:
          - bbox
        list:
          - table_id
          - source_id
          - experiment_id
          - permitted_use
          - license
        sum:
          - size
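
With the config-driven option, the section a field sits under (`spatial`, `list`, `sum`) chooses the aggregation, and everything can still go out in a single request. The sketch below makes some assumptions. The section-to-aggregation mapping (`geo_bounds`, `terms`, `sum`) is one plausible choice rather than a decided design. `SECTION_TO_AGG` and `build_request_from_config` are hypothetical names.

```python
from typing import Any

# Assumed mapping from config section names to Elasticsearch aggregation types.
SECTION_TO_AGG = {
    "spatial": "geo_bounds",
    "list": "terms",
    "sum": "sum",
}


def build_request_from_config(inputs: dict[str, Any]) -> dict[str, Any]:
    """Build one request body covering every aggregation section in `inputs`.

    Keys such as `url` and `index` are ignored because only the known
    sections are read. Aggregations are keyed by field name, so a field
    listed under two sections would be overwritten by the later one.
    """
    aggs = {}
    for section, agg_type in SECTION_TO_AGG.items():
        for field in inputs.get(section, []):
            aggs[field] = {agg_type: {"field": field}}
    return {"size": 0, "aggs": aggs}


# Mirrors the `inputs` block of the config above, parsed into a dict.
inputs = {
    "url": "elasticsearch.com",
    "index": "item",
    "spatial": ["bbox"],
    "list": ["table_id", "source_id", "experiment_id", "permitted_use", "license"],
    "sum": ["size"],
}

request = build_request_from_config(inputs)
print(sorted(request["aggs"]))
```

Supporting a new type would mean adding one entry to the mapping, not touching the per-field logic.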

Different extractors for different types:

Multiple Elasticsearch calls.

    - name: elasticsearch_term_aggregator
      inputs:
        url: elasticsearch.com
        index: item
        terms:
          - table_id
          - source_id
          - experiment_id
          - permitted_use
          - license
    - name: elasticsearch_spatial_aggregator
      inputs:
        url: elasticsearch.com
        index: item
        terms:
          - bbox
    - name: elasticsearch_number_aggregator
      inputs:
        url: elasticsearch.com
        index: item
        terms:
          - size
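
Splitting into one aggregator per type keeps each class tiny, but every config entry becomes its own Elasticsearch request. The sketch below only builds the request bodies and doesn't send them. The class names, `REGISTRY`, and the choice of `geo_bounds`/`sum` for the spatial and number aggregators are all hypothetical.

```python
from typing import Any


class ElasticsearchAggregator:
    """Hypothetical base class: one instance per config entry, one request each."""

    agg_type = "terms"

    def __init__(self, url: str, index: str, terms: list[str]) -> None:
        self.url = url
        self.index = index
        self.terms = terms

    def build_request(self) -> dict[str, Any]:
        """Apply this aggregator's single aggregation type to every term."""
        return {
            "size": 0,
            "aggs": {term: {self.agg_type: {"field": term}} for term in self.terms},
        }


class TermAggregator(ElasticsearchAggregator):
    agg_type = "terms"


class SpatialAggregator(ElasticsearchAggregator):
    agg_type = "geo_bounds"


class NumberAggregator(ElasticsearchAggregator):
    agg_type = "sum"


# Hypothetical registry keyed by the `name` values used in the config above.
REGISTRY = {
    "elasticsearch_term_aggregator": TermAggregator,
    "elasticsearch_spatial_aggregator": SpatialAggregator,
    "elasticsearch_number_aggregator": NumberAggregator,
}

config = [
    {"name": "elasticsearch_term_aggregator",
     "inputs": {"url": "elasticsearch.com", "index": "item",
                "terms": ["table_id", "source_id", "experiment_id", "permitted_use", "license"]}},
    {"name": "elasticsearch_spatial_aggregator",
     "inputs": {"url": "elasticsearch.com", "index": "item", "terms": ["bbox"]}},
    {"name": "elasticsearch_number_aggregator",
     "inputs": {"url": "elasticsearch.com", "index": "item", "terms": ["size"]}},
]

# One request body per config entry, i.e. one Elasticsearch call each.
requests = [REGISTRY[entry["name"]](**entry["inputs"]).build_request() for entry in config]
print(len(requests))
```

The trade-off is the one noted above: simpler, single-purpose aggregators in exchange for multiple Elasticsearch calls.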

rhysrevans3 commented 2 years ago

@agstephens @Mahir-Sparkess do either of you have any strong opinions on this?

Mahir-Sparkess commented 2 years ago

That looks really good. Since the spatial/temporal aggregation already uses a separate call, splitting the Elasticsearch aggregator into multiple extraction methods won't increase the number of calls made. The number aggregator also needs to be called differently from the term aggregator, so it all makes sense. 👍

agstephens commented 2 years ago

@rhysrevans3: I don't have a strong opinion, but the middle option is the quickest and easiest to understand when reading or writing the rules.

cedadev / search-futures

How to aggregate non string properties #146