cedardevs / onestop

OneStop is a data discovery system being built by CIRES researchers on a grant from the NOAA National Centers for Environmental Information. We welcome contributions from the community!
GNU General Public License v2.0

GCMD Keyword Verification at indexing instead of search #456

Open ajakz opened 6 years ago

ajakz commented 6 years ago

Currently, when a search request is sent, aggregations are built, and only those that pass the "top level keywords" check for GCMD science/locations are returned in the API response.

In order to keep our data cleaner AND reduce pointless aggregations when searching, we should move this keyword verification from the ElasticsearchService (search response construction) to the ETL service when data is mapped from Staging to Search.

In addition, there is another hierarchical keyword type that should be checked here (the field is 'gcmdScienceServices'). As of 1/31/18, the top level keywords map should be:

  private static final topLevelKeywords = [
      'science' : [
          'Agriculture', 'Atmosphere', 'Biological Classification', 'Biosphere', 'Climate Indicators',
          'Cryosphere', 'Human Dimensions', 'Land Surface', 'Oceans', 'Paleoclimate', 'Solid Earth',
          'Spectral/Engineering', 'Sun-Earth Interactions', 'Terrestrial Hydrosphere'
      ],
      'location': [
          'Continent', 'Geographic Region', 'Ocean', 'Solid Earth', 'Space', 'Vertical Location'
      ],
      'service': [
          'Data Analysis And Visualization', 'Data Management/Data Handling', 'Education/Outreach', 'Environmental Advisories',
          'Hazards Management', 'Metadata Handling', 'Models', 'Reference And Information Services', 'Web Services'
      ]
  ]
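
For illustration only, here is a rough sketch of what the check could look like at indexing time. Everything except the keyword lists and the 'gcmdScienceServices' field name is an assumption: the class and method names are hypothetical, the 'gcmdScience' and 'gcmdLocations' field names are guesses, and it assumes each keyword is a ' > '-delimited string whose first segment is the top-level term, compared case-insensitively.

```groovy
// Hypothetical helper, not existing OneStop code.
class KeywordVerifier {

  // Top-level keyword map from this issue (as of 1/31/18).
  static final Map<String, List<String>> TOP_LEVEL_KEYWORDS = [
      science : [
          'Agriculture', 'Atmosphere', 'Biological Classification', 'Biosphere', 'Climate Indicators',
          'Cryosphere', 'Human Dimensions', 'Land Surface', 'Oceans', 'Paleoclimate', 'Solid Earth',
          'Spectral/Engineering', 'Sun-Earth Interactions', 'Terrestrial Hydrosphere'
      ],
      location: [
          'Continent', 'Geographic Region', 'Ocean', 'Solid Earth', 'Space', 'Vertical Location'
      ],
      service : [
          'Data Analysis And Visualization', 'Data Management/Data Handling', 'Education/Outreach',
          'Environmental Advisories', 'Hazards Management', 'Metadata Handling', 'Models',
          'Reference And Information Services', 'Web Services'
      ]
  ]

  // Assumed record field names; only 'gcmdScienceServices' is named in this issue.
  static final Map<String, String> FIELD_TO_CATEGORY = [
      gcmdScience        : 'science',
      gcmdLocations      : 'location',
      gcmdScienceServices: 'service'
  ]

  // True when the first ' > ' segment of the keyword is a known top-level term.
  static boolean isValid(String category, String keyword) {
    def parts = keyword?.tokenize('>')
    def top = parts ? parts[0].trim() : null
    return top && TOP_LEVEL_KEYWORDS[category]?.any { it.equalsIgnoreCase(top) }
  }

  // Returns a copy of the record with unverified keywords dropped from each GCMD field.
  static Map filterKeywords(Map record) {
    def cleaned = new LinkedHashMap(record)
    FIELD_TO_CATEGORY.each { field, category ->
      if (cleaned[field] != null) {
        cleaned[field] = cleaned[field].findAll { isValid(category, it as String) }
      }
    }
    return cleaned
  }
}
```

Calling something like `KeywordVerifier.filterKeywords(record)` while mapping from Staging to Search would mean the invalid keywords never reach the search index, so the aggregation code would no longer need to filter them. Fields that are absent from a record are left untouched, and the input map is not modified.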

mcquinne commented 3 years ago

TODO: Confirm that this is being handled by the indexer application