cedadev / search-futures

Future Search Architecture

bbox search - How to get bbox ingested from esgf solr #154

Open Mahir-Sparkess opened 2 years ago

Mahir-Sparkess commented 2 years ago

The current bbox search fails with this error: pystac_client.exceptions.APIError: {"detail":"RequestError(400, 'search_phase_execution_exception', 'failed to find type for field [spatial.bbox]')"}

The bbox is now in the field spatial.bbox and is meant to be a geo_shape, but its current mapping is an object with float coordinates and a text type:

"spatial" : {
  "properties" : {
    "bbox" : {
      "properties" : {
        "coordinates" : {
          "type" : "float"
        },
        "type" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
},

The mapping needs to be updated as follows:

PUT esgf-solr-items-2022-03-10/_mapping
{
  "properties": {
    "spatial": {
      "properties": {
        "bbox": {
          "type": "geo_shape"
        }
      }
    }
  }
}

Since Elasticsearch doesn't support changing the type of an existing field, this will require a reindex.
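
A minimal sketch of what the reindex could look like, written as plain request bodies so that no particular client library is assumed. The destination index name is hypothetical and not taken from this issue. It builds a create-index request carrying the geo_shape mapping, then the body for the Elasticsearch `_reindex` API that copies documents across.

```python
import json

SOURCE_INDEX = "esgf-solr-items-2022-03-10"
# Hypothetical name for the replacement index; not taken from the issue.
DEST_INDEX = "esgf-solr-items-reindexed"


def build_reindex_requests(source: str, dest: str) -> list[tuple[str, str, dict]]:
    """Return (method, path, body) tuples: create `dest`, then copy `source` into it."""
    create_index = (
        "PUT",
        f"/{dest}",
        {
            # Creating an index takes the mapping under a top-level "mappings" key.
            "mappings": {
                "properties": {
                    "spatial": {"properties": {"bbox": {"type": "geo_shape"}}}
                }
            }
        },
    )
    reindex = (
        "POST",
        "/_reindex",
        {"source": {"index": source}, "dest": {"index": dest}},
    )
    return [create_index, reindex]


if __name__ == "__main__":
    for method, path, body in build_reindex_requests(SOURCE_INDEX, DEST_INDEX):
        print(method, path)
        print(json.dumps(body, indent=2))
```

The printed requests can be pasted into a console or sent with whichever HTTP client is already in use.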

TO-DO:

Mahir-Sparkess commented 2 years ago

The ESGF coordinates range from -90 to 90 for latitude and from 0 to 360 for longitude. This breaks the Elasticsearch geo_shape, which requires longitudes in the range -180 to 180.

Need to update the assets/items/collections so that longitude values are remapped from the 0 to 360 range into -180 to 180 for ["properties.min_lon", "properties.max_lon", "spatial.bbox"]

The min and max lon properties are also wrong and need to be switched: currently min_lon represents east degrees and max_lon represents west degrees.
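
The longitude remapping could be sketched as below. This assumes the ESGF values follow the usual 0 to 360 east-positive convention, in which case values above 180 wrap by subtracting 360 rather than every value moving by a fixed offset. The function names are illustrative and not taken from the STAC generator.

```python
def wrap_longitude(lon: float) -> float:
    """Map a longitude in 0..360 to the equivalent value in -180..180."""
    return lon - 360.0 if lon > 180.0 else lon


def wrap_lon_bounds(west: float, east: float) -> tuple[float, float]:
    """Convert a (west, east) pair given in 0..360 to -180..180.

    A span of 360 degrees or more is treated as global. The result can
    have west > east when the box crosses the antimeridian, so callers
    must not assume the pair is sorted.
    """
    if east - west >= 360.0:
        return (-180.0, 180.0)
    return (wrap_longitude(west), wrap_longitude(east))


if __name__ == "__main__":
    print(wrap_lon_bounds(0.0, 360.0))    # (-180.0, 180.0)
    print(wrap_lon_bounds(170.0, 190.0))  # (170.0, -170.0)
```

The global case is handled separately because wrapping 0 and 360 independently would collapse both edges to 0.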

Mahir-Sparkess commented 2 years ago

There are too many nuances in using the cardinal degrees to represent latitude and longitude. It might be worth reindexing with a different method of ingesting the bbox:

Solution

The ESGF Solr datasets have a geo/geos field with spatial information that uses the correct ranges. However, it can be a string or a list of strings in the format ENVELOPE (min_lon, max_lon, max_lat, min_lat), so we will need to look up how to convert a geos field to an appropriate bbox. The geos field can hold one or many such strings, which have different values but seem to describe the same area. It is not standardised across all the datasets.
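
One possible way to turn the geos value into bounds is sketched below. It assumes the strings look exactly like the ENVELOPE format quoted above, accepts either a single string or a list, and, as a labelled assumption, takes the union when several envelopes are present. The function name `parse_geo` is hypothetical.

```python
import re
from typing import Optional, Union

# Matches e.g. "ENVELOPE(-180.0, 180.0, 90.0, -90.0)", with or without a
# space before the parenthesis. Order: min_lon, max_lon, max_lat, min_lat.
_ENVELOPE_RE = re.compile(
    r"ENVELOPE\s*\(\s*([^,\s]+)\s*,\s*([^,\s]+)\s*,\s*([^,\s]+)\s*,\s*([^,\s)]+)\s*\)",
    re.IGNORECASE,
)


def parse_geo(geo: Union[str, list[str], None]) -> Optional[dict[str, float]]:
    """Return min/max lon/lat from one or more ENVELOPE strings, or None."""
    values = [geo] if isinstance(geo, str) else list(geo or [])
    boxes = []
    for value in values:
        match = _ENVELOPE_RE.search(value)
        if match is None:
            continue
        try:
            boxes.append(tuple(float(part) for part in match.groups()))
        except ValueError:
            continue  # non-numeric content inside the envelope
    if not boxes:
        return None
    # Assumption: multiple envelopes describe the same area, so use their
    # union. This ignores boxes that cross the antimeridian.
    return {
        "min_lon": min(box[0] for box in boxes),
        "max_lon": max(box[1] for box in boxes),
        "max_lat": max(box[2] for box in boxes),
        "min_lat": min(box[3] for box in boxes),
    }
```

Returning `None` when nothing parses lets a caller fall back to the cardinal-degree method for datasets without a usable geos field.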

  1. In the STAC generator, create or alter the ESGF Solr plugin/media_handler to ingest the geos field from the dataset level.
  2. Look up how to transform or extract latitude and longitude data from the geos. Use the geos to add min_lat, max_lat, min_lon and max_lon to the asset properties.
  3. Some datasets don't have a geos, so if there is no geos field, use the cardinal degrees to add min_lat, max_lat, min_lon and max_lon to the asset properties? (the current method)
  4. Update the Elasticsearch aggregator spatial bbox method to format the bbox as expected by Elasticsearch: [minLon, maxLon, maxLat, minLat] (This change might not be necessary; it seems Elasticsearch can read the current order anyway.)
  5. Re-index the Solr assets, items and collections.
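
For step 4, one form that Elasticsearch accepts for a `geo_shape` field is the `envelope` type, whose coordinates are the upper-left and lower-right corners. The sketch below builds that document from the four bounds. The function name is illustrative, and whether the aggregator should emit this form or keep its current order is not settled in this thread.

```python
def bbox_to_envelope(
    min_lon: float, max_lon: float, max_lat: float, min_lat: float
) -> dict:
    """Build an Elasticsearch `envelope` shape from bounding-box limits.

    Raises ValueError when a value is outside the ranges a geo_shape allows,
    which catches un-converted 0..360 longitudes before indexing.
    """
    if not (-90.0 <= min_lat <= max_lat <= 90.0):
        raise ValueError(f"invalid latitude bounds: {min_lat}, {max_lat}")
    for lon in (min_lon, max_lon):
        if not -180.0 <= lon <= 180.0:
            raise ValueError(f"longitude out of range: {lon}")
    return {
        "type": "envelope",
        # [[minLon, maxLat], [maxLon, minLat]]: upper-left, then lower-right.
        "coordinates": [[min_lon, max_lat], [max_lon, min_lat]],
    }


if __name__ == "__main__":
    print(bbox_to_envelope(-10.0, 20.0, 50.0, 40.0))
```

Failing loudly here keeps a bad longitude from surfacing later as an indexing error.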

When redoing the index, ensure that the mapping for the new index has spatial.properties.bbox.type = "geo_shape". This should already be the case in the breezy deploy: https://breezy.badc.rl.ac.uk/mrahman/stac-esgf-indexer-deploy/-/blob/master/images/item-generator/files/item-mapping.yml
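
A small check along these lines could confirm the mapping after the reindex. It assumes the dictionary has the shape returned by `GET <index>/_mapping`, that is `{index: {"mappings": {"properties": ...}}}`; the helper name and the index names in the example are hypothetical.

```python
from typing import Optional


def bbox_field_type(mapping_response: dict, index: str) -> Optional[str]:
    """Return the mapped type of spatial.bbox, or None if it is an object."""
    properties = (
        mapping_response.get(index, {}).get("mappings", {}).get("properties", {})
    )
    bbox = properties.get("spatial", {}).get("properties", {}).get("bbox", {})
    # An object mapping has "properties" but no "type" key at this level.
    return bbox.get("type")


if __name__ == "__main__":
    old = {
        "old-index": {
            "mappings": {
                "properties": {
                    "spatial": {
                        "properties": {
                            "bbox": {
                                "properties": {
                                    "coordinates": {"type": "float"},
                                    "type": {"type": "text"},
                                }
                            }
                        }
                    }
                }
            }
        }
    }
    new = {
        "new-index": {
            "mappings": {
                "properties": {
                    "spatial": {"properties": {"bbox": {"type": "geo_shape"}}}
                }
            }
        }
    }
    print(bbox_field_type(old, "old-index"))
    print(bbox_field_type(new, "new-index"))
```

With the old object mapping the helper returns `None`, because the only `type` present there is a sub-field of bbox rather than the type of bbox itself.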