developmentseed / eoAPI

[Active Development] Earth Observation API (Metadata, Raster and Vector services)
https://eoapi.dev
MIT License

mosaic table in PG database #22

Closed vincentsarago closed 2 years ago

vincentsarago commented 2 years ago

Enable retrieving mosaic by name instead of mosaicid
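
A minimal sketch of the name-based lookup, assuming a hypothetical `mosaics` table whose columns mirror the example row in this comment. `sqlite3` stands in for PostgreSQL so that the snippet runs on its own; the table name, column names and `get_mosaic_by_name` are illustrative and not part of pgstac or eoAPI.

```python
import json
import sqlite3

# Hypothetical schema; list-like columns are stored as JSON-encoded text.
SCHEMA = """
CREATE TABLE mosaics (
    name TEXT PRIMARY KEY,
    mosaicid TEXT NOT NULL UNIQUE,
    minzoom INTEGER NOT NULL,
    maxzoom INTEGER NOT NULL,
    bounds TEXT NOT NULL,
    default_assets TEXT NOT NULL,
    available_assets TEXT NOT NULL
)
"""


def get_mosaic_by_name(conn, name):
    """Return the mosaic registered under `name`, or None if it is unknown."""
    row = conn.execute(
        "SELECT name, mosaicid, minzoom, maxzoom, bounds, default_assets, "
        "available_assets FROM mosaics WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        return None
    return {
        "name": row[0],
        "mosaicid": row[1],
        "minzoom": row[2],
        "maxzoom": row[3],
        "bounds": json.loads(row[4]),
        "default_assets": json.loads(row[5]),
        "available_assets": json.loads(row[6]),
    }


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(
    "INSERT INTO mosaics VALUES (?, ?, ?, ?, ?, ?, ?)",
    (
        "vincent.mosaic",
        "ada12e921qduashdas",
        0,
        24,
        json.dumps([-180, -90, 180, 90]),
        json.dumps(["cog"]),
        json.dumps(["cog", "thumbnail", "raw"]),
    ),
)

mosaic = get_mosaic_by_name(conn, "vincent.mosaic")
print(mosaic["mosaicid"])
```

The tile endpoints could then resolve a human-readable name to the `mosaicid` before running the usual search-based rendering.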

| name | mosaicid | minzoom | maxzoom | bounds | default assets | available assets |
| --- | --- | --- | --- | --- | --- | --- |
| "vincent.mosaic" | "ada12e921qduashdas" | 0 | 24 | [-180, -90, 180, 90] | ["cog"] | ["cog", "thumbnail", "raw"] |

sharkinsspatial commented 2 years ago

@vincentsarago Can we also consider having the root mosaic endpoint return a list of the mosaics and their name and mosaicid? This might assist in discoverability for some of the VEDA / Dashboard evolution work.

vincentsarago commented 2 years ago

@sharkinsspatial

> Can we also consider having the root mosaic endpoint return a list of the mosaics and their name and mosaicid?

Sure, but only if we go ahead with a new mosaic table in the pgstac database. It might be fine for eoAPI, but I'm a bit worried. The goal of the eoAPI submodules is to connect to any pgstac database; if we introduce a mosaic table, this might close off some possibilities.

Maybe https://github.com/stac-utils/titiler-pgstac/issues/30 is a better option. We could require a specific metadata field to be present (e.g. type: Mosaic) and use it as a filter value.

cc @bitner

sharkinsspatial commented 2 years ago

@bitner As we're expanding the use of titiler-pgstac at NASA, we have a few use cases where client applications will need to request information about all available mosaics in order to dynamically configure a list of available tile endpoints and their characteristics. With https://github.com/stac-utils/titiler-pgstac/pull/38 and several follow-on PRs, @vincentsarago is serializing the majority of the information we need. Is it feasible from a performance perspective to include a root mosaics endpoint that would fetch, deserialize and return all mosaic hashes, with the same information that the current individual info endpoint provides (https://github.com/stac-utils/titiler-pgstac/blob/0f2b5b4ba50bb3458237ab21cf9a154d7b811851/titiler/pgstac/factory.py#L359-L367)? cc @anayeaye @abarciauskas-bgse

vincentsarago commented 2 years ago

I've opened an additional PR: https://github.com/stac-utils/titiler-pgstac/pull/45

@sharkinsspatial let me know what you think!

Note, if we don't move forward with it in titiler-pgstac I'll totally add this in eoAPI anyway.

bitner commented 2 years ago

@sharkinsspatial EEEEK, I realllllly don't think you want to do that!

That endpoint lists every single search that has ever been made against the pgstac instance! If someone changes a date range, it's another record, etc.

For reference - Planetary Computer has over 4 million different records in the searches table!

I think it could be useful for something like seeing what people are searching on to debug things, but with no control over the searches that are getting recorded, I don't see any possible world where it could be useful or scale to any reasonable amount as a "mosaic catalog". I'm not talking about performance here (it could perform just fine); it's more that I can't see how you would make any sense of it.

These mosaics are by design dynamic - a listing of "every dynamic thing that people can come up with" just doesn't seem right. It may be that I'm just missing something here, but I really don't see how this could be useful??? At least for the Planetary Computer, we are already seeing things in the logs where someone is setting up cron jobs that change the date range every so often and use that to grab new data -- someone could do this against a STAC instance, say, every minute, with each and every query being different and therefore another record.
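
The growth described above can be sketched in a few lines. This assumes, purely for illustration, that a search is stored under a hash of its canonical JSON body; `search_hash` and the SHA-256 choice are hypothetical stand-ins and not pgstac's actual hashing.

```python
import hashlib
import json


def search_hash(search: dict) -> str:
    """Hypothetical search id: a hash of the canonical JSON body."""
    canonical = json.dumps(search, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = {"collections": ["naip"], "bbox": [-180, -90, 180, 90]}
searches_table = {}

# A cron job that shifts the end of the date range on every run.
for day in range(1, 4):
    search = {
        **base,
        "datetime": f"2022-01-01T00:00:00Z/2022-01-{day + 1:02d}T00:00:00Z",
    }
    searches_table[search_hash(search)] = search

# Re-registering an identical search reuses the existing record.
searches_table[search_hash(dict(search))] = search

print(len(searches_table))  # 3: one record per distinct date range
```

Any change to the request body yields a new key, which is why an unfiltered listing grows with every variation a client sends.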

vincentsarago commented 2 years ago

@bitner I totally get your point, but mosaics are a little less dynamic and will often be hard-coded searches (e.g. for static datasets like NAIP).

In https://github.com/stac-utils/titiler-pgstac/pull/45, what I'm proposing is that we only list searches that have a specific metadata value, metadata.type = "mosaic", which should narrow things down.
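
As a rough sketch of that filter (not the code from the PR), assuming each search record is a dict with hypothetical `hash` and `metadata` keys:

```python
def list_mosaics(searches):
    """Keep only the searches whose metadata marks them as a mosaic."""
    return [
        search
        for search in searches
        # `metadata` may be missing or None for ad-hoc searches.
        if (search.get("metadata") or {}).get("type") == "mosaic"
    ]


searches = [
    {"hash": "a1", "metadata": {"type": "mosaic", "name": "naip"}},
    {"hash": "b2", "metadata": {}},
    {"hash": "c3", "metadata": None},
    {"hash": "d4"},
]

print([search["hash"] for search in list_mosaics(searches)])  # ['a1']
```

Only searches that were deliberately registered with the marker show up, so ad-hoc queries never reach the listing.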

vincentsarago commented 2 years ago

Or maybe we could use STAC directly 🤷, which means that we could create a mosaic extension and store the mosaic info in a mosaic collection:

{
  "type": "Feature",
  "stac_version": "1.0.0",
  "stac_extensions": [
    "https://stac-extensions.github.io/mosaic/v1.0.0/schema.json"
  ],
  "id": "my search id",
  "bbox": [
    13.86148243891681,
    36.95257399124932,
    15.111074610520053,
    37.94752813015372
  ],
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          13.876381589019879,
          36.95257399124932
        ],
        [
          13.86148243891681,
          37.942072015005024
        ],
        [
          15.111074610520053,
          37.94752813015372
        ],
        [
          15.109620666835209,
          36.95783951241028
        ],
        [
          13.876381589019879,
          36.95257399124932
        ]
      ]
    ]
  },
  "properties": {
    "datetime": "2021-02-21T10:00:17Z",  // or null
    "name": "my mosaic",  // OPTIONAL: name of the mosaic
    "stac_assets": ["image", "cog"]  // OPTIONAL: list of available assets in each STAC record
  },
  "collection": "mosaics",
  "assets": {
    "true_color": {
      "title": "True color Mosaic",
      "href": "https://endpoint/{searchid}/{z}/{x}/{y}.jpeg",
      "options": {
        "assets": ["B4", "B3", "B2"],
        "color_formula": "Gamma RGB 3.5 Saturation 1.7 Sigmoidal RGB 15 0.35"
      }
    },
    "ndvi": {
      "title": "NDVI Mosaic",
      "href": "https://endpoint/{searchid}/{z}/{x}/{y}.jpeg",
      "options": {
        "expression": "(B4-B3)/(B4+B3)",
        "rescale": "-1,1",
        "colormap_name": "viridis"
      }
    }
  },
  "links": []
}

Note: if we prefer moving forward with a pure STAC solution, it means that when a user registers a search, they will also have to register a STAC item in the mosaic collection, OR we let the titiler-pgstac /register endpoint do it 🤷‍♂️
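
To make the second option more concrete, here is a hedged sketch of what a register step could produce alongside the search id. `build_mosaic_item` is a hypothetical helper, the `tiles` asset key is an assumption, and the extension URL is the one proposed above; none of this is existing titiler-pgstac behaviour.

```python
def build_mosaic_item(searchid, name, bbox, datetime=None, endpoint="https://endpoint"):
    """Build a STAC-like item describing a registered mosaic (illustrative only)."""
    minx, miny, maxx, maxy = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        # Extension URL as proposed in the comment above.
        "stac_extensions": [
            "https://stac-extensions.github.io/mosaic/v1.0.0/schema.json"
        ],
        "id": searchid,
        "bbox": list(bbox),
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [minx, miny], [maxx, miny], [maxx, maxy], [minx, maxy], [minx, miny],
            ]],
        },
        "properties": {"datetime": datetime, "name": name},
        "collection": "mosaics",
        "assets": {
            "tiles": {
                "title": name,
                # Doubled braces keep {z}/{x}/{y} as literal template fields.
                "href": f"{endpoint}/{searchid}/{{z}}/{{x}}/{{y}}.jpeg",
            }
        },
        "links": [],
    }


item = build_mosaic_item(
    "ada12e921qduashdas", "my mosaic", (13.86, 36.95, 15.11, 37.95),
    datetime="2021-02-21T10:00:17Z",
)
print(item["assets"]["tiles"]["href"])
```

With this shape, the register endpoint would return the search id as before and additionally write the item to the `mosaics` collection.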

bitner commented 2 years ago

@vincentsarago I see the point now on mosaics only being records with "mosaic" metadata. If nothing else, we would need to put an index on the searches table so that the mosaics can be easily separated. That being said, I like your idea of a mosaic collection -- it would further allow us to use all the search mechanisms "for free" on any metadata that is stored in a mosaic item.
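
The index idea can be illustrated with a partial index. The sketch below uses `sqlite3` as a stand-in so that it runs anywhere, and it assumes a simplified `searches` table with a plain `metadata_type` column; in PostgreSQL the marker lives inside JSON metadata, so the equivalent would more likely be a partial or expression index on that field, which is not shown or tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, hypothetical layout of a searches table.
conn.execute(
    "CREATE TABLE searches (hash TEXT PRIMARY KEY, search TEXT, metadata_type TEXT)"
)

# Partial index: only rows flagged as mosaics are indexed, so the index
# stays small even if the table holds millions of ad-hoc searches.
conn.execute(
    "CREATE INDEX searches_mosaic_idx ON searches (hash) "
    "WHERE metadata_type = 'mosaic'"
)

conn.executemany(
    "INSERT INTO searches VALUES (?, ?, ?)",
    [
        ("a1", '{"collections": ["naip"]}', "mosaic"),
        ("b2", '{"datetime": "2022-01-01/2022-01-02"}', None),
        ("c3", '{"datetime": "2022-01-01/2022-01-03"}', None),
    ],
)

mosaic_hashes = [
    row[0]
    for row in conn.execute(
        "SELECT hash FROM searches WHERE metadata_type = 'mosaic' ORDER BY hash"
    )
]
print(mosaic_hashes)  # ['a1']
```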

bitner commented 2 years ago

If we went the mosaic collection route, rather than having a /list endpoint it would just be /mosaics/items, and it would have search/filters as well as paging already in place.
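
For readers unfamiliar with how an items endpoint pages, here is a small sketch of the response shape: a GeoJSON FeatureCollection with a `next` link when more items remain. The `limit`/`offset` query parameters and `items_page` are assumptions for illustration; a real STAC API may use an opaque token instead.

```python
def items_page(items, limit=10, offset=0, base_url="/mosaics/items"):
    """Return one page of items plus a `next` link when more remain."""
    page = items[offset:offset + limit]
    links = []
    if offset + limit < len(items):
        links.append(
            {
                "rel": "next",
                "href": f"{base_url}?limit={limit}&offset={offset + limit}",
            }
        )
    return {"type": "FeatureCollection", "features": page, "links": links}


mosaic_items = [{"type": "Feature", "id": f"mosaic-{i}"} for i in range(25)]

first = items_page(mosaic_items, limit=10)
last = items_page(mosaic_items, limit=10, offset=20)
print(len(first["features"]), first["links"][0]["href"])
print(len(last["features"]), last["links"])
```

A client follows `next` links until none is returned, which is the paging behaviour a mosaic collection would get without a dedicated /list endpoint.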

vincentsarago commented 2 years ago

Re the STAC way: I'm just a bit worried about creating a STAC extension specific to titiler/titiler-pgstac. It seems to me that

> put an index on the searches table to make sure that the mosaics could be easily separated

might just be easier 🙉

sharkinsspatial commented 2 years ago

I do like the idea of modeling mosaic endpoints as STAC items (though as @vincentsarago noted, I don't like losing the consistency of all mosaic related requests occurring on the mosaic path, but that seems a small issue). If we do consider this approach, here are a few thoughts/questions:

  1. There is significant conceptual overlap with this and the existing extension proposals tiled assets, virtual assets and composite. Personally I think we can avoid alignment with tiled assets as it would be overly verbose to advertise all of a mosaic's supported TileMatrixSets and the dynamic nature of the mosaic's item composition makes maintaining the Tile Matrix Limits difficult. It might be worth considering aligning with the processing:expression field for community consistency.

  2. Should mosaic asset href expose a url template (which is not a valid href) or the link to the tilejson? How much of this information should be packaged in asset and how much should be packaged in the tilejson? I'd lean towards packaging most of the descriptive information at the asset level and keeping tilejson standardized and minimal.
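
To illustrate the second question, the sketch below derives a minimal TileJSON document from a mosaic asset, so that the asset `href` could point at the TileJSON while the URL template lives in its `tiles` array. `asset_to_tilejson` is a hypothetical helper, and the zoom and bounds defaults are taken from the example row at the top of this issue.

```python
def asset_to_tilejson(asset, minzoom=0, maxzoom=24, bounds=(-180, -90, 180, 90)):
    """Build a minimal TileJSON document from a mosaic asset (illustrative only)."""
    return {
        "tilejson": "2.2.0",
        "name": asset.get("title", ""),
        # The URL template belongs here, where templates are expected.
        "tiles": [asset["href"]],
        "minzoom": minzoom,
        "maxzoom": maxzoom,
        "bounds": list(bounds),
    }


ndvi_asset = {
    "title": "NDVI Mosaic",
    "href": "https://endpoint/{searchid}/{z}/{x}/{y}.jpeg",
    "options": {"expression": "(B4-B3)/(B4+B3)", "rescale": "-1,1"},
}

tilejson = asset_to_tilejson(ndvi_asset)
print(tilejson["tiles"])
```

Descriptive fields such as `options` stay on the asset, which keeps the TileJSON standardized and minimal, as suggested above.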

It would be helpful to know what model client applications currently use for mosaic endpoint discovery. I took a quick look at https://github.com/microsoft/PlanetaryComputerDataCatalog, but it would be good to know how the PC explorer references the mosaics and how the application might like to leverage a discovery endpoint.