geotrellis / geotrellis-contrib

GeoTrellis functionality related to the core.
Apache License 2.0

RasterSource Metadata API #181

Closed echeipesh closed 5 years ago

echeipesh commented 5 years ago

RasterSource needs to expose metadata of the underlying rasters through an API that does not require side-loading the metadata from format-specific readers.

This presents a challenge because the current RasterSource instances already range over multiple file formats (through GDAL) and storage formats (through GeoTrellis layers).

The starting expectation is that this Metadata class hierarchy may expose common raster metadata, but beyond that will require pattern matching/reflection to determine what is available for a specific source. This API should not rely on type parameters, since the type information will not be available at runtime for this I/O module.
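As a minimal sketch of that expectation (all names here, such as `RasterMetadata`, `TiffMetadata`, and `GdalMetadata`, are hypothetical and not an existing GeoTrellis API): common fields sit on a non-parameterized base trait, and format-specific details are recovered at runtime by pattern matching, with a fallback for sources the caller does not know about.

```scala
// Hypothetical hierarchy: no type parameters, so nothing depends on erased type info.
trait RasterMetadata {
  // Metadata common to every source
  def cols: Int
  def rows: Int
}

// Format-specific extensions carry whatever the underlying reader exposes
final case class TiffMetadata(cols: Int, rows: Int, tiffTags: Map[String, String])
  extends RasterMetadata

final case class GdalMetadata(cols: Int, rows: Int, domains: Map[String, Map[String, String]])
  extends RasterMetadata

object MetadataInspection {
  // Callers only see RasterMetadata and match on the concrete class at runtime
  def describe(md: RasterMetadata): String = md match {
    case TiffMetadata(c, r, tags)    => s"GeoTIFF ${c}x${r}, ${tags.size} tag(s)"
    case GdalMetadata(c, r, domains) => s"GDAL ${c}x${r}, ${domains.size} domain(s)"
    // Unknown source types still expose the common fields
    case other                       => s"${other.cols}x${other.rows}"
  }
}
```

The trait is left unsealed on purpose in this sketch, since new RasterSource implementations may add their own subclasses; the price is that callers always need a fallback case.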

Some traits that may exist in this hierarchy are:

We should consider whether we can reuse some existing API for this purpose:

The above interfaces are Java interfaces and may put too much burden on user code when used from Scala (for instance, org.opengis.metadata.Metadata makes heavy use of Java Collections, which would encourage heavy use of JavaConverters). We should still take a look and make a considered choice there.
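To illustrate the friction, a small sketch: `JavaStyleMetadata` below is a hypothetical Scala trait shaped like a Java getter returning `java.util.Collection` (it is not the actual GeoAPI interface). Every call site has to go through `JavaConverters` before idiomatic Scala collection operations apply.

```scala
import java.util.{ArrayList => JArrayList, Collection => JCollection}
import scala.collection.JavaConverters._

// Hypothetical Java-style accessor, standing in for GeoAPI-like metadata getters
trait JavaStyleMetadata {
  def getKeywords: JCollection[String]
}

object JavaInterop {
  def sampleMetadata: JavaStyleMetadata = new JavaStyleMetadata {
    def getKeywords: JCollection[String] = {
      val kws = new JArrayList[String]()
      kws.add("elevation")
      kws.add("dem")
      kws
    }
  }

  // Each access needs an explicit .asScala before Scala collection ops can be used
  def upperKeywords(md: JavaStyleMetadata): List[String] =
    md.getKeywords.asScala.map(_.toUpperCase).toList
}
```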

metasim commented 5 years ago

One thing that will require investigation is how to robustly translate between the WKT-centric representation of CRS in CoordinateReferenceSystem and our (superior) proj4 representation. Here's the best hack I could come up with while trying to reconcile the differences:

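    // Assumed imports; `withDatastore` and `logger` are members of the enclosing class (not shown).
    import geotrellis.proj4.{CRS => ProjCRS}
    import org.geotools.referencing.CRS
    import scala.util.Try
    import scala.util.control.NonFatal

    // Returns the GeoTrellis (proj4) CRS; `CRS` here is the GeoTools utility class.
    def readCRS: Option[ProjCRS] =
      withDatastore(ds =>
        Option(ds.getSchema.getCoordinateReferenceSystem)
          .flatMap { crs =>
            val wkt = crs.toWKT.replaceAll("[\\t\\n ]", "")
            // TODO: The GeoTrellis WKT parser is very limited in what it can convert to proj4.
            // It requires there to be an EPSG code in it.
            // This may result in empty CRS even though a valid WKT is provided.
            // See: https://github.com/locationtech/geotrellis/issues/2871
            try {
              Some(ProjCRS.fromWKT(wkt))
            } catch {
              case NonFatal(_) =>
                // Fall back to GeoTools' EPSG lookup: it returns null when nothing matches
                // and can itself throw FactoryException, so guard both cases.
                Try(CRS.lookupEpsgCode(crs, true)).toOption
                  .flatMap(code => Option(code))
                  .map(code => ProjCRS.fromEpsgCode(code.intValue))
                  .orElse {
                    logger.warn("Unable to identify and encode CRS:\n" + wkt)
                    None
                  }
            }
          }
      )
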
pomadchin commented 5 years ago

Some research results:

Both imageio and GeoTools use IIOMetadata to implement TIFF metadata, but IIOMetadata defines only the following abstract methods:

  def isReadOnly: Boolean
  def getAsTree(formatName: String): Node
  def mergeTree(formatName: String, root: Node): Unit
  def reset(): Unit

In other words, it allows metadata to be interpreted as XML. If we really want that, then it makes sense to implement it. There is a usage example showing that they essentially just convert a Map into an XML tree.
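A minimal sketch of that idea, assuming a hypothetical native format name and element layout (`MapMetadata`, `"map_metadata_1.0"`, and `Item` nodes with `name`/`value` attributes are illustrative, not an existing imageio or GeoTools format): a read-only `IIOMetadata` whose tree is built from a `Map[String, String]`.

```scala
import javax.imageio.metadata.{IIOMetadata, IIOMetadataNode}
import org.w3c.dom.Node

object MapMetadata {
  // Hypothetical native format name; not a registered imageio format
  val FormatName = "map_metadata_1.0"
}

// Read-only IIOMetadata backed by a flat key/value map
class MapMetadata(tags: Map[String, String])
    extends IIOMetadata(false, MapMetadata.FormatName, null, null, null) {

  override def isReadOnly(): Boolean = true

  // Builds <map_metadata_1.0><Item name="..." value="..."/>...</map_metadata_1.0>
  override def getAsTree(formatName: String): Node = {
    if (formatName != MapMetadata.FormatName)
      throw new IllegalArgumentException(s"Unsupported format: $formatName")
    val root = new IIOMetadataNode(MapMetadata.FormatName)
    tags.foreach { case (key, value) =>
      val item = new IIOMetadataNode("Item")
      item.setAttribute("name", key)
      item.setAttribute("value", value)
      root.appendChild(item)
    }
    root
  }

  // The backing map is immutable, so merging is rejected as for any read-only metadata
  override def mergeTree(formatName: String, root: Node): Unit =
    throw new IllegalStateException("MapMetadata is read-only")

  // Nothing mutable to reset
  override def reset(): Unit = ()
}
```

Anything beyond a flat map (per-band values, GDAL domains) would need a richer element layout, which is exactly the design question this sketch leaves open.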

An alternative approach is to reuse our own Tags format, which matches the GDAL representation 1:1 (it just wraps everything into the Tags case class).
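A rough sketch of that wrapping, using a local stand-in `TagsLike` that mirrors the shape of GeoTrellis's `Tags` (`headTags` for dataset-level values, `bandTags` for per-band values) so it runs without GeoTrellis on the classpath. Keeping only GDAL's default metadata domain (the empty-string domain) is a simplifying assumption of this sketch; other domains would need their own decision.

```scala
// Stand-in with the same shape as geotrellis.raster.io.geotiff.Tags
final case class TagsLike(headTags: Map[String, String], bandTags: List[Map[String, String]])

object GdalToTags {
  // GDAL metadata keyed by domain name; "" is GDAL's default domain
  type DomainMetadata = Map[String, Map[String, String]]

  // Dataset-level default domain goes to headTags, one default-domain map per band
  def toTags(dataset: DomainMetadata, bands: List[DomainMetadata]): TagsLike =
    TagsLike(
      headTags = dataset.getOrElse("", Map.empty[String, String]),
      bandTags = bands.map(_.getOrElse("", Map.empty[String, String]))
    )
}
```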

A somewhat separate question concerns GeoTrellisRasterSources, since their metadata can depend on the key type. For now it will only be SpatialKey metadata; we could use runtime converters / cast functions / some reflection to convert it into the necessary type, but it is completely incompatible with the GeoTiff and GDAL RasterSources. Moreover, it just duplicates functions that are already exposed separately in the API (extent, cols / rows, mapTransform, ...).

That said, we can definitely still expose it.

A couple of possible approaches:

    trait RasterSourceMetadata
    case class GeoTiffMetadata(tags: Tags) extends RasterSourceMetadata
    case class GDALMetadata(tags: Map[Domain, Map[String, String]]) extends RasterSourceMetadata
    // or
    case class GDALMetadata(tags: Tags) extends RasterSourceMetadata
    // this looks confusing since it contains nothing special
    case class GeoTrellisMetadata(metadata: TileLayerMetadata[SpatialKey]) extends RasterSourceMetadata

Instead of Tags, we could also implement a new version of Tags that extends IIOMetadata, which would let us work with metadata as an XML tree through an existing API.

metasim commented 5 years ago

"A bit separate question is related to GeoTrellisRasterSources since its metadata can depend on the key type. For now it will be only SpatialKeys metadata, however we can get runtime converters / cast functions / some reflection usage to convert it into a necessary type and it is completely incompatible with the GeoTiff and GDAL RasterSources."

Am I right that technically, if you've encoded TileFeatures, the metadata could be completely arbitrary (assuming an Avro codec)?

pomadchin commented 5 years ago

@metasim it depends on what you mean by that, but in fact I decided to make everything just a Map[String, String] (after surveying the GIS world, there is just nothing better :/). It looks pretty straightforward; you're welcome to review it: https://github.com/geotrellis/geotrellis-contrib/pull/216

Also, an important note: metadata is now something that is not exposed via the RasterSource API (cols / rows / extent / etc.). We also decided to introduce a RasterSourceMetadata parent trait, which in fact represents all the metadata (including tags, cols / rows, extent, etc.).
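A rough sketch of the shape described here, with hypothetical names and members (`SourceMetadataSketch`, `InMemoryMetadata`, `attribute`, `attributesForBand` are illustrative; the actual API is in the linked PR and may differ): a parent trait that carries the common fields alongside free-form `Map[String, String]` attributes.

```scala
// Hypothetical parent trait: common fields plus free-form string attributes
trait SourceMetadataSketch {
  def cols: Long
  def rows: Long
  def bandCount: Int
  // Dataset-level key/value metadata (e.g. GeoTIFF tags or GDAL metadata items)
  def attributes: Map[String, String]
  // Per-band key/value metadata; empty when a band has none
  def attributesForBand(band: Int): Map[String, String]

  // Convenience lookup shared by all implementations
  def attribute(name: String): Option[String] = attributes.get(name)
}

// Each source type only has to supply plain maps
final case class InMemoryMetadata(
    cols: Long,
    rows: Long,
    bandCount: Int,
    attributes: Map[String, String],
    bandAttributes: Vector[Map[String, String]]
) extends SourceMetadataSketch {
  def attributesForBand(band: Int): Map[String, String] =
    bandAttributes.lift(band).getOrElse(Map.empty[String, String])
}
```

Using plain string maps keeps every source type on the same footing, at the cost of pushing parsing of typed values (numbers, CRS, nodata) onto the caller.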

At this point I am not sure how it relates to Feature types, since these are RasterSources ¯\_(ツ)_/¯; I would also be happy to hear the whole story.

metasim commented 5 years ago

"It depends on what you mean by that, but I decided to make everything just a Map[String, String]"

Totally think that's the right call. It was more of a comment, reinforcing your general sentiment that there's no easy path to a single, generalized metadata solution. IOW, 💯 to what you said!