dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.69k stars 3.17k forks source link

Support mapping (and querying) of geometry collections #30630

Open roji opened 1 year ago

roji commented 1 year ago

GIS has a concept of a collection of geometry objects; this is represented by NetTopologySuite's GeometryCollection type, which extends Geometry and which we map to the database geometry/geography types (e.g. GeometryCollection in SQL Server). In most databases we also translate some spatial functions.

As part of #29427, we're allowing to map a .NET collection of any supported type to a JSON column in the database; but since collections of spatial objects have a special representation, we're excluding that specifically. Since GeometryCollection already implements IEnumerable<Geometry>, we probably don't want to also allow mapping e.g. Geometry[]. However, most database don't allow unpacking a geometry collection to a queryable table (PG supports this via ST_Dump); this means mapping Geometry[] to a regular JSON string column in the database does an advantage (queryability) over mapping to a database geometry collection.

Note: SQL Server OPENJSON doesn't natively support geometry, i.e. does not allow using geometry in its WITH clause. But we can still get WKT out as a string and just wrap it in a function which returns the geometry object.

roji commented 1 year ago

Some thoughts on why someone would want to use List<Point> over GeometryCollection:

roji commented 1 year ago

See https://github.com/npgsql/efcore.pg/issues/2722 for an example of a user attempting to map List<Point> instead of GeometryCollection and how that complicated things. In a nutshell, GeometryCollection can also function as a non-collection geometry, so that general special functions can be used with it (e.g. you can intersect directly with it, rather than do a complex thing where you intersect with all of its points).

Thinking about this some more, there may be a good compromise here... We could allow users to map List<Geometry> as a string (via JSON), but require that to be done explicitly, i.e. by specifying the store type as nvarchar(max). This is in contrast with other collection types, where we'd default to JSON mapping even if there's no store type specified. That would help guide users towards GeometryCollection rather than JSON, but would still allow JSON where that's desired.

roji commented 1 year ago

Design decision: don't special case geometry in any way:

roji commented 1 year ago

Requires #30677 to control JSON serialization of spatial types as WKT.

roji commented 1 year ago

@ajcvickers this should be doable not that JsonValueReaderWriter is available on the type mappings, as just another case in #30727. However, we also need CollectionToJsonStringConverter to use it (currently it tries to just do JsonSerializer.Serialize() on the NetTopologySuite object, which fails. I'll revisit this once we wire everything together, should be easy.

Note to self: remember that SQL Server OPENWITH with WITH doesn't support geometry; we'll have to rewrite it to OPENWITH without JSON in postprocessing, just like we do when ordering needs to be preserved. The casting to a SQL Server relational geometry value will be introduced there as well.

roji commented 1 year ago

Note mini-investigation of geometry and IN/equality: https://github.com/dotnet/efcore/issues/31912#issuecomment-1742921965

vigouredelaruse commented 4 months ago

i wanted to help spur this tangent on by formally mentioning the dreaded geojson featurecollection and the challenging openapi schema (polyglot) of site and vendor specific implementions of ogc spec responses. there is also a nod to the use of jsonb in timescaledb

while bearing in mind this image

consider https://api.weather.gov/zones/forecast/MIZ034. note that the response contains domain and organization specific properties

       "forecastOffices": [
            "https://api.weather.gov/offices/APX"
        ],
        "timeZone": [
            "America/Detroit"
        ],
        "observationStations": [
//etc

consider https://api.weather.gc.ca/collections/ahccd-stations/items. note that the response contains domain and organization specific properties

           "properties": {
                "identifier__identifiant": "2400305",
                "station_id__id_station": "2400305",
                "station_name__nom_station": "ALERT",
                "measurement_type__type_mesure": "temp_mean",
                "period__periode": "Ann",
                "trend_value__valeur_tendance": 1.5,
                "elevation__elevation": 65,
                "province__province": "NU",
                "joined__rejoint": 1,
                "year_range__annees": "1951-2020",
                "start_date__date_debut": "1950-07-01",
                "end_date__date_fin": "2020-12-01"
            },

the examples will vary for every endpoint by scientific domain and data collection epoch.

one should here formally state that this schema variability is a feature of the wireformat data models of the underlying scientific domains. indeed one (where one = hil +- human) queries the metadata, discovers dimensions and attributes of interest and composes further queries to actually retrieve the data.

for instance, the response to this request contains https:$reflike pointers to gridded met forecast datapoints https://api.weather.gov/points/39.7456,-97.0892 image

following the link to the gridded data https://api.weather.gov/gridpoints/TOP/32,81 yields image

implementations seeking to exploit this dataset might want to apply a tool such as timescaledb to the advertised timeseries atoms https://docs.timescale.com/use-timescale/latest/time-buckets/about-time-buckets/

image

once timescaledb is in the mix one quickly checks its support for jsonb and though for performance reasons it's not advised, timeseries functions are available against jsonb

one may here assert that

one also here provides evidence that this (consume, mutate and persist gis datasets) has been a pain point for gis developers for a long time

mov ax, 'thanks'