PomeloFoundation / Pomelo.EntityFrameworkCore.MySql

Entity Framework Core provider for MySQL and MariaDB built on top of MySqlConnector
MIT License
2.69k stars 381 forks source link

Spatial Queries #1163

Closed aderrose closed 4 years ago

aderrose commented 4 years ago

Steps to reproduce

Using the project code from https://github.com/gavilanch/EFCoreSpatialQueries I've modified the ApplicationDBContext to point to a SQL Server 2016 LocalDB (as per original project), a MySQL 8.0.21 server (via Pomelo) and a MariaDB 10.5.5 server (via Pomelo).

Each time I change server the migrations need deleting, recreating and applying to the database, otherwise no code is modified from original project.

The issue

The project demonstrates loading the database with spatial data and querying that back limiting the dataset to places within 2000 meters.

The LocalDB database seems to work correctly and returns :

Agora from Santo Domingo (302.858257326255 meters away) Adrian Tropical from Santo Domingo (1188.9001109225335 meters away)

The following two entries can be displayed by removing the 2000 meters where clause:

Sambil from Santo Domingo (2863.422068459216 meters away) Restaurante El Cardenal from Mexito City (3073097.7298800712 meters away)

However, if I run the same code against MySQL 8 I get the following output:

Adrian Tropical from Santo Domingo (11858806.351341443 meters away) Sambil from Santo Domingo (11858886.04280507 meters away) Agora from Santo Domingo (11859348.903124722 meters away) Restaurante El Cardenal from Mexito City (11884292.875515593 meters away)

All of which are well over 2000 meters and very different to the LocalDB version.

When I run the code against MariaDB 10.5 I get the following:

Agora from Santo Domingo (0.0027362819628112514 meters away) Adrian Tropical from Santo Domingo (0.010865760255958656 meters away) Sambil from Santo Domingo (0.027108523012515796 meters away) Restaurante El Cardenal from Mexito City (29.211945550922625 meters away)

All of which are tiny distances and also very different.

If I try to reproduce the calculations through HeidiSQL using either ST_DISTANCE or ST_DISTANCE_SPHERE (which MariaDB is missing) I get different values again, 5516 meters with SRID of 4326 and 5425 meters if no SRID is set.

I'm now really confused which values are correct and where the issue lies, to me it looks like it's a database engine issue rather than Pomelo or even MySqlConnector as the same code produces different values.

Ultimately what I'm trying to achieve is a method of calculating distance between latitude and longitude coordinates based on data from GPX files to calculate an overall distance for a route.

Any help or explanation as to what is going on would be gratefully received.

Further technical details

MySQL version: MariaDB 10.5.5 on Debian 10 & MySQL 8.0.21 on Ubuntu 20.04 Pomelo.EntityFrameworkCore.MySql version: 3.20 Microsoft.AspNetCore.App version: 3.1.8

lauxjpn commented 4 years ago

@aderrose I'll test this in about 24 hours and get back to you.

lauxjpn commented 4 years ago

It turns out, that MySQL's WKT/WKB geometry creation functions like ST_GeomFromText() and ST_GeomFromWKB() will use the first (x) coordinate as latitude and the the second (y) coordinate as longitude, if an SRID of 4326 is being used:

(From 12.17.4 Functions That Create Geometry Values from WKB Values)

By default, geographic coordinates (latitude, longitude) are interpreted as in the order specified by the spatial reference system of geometry arguments.

These function are used when translating literal values to SQL, which is the case when seeding the data in the sample you referenced.

Most OpenGIS implementations use x as longitude and y as latitude, but MySQL and SQL Server don't.

MySQL 8 introduced multiple functions and optional parameters to mitigate this behavior, but since we need to support previous versions and MariaDB as well, these are not an option for us.

Instead, we are from now on swapping the X/Y coordinates when translating literal geometry values to SQL, by implementing a custom WKTWriter derived class.

I also opened https://github.com/NetTopologySuite/NetTopologySuite/issues/436, so we might get rid of our full WKTWriter derived implementation in favor of a simple property or overridable method.

It should also be mentioned for anybody using MariaDB, that in contrast to MySQL, MariaDB supports only Euclidean (planar) geometry at this point in time. So functions like ST_Distance() won't return minutes meters even if an SRID of 4326 is used (MariaDB just ignores SRID values).


Further information:

aderrose commented 4 years ago

Thanks @lauxjpn for digging into this,.

It looks like using MariaDB may be a non-starter because of the lack of SRID calculations, although to be fair most of the coordinates I'm likely to need to work with will come from GPX track data i.e. recorded data with many more points closer together than route data which tends to have far less and is used for route planning, meaning that the overall difference between planar and ellipsoidal methods won't be huge.

You mention not using the new MySQL 8 functions, so if I reference MySQL 5 functions for comparative calculations I should be on the right track?

In terms of C# and Pomelo I understand I need to create points using Y/X or Long/Lat rather than the other way around and Pomelo will swap the coordinates for me?

lauxjpn commented 4 years ago

@aderrose Consider my comment above just my research about this topic. I primarily put in there, so that I can link to this issue from the PR I will provide shortly. We will roll out a fix for this issue today, so in case you can wait for that, you won't have to do anything except reference the new package version, once its out.

If you cannot wait for that, you could temporarily swap the coordinate order in your seed data, but you would need to swap it back, once the new release is out, so I would just wait for the new release instead and not do anything.

Thanks for reporting this issue! It's a tricky one.


You mention not using the new MySQL 8 functions, so if I reference MySQL 5 functions for comparative calculations I should be on the right track?

Not really, it is just our justification how we will implement the fix for this issue (by swapping the X/Y order ourselves in Pomelo, instead of using the new MySQL 8 ST_SWAPXY() function, or the axis-order=long-lat option for ST_GeomFromText()).

In terms of C# and Pomelo I understand I need to create points using Y/X or Long/Lat rather than the other way around and Pomelo will swap the coordinates for me?

Once the update is out, you can still keep using the current order of x/longitude first and y/latitude second in all cases. It is what NetTopologySuite uses as well. Until the update is out, only when we translate a geometry object using ST_GeomFromText() and an SRID of 4326 is used, then an X/Y swap would need to be made.

It looks like using MariaDB may be a non-starter because of the lack of SRID calculations, although to be fair most of the coordinates I'm likely to need to work with will come from GPX track data i.e. recorded data with many more points closer together than route data which tends to have far less and is used for route planning, meaning that the overall difference between planar and ellipsoidal methods won't be huge.

One thing to remember here is that if planar coordinates are used, then the calculated result of e.g. the distance will be in the planar coordinate space as well. So if I calculate the distance between Berlin (13.404954, 52.520007 (lat, lon)) and Seattle (47.6062095, -122.3320708 (lat, lon)), then MySQL would return the distance in minutes meters (8142392.01 minutes meters) when using an SRID of 4326 while MariaDB would return the distance in the planar coordinate space (135.825937).

aderrose commented 4 years ago

@lauxjpn Thanks again, I'm in no rush and can wait for a fix. I was in part checking that my thinking was the right way around as I had seen that NTS using a Y/X coordinate system and given the results I was getting I wasn't sure if it was my code or something else.

aderrose commented 4 years ago

So if I calculate the distance between Berlin (13.404954, 52.520007 (lat, lon)) and Seattle (47.6062095, -122.3320708 (lat, lon)), then MySQL would return the distance in minutes (8142392.01 minutes) when using an SRID of 4326 while MariaDB would return the distance in the planar coordinate space (135.825937).

In order to check my understanding I've created the following MySQL query based on your coordinates:

SET @pt1 = ST_GeomFromText('POINT ( 52.520007 13.404954 )', 4326); # Berlin
SET @pt2 = ST_GeomFromText('POINT ( 47.6062095 -122.3320708 )', 4326); # Seattle

SELECT ST_DISTANCE(@pt1,@pt2),ST_DISTANCE_SPHERE(@pt1,@pt2);

In MySQL I get the result of ST_Distance as 8142392.01 (as per your result) and ST_Distance_Sphere as 8117916.97, I've checked the calculation using a distance calculator at DaftLogic.com http://tinyurl.com/y6g5yzrv which gives 8126999.73 meters so close enough over such a distance.

I'm interested that ST_Distance gives a larger value than ST_Distance_Sphere as I understand the later should work out the curvature and not just a point to point?

Now when I try and build a C# version I've got the following code:

GeometryFactory geometryFactory = NtsGeometryServices.Instance.CreateGeometryFactory(srid: 4326);
StartPoint = geometryFactory.CreatePoint(new Coordinate(x: 52.520007,y: 13.404954));
EndPoint = geometryFactory.CreatePoint(new Coordinate(x: 47.6062095, y: -122.3320708));
Distance = StartPoint.Distance(EndPoint);

Which produces 135.83 which is the same as MariaDB so I assume it's ignoring the SRID, I've read elsewhere that .Distance is supposed to ignore SRID and in order to calculate projected coordinates I need to work with ProjNET4GeoAPI.

Does that sound about right?

Thanks for all your help so far :)

lauxjpn commented 4 years ago

Technically, the C# version should have the coordinates in lon, lat order, but the expected distance of 135.83 should be the same:

GeometryFactory geometryFactory = NtsGeometryServices.Instance.CreateGeometryFactory(srid: 4326);
StartPoint = geometryFactory.CreatePoint(new Coordinate(x: 13.404954,y: 52.520007));
EndPoint = geometryFactory.CreatePoint(new Coordinate(x: -122.3320708, y: 47.6062095));
Distance = StartPoint.Distance(EndPoint);

Otherwise, everything you said should be correct.

I'm interested that ST_Distance gives a larger value than ST_Distance_Sphere as I understand the later should work out the curvature and not just a point to point?

I am not a mathematician. That being said:

ST_Distance_Sphere() does not calculate the length of a direct (curved) line between two points on the spherical representation of the earth, but the shortest distance. The shortest distance is usually different (from freemaptools.com):

(Shows the great circle distance arc instead of a direct line, even on this flat projection.) image

The simplest non-mathematical prove for this is the fact, that all listed businesses, and especially airlines, strife for maximum efficiency. The most important factor in the costs of a flight is fuel. With that in mind, take a look at the long distance flight paths on flightradar24.com. The planes fly the great circle distance arc as well, which resembles the one from the image above.


@aderrose The distinction between ST_Distance() and ST_Distance_Sphere() is a good point!

I took another look at how PostgreSQL and SQL Server handle distance calculations. Both default to euclidean distance for their geometry classes, but to great circle distance for their geography classes.

Since MySQL does not distinguish between geometry and geography by class but just by SRID, we should use ST_Distance() for an SRID of 0 (which we are already doing), but ST_Distance_Sphere() for an SRID of 4326.

Unfortunately, MariaDB has not yet implemented ST_Distance_Sphere() as of now, so we would need to manually calculate the distance when querying, if we want to support that same behavior for MariaDB as well.

References that might come in handy:

aderrose commented 4 years ago

@lauxjpn Thanks for the idea about airlines and their relation to great circle shortest distance and therefore the difference between ST_Distance and ST_Distance_Sphere, it finally makes sense :)

It's a tricky problem to sort and because of the lack of a geography class in MySQL and MariaDB and the lack of ST_Distance_Sphere in MariaDB I can see why Pomelo is doing what it is.

Unfortunately, I'm still having issues getting a result out of C# that is anything like what MySQL will produce, I'll keep plugging away for a bit.

lauxjpn commented 4 years ago

I will upload a PR in an hour or so. It will translate Geography.Distance() to ST_Distance_Sphere() on MySQL >= 5.7.6 and will emit a custom implementation (the Haversine formula) in all other cases (like MariaDB), when an SRID of 4326 is being used. If an SRID of 0 is used, it will just translate the call to the ST_Distance() function.

The PR also adds explicit EF.Functions.SpatialDistance() and EF.Functions.SpatialDistanceSphere() functions and fixes multiple bugs.

lauxjpn commented 4 years ago

I ended up extending the spatial support extensively in regards to distance calculations.

It's a tricky problem to sort and because of the lack of a geography class in MySQL and MariaDB and the lack of ST_Distance_Sphere in MariaDB I can see why Pomelo is doing what it is.

Pomelo has now implemented support for spherical distance calculations for MariaDB (when using SRID 4326).

Unfortunately, I'm still having issues getting a result out of C# that is anything like what MySQL will produce, I'll keep plugging away for a bit.

MariaDB natively just uses planar distance calculations for all SRIDs (i.e. Pythagorean theorem). MySQL 7 does the same when using ST_Distance(), but MySQL 8 uses the Andoyer algorithm for SRID 4326 and ST_Distance() (which is a spherical distance approximation).

After PR #1169 has been merged, Pomelo will automatically choose the most appropriate distance algorithm (either implemented natively by the database server, or using its own custom implementation) for the SRID used, when calling Geometry.Distance().


See #1169 for further details.

For some introduction to different distance algorithms and their performance, see 2019 - Geodesic algorithms: an experimental study on Youtube.

MySQL internally uses the Boost.Geometry library (Documentation).

aderrose commented 4 years ago

Thanks @lauxjpn I'll look forward to trying this out with the next release.

lauxjpn commented 4 years ago

@aderrose The PR is now part of our nightly builds. Feel free to check it out before the official release of 3.2.1 tomorrow, in case you want to give us some feedback about how it does.

aderrose commented 4 years ago

@lauxjpn I've added the nightly builds feed to NuGet, how do I now get the new version in Visual Studio?

lauxjpn commented 4 years ago

image

aderrose commented 4 years ago

Hi @lauxjpn,

I'm just trying out the preview build and I'm getting the following error:

Method 'Create' in type 'Pomelo.EntityFrameworkCore.MySql.Query.ExpressionVisitors.Internal.MySqlSqlTranslatingExpressionVisitorFactory' from assembly 'Pomelo.EntityFrameworkCore.MySql, Version=3.2.1.0, Culture=neutral, PublicKeyToken=2cc498582444921b' does not have an implementation.

This happens in the DBContextOptions function with the VS Debugger highlighting the base(options) section at the end of the line:

public MySQLContext(DbContextOptions options) : base(options) { }

This happens on both MySQL 8.0 or MariaDB 10.5 database connections.

lauxjpn commented 4 years ago

Please post the full exception message including the full stack trace. However, this sounds like a version problem to me. Is the nightly build version used consistently across all involved projects and is EF Core version 3.1.8 referenced in the involved projects?

aderrose commented 4 years ago

I'm created a standalone project with just the code from my original project which I'm working on uploading to GitHub, (I've only pulled code down before and never published anything).

This project has the nightly build for Pomelo and the NetTopologySuite addon, I don't specify EF Core standalone but the 3.2.1 build requires EF Core 3.1.8 or better from NuGet.

image

System.TypeLoadException: Method 'Create' in type 'Pomelo.EntityFrameworkCore.MySql.Query.ExpressionVisitors.Internal.MySqlSqlTranslatingExpressionVisitorFactory' from assembly 'Pomelo.EntityFrameworkCore.MySql, Version=3.2.1.0, Culture=neutral, PublicKeyToken=2cc498582444921b' does not have an implementation.
   at Microsoft.Extensions.DependencyInjection.MySqlServiceCollectionExtensions.AddEntityFrameworkMySql(IServiceCollection serviceCollection)
   at Pomelo.EntityFrameworkCore.MySql.Infrastructure.Internal.MySqlOptionsExtension.ApplyServices(IServiceCollection services)
   at Microsoft.EntityFrameworkCore.Internal.ServiceProviderCache.ApplyServices(IDbContextOptions options, ServiceCollection services)
   at Microsoft.EntityFrameworkCore.Internal.ServiceProviderCache.<>c__DisplayClass4_0.<GetOrAdd>g__BuildServiceProvider|3()
   at Microsoft.EntityFrameworkCore.Internal.ServiceProviderCache.<>c__DisplayClass4_0.<GetOrAdd>b__2(Int64 k)
   at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
   at Microsoft.EntityFrameworkCore.Internal.ServiceProviderCache.GetOrAdd(IDbContextOptions options, Boolean providerRequired)
   at Microsoft.EntityFrameworkCore.DbContext..ctor(DbContextOptions options)
   at Spatial.Data.MariaDBContext..ctor(DbContextOptions`1 options) in E:\Developer\Code\Spatial\Spatial\Data\MariaDBContext.cs:line 8
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitConstructor(ConstructorCallSite constructorCallSite, RuntimeResolverContext context)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSiteMain(ServiceCallSite callSite, TArgument argument)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitCache(ServiceCallSite callSite, RuntimeResolverContext context, ServiceProviderEngineScope serviceProviderEngine, RuntimeResolverLock lockType)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitScopeCache(ServiceCallSite singletonCallSite, RuntimeResolverContext context)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(ServiceCallSite callSite, TArgument argument)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.Resolve(ServiceCallSite callSite, ServiceProviderEngineScope scope)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.DynamicServiceProviderEngine.<>c__DisplayClass1_0.<RealizeService>b__0(ServiceProviderEngineScope scope)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngine.GetService(Type serviceType, ServiceProviderEngineScope serviceProviderEngineScope)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope.GetService(Type serviceType)
   at Microsoft.Extensions.DependencyInjection.ActivatorUtilities.GetServiceOrCreateInstance(IServiceProvider provider, Type type)
   at Microsoft.EntityFrameworkCore.Design.Internal.DbContextOperations.<>c__DisplayClass13_3.<FindContextTypes>b__11()
   at Microsoft.EntityFrameworkCore.Design.Internal.DbContextOperations.CreateContext(Func`1 factory)
   at Microsoft.EntityFrameworkCore.Design.Internal.DbContextOperations.CreateContext(String contextType)
   at Microsoft.EntityFrameworkCore.Design.Internal.MigrationsOperations.AddMigration(String name, String outputDir, String contextType, String namespace)
   at Microsoft.EntityFrameworkCore.Design.OperationExecutor.AddMigrationImpl(String name, String outputDir, String contextType, String namespace)
   at Microsoft.EntityFrameworkCore.Design.OperationExecutor.AddMigration.<>c__DisplayClass0_0.<.ctor>b__0()
   at Microsoft.EntityFrameworkCore.Design.OperationExecutor.OperationBase.<>c__DisplayClass3_0`1.<Execute>b__0()
   at Microsoft.EntityFrameworkCore.Design.OperationExecutor.OperationBase.Execute(Action action)
Method 'Create' in type 'Pomelo.EntityFrameworkCore.MySql.Query.ExpressionVisitors.Internal.MySqlSqlTranslatingExpressionVisitorFactory' from assembly 'Pomelo.EntityFrameworkCore.MySql, Version=3.2.1.0, Culture=neutral, PublicKeyToken=2cc498582444921b' does not have an implementation.
lauxjpn commented 4 years ago

@aderrose If you want to share the project, you can also just attach the zipped files to a post if that is easier for you.

aderrose commented 4 years ago

Now that's interesting, I've been using the latest Visual Studio Preview because I've also been working with .NET 5.0 in another separate project, up until tonight it's been okay but if I open my project in the mainline version of Visual Studio 2019 it works okay.

It's probably also worth noting that both IDE's have received an update this today maybe that broke something. It must be a slight issue in the preview or the SDK as the project uses .Net Core 3.1.

I'll continue testing with the supported version and see how I get on :)

lauxjpn commented 4 years ago

@aderrose The method signature for IRelationalSqlTranslatingExpressionVisitorFactory.Create() has been change between EF Core 3.1 and EF Core 5.0.

If you just updated all nuget packages and had Include Prereleases enabled, you are now likely to have referenced an EF Core 5.0 RC release. So double check (including the raw csproj file) that you explicitly reference the 3.1.8 versions of EF Core (and are using no preview versions of other packages, except Pomelo).


This shows however, that it is time for our next release to explicitly state, that we are only compatible with EF Core < 5.0, since we do definitely know at this point, that Pomelo 3.2.x is not compatible with EF Core 5.0 (see commit https://github.com/PomeloFoundation/Pomelo.EntityFrameworkCore.MySql/commit/2c6c3fa16ebae18c537dbe6176ab5caedd57d14d).

aderrose commented 4 years ago

@lauxjpn Having just checked my package list you are right, Microsoft.AspNetCore.Mvc.Razor.RuntimeCompilation and Microsoft.EntityFrameworkCore.Tools had been bumped to 5.0.0 previews which would upgrade EF Core as well.

I've specified EF Core 3.1.8 now for good measure.

lauxjpn commented 4 years ago

@aderrose I also just added a commit, that will result in a compiler error, if an EF Core version of 5.0.0+ is being used with Pomelo 3.2x. Should be part of the latest nightly build in about 10 minutes (20:15 UTC).

aderrose commented 4 years ago

@lauxjpn Thanks for the work so far, I've included the official 3.2.1 packages now and I've got a stripped down working project that compares the distance between the locations in project in my original post https://github.com/gavilanch/EFCoreSpatialQueries with an added location for London based on Google Maps coordinates for London.

I've also compiled a list of the distances using the Distance Calculator from Daft Logic (which uses Google Maps and seems to get closer to the expected distance than Free Map Tools which is based on OpenStreetMaps).

Largely, Daft Tools, MSSQL and MySQL are all comparable, however, MariaDB is still providing some odd answers. I've also checked against MySQL directly using ST_Distance_Sphere and get the same answers that the app is reporting which is good.

The results I get are as follows: image

My project is available on GitHub.

I'm going to start looking at some of the GPS data I have and see how that works out.

If you have any thoughts about MariaDB, I know you included some custom logic to replicate the missing ST_Distance_Sphere in MariaDB, do I need to do anything to include that. I have included the server versions in the Startup.cs for this project which I don't usually as it seems to work without.

lauxjpn commented 4 years ago

@aderrose Thanks for looking into this!

As it turned out, the X/Y and lon/lat swapperoo when using ST_GeomFromText() or ST_GeomFromWKB() is MySQL 8 specific. For MySQL 7 and below and MariaDB in general, X still translates to longitude and Y still translates to latitude.

The #1174 PR implicitly fixes this, by abandoning the whole ST_Geom...() approach in favor of just using a hex string WKB representation directly, which should work across all supported database implementations and versions.

lauxjpn commented 4 years ago

@aderrose PR #1174 is now part of the latest nightly build.

aderrose commented 4 years ago

@lauxjpn With a small variance PR #1174 seems to have fixed the issues/functionality with MariaDB:

image

Now just to sort my project, I've realised that using EF Core won't work as I first expected as any calculations using NetTopologySuite won't use EF Core or any of the enhancements that Pomelo makes to MariaDB etc.

What I need to do is store the point information in the database and on the recall from MySQL or MariaDB calculate the distance between points which is very similar to what I was doing already with EF Core and a stored procedure, what this will enable is working fully with C#/EF Core and code-first DB programming.

Once again thank for the work you've put in :)