OSGeo / gdal

GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.
https://gdal.org
Other
4.94k stars 2.57k forks source link

MSSQL Server geometry column not identified #1990

Open steveoh opened 5 years ago

steveoh commented 5 years ago

Expected behavior and actual behavior.

I expect a MSSQL Server geometry type to be a geometry when converting between formats.

image

Microsoft SQL Server 2014 (SP3-GDR) (KB4505218) - 12.0.6108.1 (X64) 
    May 29 2019 20:05:27 
    Copyright (c) Microsoft Corporation
    Developer Edition (64-bit) on Windows NT 6.3 <X64> (Build 14393: ) (Hypervisor)

and

Microsoft SQL Server 2014 (SP3-CU4) (KB4500181) - 12.0.6329.1 (X64)
    Jul 20 2019 21:42:29
    Copyright (c) Microsoft Corporation
    Enterprise Edition: Core-based Licensing (64-bit) on Windows NT 6.3 <X64> (Build 9600: ) (Hypervisor)

I currently have the following drivers. I tried them all and decided to not specify any in particular since it made no difference.

Steps to reproduce the problem.

create a geometry type in sql server.

from osgeo import gdal

shp_options = gdal.VectorTranslateOptions(
    options=[
        '-f',
        'ESRI Shapefile',
        '-sql',
        "SELECT name,shape FROM dbo.COUNTIES",
        '-nln',
        'counties'
    ],
)

result = gdal.VectorTranslate(
    "c:\dev", "MSSQL:server=(local);database=UDEQ;trusted_connection=yes;", options=shp_options
)

When going to shapefile this error occurs:

The output driver does not natively support Binary type for field Shape. Misconversion can happen. -mapFieldType can be used to control field type conversion.

When going to postgis no error occurs but the shape is a bytea type.

A solution that I do not like for performance reasons is to rebuild the geometry object.

GEOMETRY::STGeomFromWKB(shape.STAsBinary(), srid)

This solves the shapefile and postgis conversions.

Operating system

Windows Server 2016 Standard

GDAL version and provenance

2.3.3 from conda

related to #1136

jratike80 commented 5 years ago

I would have a try with using-dialect OGRSQL https://gdal.org/drivers/vector/mssqlspatial.html. OGR dialect selects geometry by default so I guess you could use just "SELECT name FROM dbo.COUNTIES"

Alternatively you could try the SQLite dialect as -dialect SQLite -sql "SELECT name,shape FROM dbo.COUNTIES"

The driver documentation informs "The MS SQL Spatial driver passes SQL statements directly to MS SQL by default" and your route with GEOMETRY::STGeomFromWKB(shape.STAsBinary(), srid) at least matches with the documentation.

I have never used MSSQL myself and I can't make any tests so please understand that I am only guessing.

steveoh commented 5 years ago

so far the OGRSQL dialect is doing exactly what you stated. Other than some syntax changes to the query eg: 'SELECT "name" from "dbo.COUNTIES"'.

I did not try the SQLite dialect yet. But I plan to and will report back.

The MS SQL Spatial driver passes SQL statements directly to MS SQL by default

If the column is a geometry type already, wouldn't you think it would get picked up as a geometry without the superfluous cast? It's synonymous with int.parse(int.tostring())

Thanks for the suggestions @jratike80 💙

wumpz commented 4 years ago

Using FreeTds odbc driver on Linux does work as expected for version 2.2.x. Odbc driver from Microsoft seems to have this problem.

elpaso commented 1 year ago

I'm trying to reproduce the issue and I wrote this test (work in progress), there seems to be a very different behavior between the different dialect options (which is understandable), it is not clear to me what is the expected behavior from the SQL clause "SELECT eas_id, shape FROM tpoly", one possible outcome is a layer where the geometry is taken from shape (because the actual geometry column of the source layer ogr_geometry was not in the select clause and the only geometry column in the clause is shape. But, when using OGRSQL dialect the geometry is taken from ogr_geometry and not from clause and the shp driver fails to create another geometry column or a binary field to contain the shape column (which is also expected due to the nature of the destination driver).

I'm a bit confused about what is the expected behavior of the different dialects options, though I suspect there is not a real bug here.

@pytest.mark.parametrize("dialect_options", [
    None,
    ["-dialect", "SQLite"],
    ["-dialect", "OGRSQL"]
])
def test_geometry_column_identification(dialect_options):
    """Test for issue GH #1136
    SQL Server data source: geometry column not identified by ogr2ogr"""

    if gdaltest.mssqlspatial_ds is None:
        pytest.skip('MSSQLSpatial driver not available')

    # once
    try:
        gdaltest.mssqlspatial_ds.ExecuteSQL("ALTER TABLE dbo.tpoly ADD shape GEOMETRY")
        gdaltest.mssqlspatial_ds.ExecuteSQL("UPDATE tpoly SET shape = ogr_geometry.STCentroid()")
    except:
        pass

    options = ["-nln", "test_geometry_column_identification", "-f", "ESRI Shapefile", "-sql", "SELECT eas_id, shape FROM tpoly"]
    if dialect_options:
        options.extend(dialect_options)

    ds = gdal.VectorTranslate(
        "/vsimem/test_geometry_column_identification.shp",
        gdaltest.mssqlspatial_dsname,
        options=options
    )
    lyr = ds.GetLayerByName("test_geometry_column_identification")
    assert lyr
    f = lyr.GetFeature(1)
    geom = f.GetGeometryRef()
    assert geom is not None
    assert geom.GetGeometryName() == 'POINT'
    assert f.GetFieldCount() == 1
    #assert f.GetFieldType('shape') == ogr.OFTBinary
rouault commented 1 year ago

it is not clear to me what is the expected behavior from the SQL clause "SELECT eas_id, shape FROM tpoly",

without an explicit dialect, this is evaluated through the OGRMSSQLSpatialDataSource::ExecuteSQL() implementation which ultimately builds a OGRMSSQLSpatialSelectLayer layer. I suspect the bug is in the situation of the reporter the geometry column was misidentified as a binary column instead of a geometry column

But, when using OGRSQL dialect the geometry is taken from ogr_geometry and not from clause and the shp driver fails to create another geometry column or a binary field to contain the shape column (which is also expected due to the nature of the destination driver).

yes, that's expected. For historical reason, the OGRSQL dialect implicitly selects the geometry column of the source layer.

The bug here is not about OGRSQL as far as I understand. I suspect some issue with the particular ODBC driver used that may make the else if (EQUAL(poStmt->GetColTypeName(iColumn), "geometry"))test of ogrmssqlspatialselectlayer.cpp to fail

elpaso commented 1 year ago

I'm sorry but I cannot reproduce the issue using driver "ODBC Driver 17 for SQL Server" mentioned by the original poster.