elixir-ecto / tds

TDS Driver for Elixir
https://hexdocs.pm/tds/
Apache License 2.0
116 stars 77 forks source link

Geometry column type support #147

Open mjaric opened 1 year ago

mjaric commented 1 year ago

Spatial data represents information about the physical location and shape of geometric objects. These objects can be point locations or more complex objects such as countries/regions, roads, or lakes.

SQL Server supports two spatial data types: the geometry data type and the geography data type.

Both data types are implemented as .NET common language runtime (CLR) data types in SQL Server.

Simple types:

Collection types:

Data storage is binary, tho there are function STAsText and STGeomFromText that can be used to display or parse string representation, The rest of functions can be found here

Binary storage for geography is packed in format:

SRID (4 bytes) | Endianness (1 byte) | Geometry type (4 bytes) | Geometry data (variable)

All SRIDs can be found in

SELECT * FROM sys.spatial_reference_systems;

Commonly used is 4326 or WGS 84 (World Geodetic System 1984) coordinate system. It can be identified from first 4 bytes of binary data ( for instance 4326 is seen stored in spatial data column as 0xE6100000.... in little endianness).

fnicastri commented 1 year ago

For now my workaround is to exclude the real field from the schema and using a virtual field filled with a fragment in the query:

query =
      from p in Project,
        select_merge: %{location: fragment("CAST(Location AS nvarchar(100))")}

    Repo.one!(query)

or

query =
      from p in Project,
        select_merge: %{location: fragment("Location.STAsText()")}

    Repo.one!(query)
mjaric commented 1 year ago

The more I look at it, the more I think it should be just ecto custom type. Do we need structs or string is good enough?

fnicastri commented 1 year ago

I think the first step should be to make TDS aware of the types so it will not error. If you query directly the db with t-sql it return an Hex. As far as I understand, I'm not a SQLServer expert, you use the included functions to decode them, as I did with the fragment, unless you are in c# land, then they are 'native'.

Ideally we would have all the types already decoded, maybe in WKT or WKB, to manipulate the values with the appropriate libraries.

Microsoft SQL Server CLR Types Serialization Formats

fnicastri commented 1 year ago

Hi @mjaric @josevalim

I'm facing another limitation now, I moved all containers on my M1 Mac so I have to use the azure-sql-edge docker image. azure-sql-edge is not complete and can't use the CLR functions like .STAsText() or cast to nvarchar, therefore the fragment doesn't work anymore.

This will not impact the production system but is not ideal from a dev POV, do you think we can at least find a way to get the binary value from the geography/geometry column? To not crash Ecto during the queries.

mjaric commented 1 year ago

We are trying to make plan for binary format support, that should resolve STAsString issue, but if that is the case, Im not sure if you will be able to use other geometry::* functions.

Did you try to run other sqlerver image in amd64 emulated mode?

fnicastri commented 1 year ago

We are trying to make plan for binary format support, that should resolve STAsString issue,

This is awesome!

but if that is the case, Im not sure if you will be able to use other geometry::* functions.

Why? You mean with the arm version?

I'm less than worried to no be able to use the geometry functions, I don't really need to query the database with geo* functions. I need just the values of the geometries in WKT/WKB or in another standard format. I can even try to decode them myself with a bit of help if I can get the binary. We just have POINTs in that field, it should not be complex, right?

What I need is to extract some POINTs from this legacy DB to interact with the data in another new system.

The legacy system will go under a total rewrite anyway later this year and we will use Postgres.

Did you try to run other sqlserver image in amd64 emulated mode?

No, for now I tried just the arm one and disabled the problematic field, I read this is a bit problematic under Monterey.

mjaric commented 1 year ago

Then you will be covered 😄. I will definitely try to figure out workaround for arm issue anyways and make instructions in readme

fnicastri commented 4 months ago

@mjaric Any progress on this?

The big rewrite got postponed so we are stuck with the legacy database, meanwhile we are porting smaller pieces of our system to elixir and we need to access one of the geography columns.

We can do it converting the geometry during the query with STAsText() but I would like to explore the decoding of the geography data on the Elixir side.

Can you help me to get the raw binary from the database?

I'm hitting Unsupported Token code 0x3 in Token Stream