datacontract / datacontract-cli

CLI to manage your datacontract.yaml files
https://cli.datacontract.com
Other
437 stars 85 forks source link

[196] Add support for MSSQL #204

Closed RobertLD closed 4 months ago

RobertLD commented 4 months ago

Add framework for SQLServer support.

Currently the specification differs from what is expected by the SODA-CLI so this will have to be updated to match before the added tests will pass

https://docs.soda.io/soda/connect-mssql.html

Related to: #196

RobertLD commented 4 months ago

@simonharrer I'm not sure the spec for SQLServer is defined correctly as these are indicated as required fields on the schema (unless I am mistaken)

image

I would expect something like

image

to be an acceptable server def

jochenchrist commented 4 months ago

@RobertLD There was an error in the Data Contract JSON Schema (https://datacontract.com/datacontract.schema.json). I fixed the list of required fields there.

RobertLD commented 4 months ago

@RobertLD There was an error in the Data Contract JSON Schema (https://datacontract.com/datacontract.schema.json). I fixed the list of required fields there.

Sorry to bother you again but there appears to be one more issue with the schema. SQLServer types aren't supported in the field types yet. There are a whole bunch of them, but for now datetime and datetime2 are missing.

image

jochenchrist commented 4 months ago

Sorry to bother you again but there appears to be one more issue with the schema. SQLServer types aren't supported in the field types yet. There are a whole bunch of them, but for now datetime and datetime2 are missing.

The field.type represents a logical type, and it is a finite list of types. https://datacontract.com/#data-types You'd need to define a mapping to / from these types.

In addition, you can specify the physical MS SQL Server Types in a config block, like this: https://datacontract.com/#config-object

models:
  my_table:
    fields:
      my_field_1:
        description: Example for a timestamp without time zone
        type: timestamp_ntz
        config:
          sqlserverType: DATETIME2

This config object is quite new, if you have any improvement ideas, I'd be happy to hear them.

RobertLD commented 4 months ago

Thank you for the additional context, I'll explore this route now

On Fri, May 17, 2024, 10:43 AM jochenchrist @.***> wrote:

Sorry to bother you again but there appears to be one more issue with the schema. SQLServer types aren't supported in the field types yet. There are a whole bunch of them, but for now datetime and datetime2 are missing.

The field.type represents a logical type, and it is a finite list of types. https://datacontract.com/#data-types You'd need to define a mapping to / from these types.

In addition, you can specify the physical MS SQL Server Types in a config block, like this: https://datacontract.com/#config-object

models: my_table: fields: my_field_1: description: Example for a timestamp type: timestamp config: sqlserverType: DATETIME2

This config object is quite new, if you have any improvement ideas, I'd be happy to hear them.

— Reply to this email directly, view it on GitHub https://github.com/datacontract/datacontract-cli/pull/204#issuecomment-2117763433, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHGZJYQHR3ZT4CIVUR5E53LZCYJRNAVCNFSM6AAAAABHYTVY52VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJXG43DGNBTGM . You are receiving this because you were mentioned.Message ID: @.***>

RobertLD commented 4 months ago

type: timestamp_ntz config: sqlserverType: DATETIME2

I updated the datacontract to use the config object and it doesn't appear to be actually overwriting the type to check against.

image

image

image

We're in the end-zone now; this is the last element before I formalize the work here and rebase to something cleaner. Thank you a lot for the support in doing the work!

simonharrer commented 4 months ago

Thanks already for your dedication.

You would need something like this for the type mapping https://github.com/datacontract/datacontract-cli/commit/b6bd8fbec07a0531698e6cf061a19c3c72bbe041

And we need a type mapping for sqlserver anyway in datacontract/export/sql_type_converter.py

RobertLD commented 4 months ago

Thanks already for your dedication.

You would need something like this for the type mapping b6bd8fb

And we need a type mapping for sqlserver anyway in datacontract/export/sql_type_converter.py

Added the serializer for sqlserver data-types and added the correct driver to the docker file. All of the functionality ought to be there now.

I've also rebased ontop of main so it should be a bit easier to review now

RobertLD commented 4 months ago

Nice work. Only two minor things.

Additionally, please add an entry to CHANGELOG.md what you have done and make sure to add the server type in the README.md

Future work could be to support for datacontract export --format sql for sqlserver along with a test and make sure that the types are correctly set during datacontract import --format sql.

Made the requested changes to the changelog and readme. As for the future work, I can put together two issues for those to stop this issue from expanding to gigantic proportions. I'll start working on those next; if you're alright with splitting them out into separate issues

simonharrer commented 4 months ago

Sure, feel free to create those issues.

simonharrer commented 4 months ago

My last issue is, that in order to run the tests, I now have to install the ODBC driver on my machine.

Instead, I'd like that this sqlserver test is ignored by default, and only executed in the CI/CD pipeline. Can you implement that? Otherwise, it is good to go. Thanks for your dedication here!

RobertLD commented 4 months ago

My last issue is, that in order to run the tests, I now have to install the ODBC driver on my machine.

Instead, I'd like that this sqlserver test is ignored by default, and only executed in the CI/CD pipeline. Can you implement that? Otherwise, it is good to go. Thanks for your dedication here!

Oh course! I'll implement these changes. I'm actually unsure of how I would implement this though. I'll have to do some googling on my options for skipping the test outside of CI/CD

RobertLD commented 4 months ago

My last issue is, that in order to run the tests, I now have to install the ODBC driver on my machine.

Instead, I'd like that this sqlserver test is ignored by default, and only executed in the CI/CD pipeline. Can you implement that? Otherwise, it is good to go. Thanks for your dedication here!

Changes made; easier than anticipated haha

RobertLD commented 4 months ago

I'll add it back in a few hours, then. My mistake.

On Wed, May 22, 2024, 4:55 PM Dr. Simon Harrer @.***> wrote:

@.**** commented on this pull request.

In Dockerfile https://github.com/datacontract/datacontract-cli/pull/204#discussion_r1610637288 :

@@ -22,7 +22,7 @@ RUN python -c "import duckdb; duckdb.connect().sql(\"INSTALL httpfs\");"

FROM ubuntu:22.04 AS runner-image

-RUN apt-get update && apt-get install --no-install-recommends -y python3.11 python3.11-venv msodbcsql18 && \ +RUN apt-get update && apt-get install --no-install-recommends -y python3.11 python3.11-venv && \

I'm fine with it being in the Dockerfile as we'll ship the CLI as a docker container, and it should come prepacked with necessary software packages that it works out of the box.

— Reply to this email directly, view it on GitHub https://github.com/datacontract/datacontract-cli/pull/204#discussion_r1610637288, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHGZJYRWO5Q44NT4WZJHROLZDUA47AVCNFSM6AAAAABHYTVY52VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDANZSGM2DQNRTGA . You are receiving this because you were mentioned.Message ID: @.***>

simonharrer commented 4 months ago

Thanks for your dedication to bring this PR through! :-)

RobertLD commented 4 months ago

Of course, thank you for the assistance in getting it done. I'll follow up with those smaller issues shortly and hopefully knock those out soon.

On Fri, May 24, 2024, 6:47 AM Dr. Simon Harrer @.***> wrote:

Merged #204 https://github.com/datacontract/datacontract-cli/pull/204 into main.

— Reply to this email directly, view it on GitHub https://github.com/datacontract/datacontract-cli/pull/204#event-12923254020, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHGZJYV4CWWKRTECY6DJTDDZD4LFBAVCNFSM6AAAAABHYTVY52VHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJSHEZDGMRVGQYDEMA . You are receiving this because you were mentioned.Message ID: @.*** com>