BuzzCutNorman / tap-mssql

Singer Tap for MS SQL built with Meltano Singer SDK.
MIT License
1 stars 9 forks source link

bug: MS SQL Timestamp (Rowversion) data type is being seen as DateTime type #53

Closed BuzzCutNorman closed 11 months ago

BuzzCutNorman commented 1 year ago

The timestamp or rowversion data type in MS SQL which a special type of varbinay(8) or binary(8) is being seen as a date time type since that is the name of a date time type in postgres. I will need to update this in the org_to_jsonchema_type and hd_to_jsonschema_type. This is the error that occurs when a value of the data type is encoutered during a meltano run

dateutil.parser._parser.ParserError: Unknown string format: 000000000d5e8a32 
ValueError: Could not parse value '000000000d5e8a32' for field 'Version'.

MS documentation about this data type: https://learn.microsoft.com/en-us/sql/t-sql/data-types/rowversion-transact-sql?view=sql-server-ver16

BuzzCutNorman commented 1 year ago

I think will be the add to hd_to_jsonschema_type:

if sql_type_name in ['ROWVERSION', 'TIMESTAMP']:
    return {
        "type": ["string"],
        "contentEncoding": "base64",
        "maxLength": 12
    }
BuzzCutNorman commented 1 year ago

Found this entry in the SQLA documentation.

The ROWVERSION datatype does not reflect (e.g. introspect) from the database as itself; the returned datatype will be TIMESTAMP.

This is a read-only datatype that does not support INSERT of values.

Since column is a readonly and you are not able to insert into it. I think when copying a timestamp column over utilizing the base equivalent which is varbinary(8) is the best approach. This will also be easier to handle the data in other SQL databases like Postgres.

SQLA documentation: https://docs.sqlalchemy.org/en/20/dialects/mssql.html#sqlalchemy.dialects.mssql.ROWVERSION

BuzzCutNorman commented 1 year ago

In org_to_jsonschema_type I think it would be best to convert this to a string:

if str(sql_type) in ['ROWVERSION', 'TIMESTAMP']:
    sql_type = "string"