BuzzCutNorman / tap-mssql

Singer Tap for MS SQL built with Meltano Singer SDK.
MIT License
2 stars 9 forks source link

bug: MS SQL DATE columns causing jsonschema ValidationError Failed validating 'format': 'date' #21

Closed BuzzCutNorman closed 1 year ago

BuzzCutNorman commented 1 year ago

Issue with DATE columns returning time info and not passing JSON schema validation when sent to a target. This error was found running meltano run tap-mssql target-mssql when the tap is reading from the AdventureWorks2019 database table HumanResources-EmployeeDepartmentHistory. The column StartDate which is a MS SQL data type of DATE is the offending column.

023-02-08T20:07:25.814139Z [info     ]     raise error                cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-08T20:07:25.829802Z [info     ] jsonschema.exceptions.ValidationError: '2009-01-14T00:00:00+00:00' is not a 'date' cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-08T20:07:25.829802Z [info     ]                                cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-08T20:07:25.829802Z [info     ] Failed validating 'format' in schema['properties']['StartDate']: cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-08T20:07:25.829802Z [info     ]     {'format': 'date', 'type': ['string']} cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-08T20:07:25.829802Z [info     ]                                cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-08T20:07:25.845392Z [info     ] On instance['StartDate']:      cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-08T20:07:25.845392Z [info     ]     '2009-01-14T00:00:00+00:00' cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-08T20:07:25.861020Z [error    ] Loader failed
When the table is quired in SSMS you do not see a time stamp after the date. If you run meltano invoke tap-mssql you can see the time stamp of T00:00:00+00:00 is present after each date. BusinessEntityID DepartmentID ShiftID StartDate EndDate ModifiedDate
1 16 1 2009-01-14 NULL 2009-01-13 00:00:00.000
2023-02-08 12:04:03,402 | INFO     | tap-mssql            | Beginning full_table sync of 'HumanResources-EmployeeDepartmentHistory'...
2023-02-08 12:04:03,402 | INFO     | tap-mssql            | Tap has custom mapper. Using 1 provided map(s).
{"type": "SCHEMA", "stream": "HumanResources-EmployeeDepartmentHistory", "schema": {"properties": {"BusinessEntityID": {"minimum": -2147483648, "maximum": 2147483647, "type": ["integer"]}, "DepartmentID": {"minimum": -32768, "maximum": 32767, "type": ["integer"]}, "ShiftID": {"minimum": 0, "maximum": 255, "type": ["integer"]}, "StartDate": {"format": "date", "type": ["string"]}, "EndDate": {"format": "date", "type": ["string", "null"]}, "ModifiedDate": {"format": "date-time", "type": ["string"]}}, "type": "object", "required": ["BusinessEntityID", "DepartmentID", "ShiftID", "StartDate", "ModifiedDate"]}, "key_properties": ["BusinessEntityID", "StartDate", "DepartmentID", "ShiftID"]}
{"type": "RECORD", "stream": "HumanResources-EmployeeDepartmentHistory", "record": {"BusinessEntityID": 1, "DepartmentID": 16, "ShiftID": 1, "StartDate": "2009-01-14T00:00:00+00:00", "EndDate": null, "ModifiedDate": "2009-01-13T00:00:00+00:00"}, "time_extracted": "2023-02-08T20:04:03.464734+00:00"}
BuzzCutNorman commented 1 year ago

I am able to reproduce the issue with this simple table. Examples were grabbed from JSON Schema site and formatted to match MS SQL documentation.

use [testdata]
go
/*********************************
Create the simple test table
in an MSSQL database
*********************************/
DROP TABLE IF EXISTS [TestTable];
CREATE TABLE TestTable (
    Id int IDENTITY(1,1) PRIMARY KEY,
    TestColumn1 TIME,
    TestColumn2 DATE,
    TestColumn3 DATETIME,
    TestColumn4 DATETIME2
);

/*********************************
Insert some test data into the
test table
*********************************/
INSERT INTO [testdata].[dbo].[TestTable]
           ([TestColumn1],
        [TestColumn2],
        [TestColumn3],
            [TestColumn4])
     VALUES
           ('20:20:39.001', '2018-11-13', '2018-11-13 20:20:39', '2018-11-13 20:20:39.001')
;
BuzzCutNorman commented 1 year ago

findings

I found where the extra T00:00:00+00:00 is being added. In the singer_sdk\helpers\_typing.py file there is a function called _conform_primitive_property. In that function it looks for datetime.date and appends T00:00:00+00:00 to the ISO formated string. If I remove the append portion and the function returns just the ISO format date the data passes JSON validation. The data looks correct when it is inserted into the target table.

current not passing JSON validation

def _conform_primitive_property(elem: Any, property_schema: dict) -> Any:
    """Converts a primitive (i.e. not object or array) to a json compatible type."""
    if isinstance(elem, (datetime.datetime, pendulum.DateTime)):
        return to_json_compatible(elem)
    elif isinstance(elem, datetime.date):
        return elem.isoformat() + "T00:00:00+00:00"

possible fix that passes JSON validation

def _conform_primitive_property(elem: Any, property_schema: dict) -> Any:
    """Converts a primitive (i.e. not object or array) to a json compatible type."""
    if isinstance(elem, (datetime.datetime, pendulum.DateTime)):
        return to_json_compatible(elem)
    elif isinstance(elem, datetime.date):
        return elem.isoformat()
BuzzCutNorman commented 1 year ago

The DATE column present in my test TestTable triggers the same error after tap-mssql and target-mssql were upgraded to SDK 0.19.0 .

2023-02-13T16:03:45.295982Z [info     ]   File "C:\development\projects\dev-sql-dataype\.meltano\loaders\target-mssql\venv\lib\site-packages\jsonschema\validators.py", line 314, in validate cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-13T16:03:45.295982Z [info     ]     raise error                cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-13T16:03:45.295982Z [info     ] jsonschema.exceptions.ValidationError: '2018-11-13T00:00:00+00:00' is not a 'date' cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-13T16:03:45.295982Z [info     ]                                cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-13T16:03:45.295982Z [info     ] Failed validating 'format' in schema['properties']['TestColumn2']: cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-13T16:03:45.311608Z [info     ]     {'format': 'date', 'type': ['string', 'null']} cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-13T16:03:45.311608Z [info     ]                                cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-13T16:03:45.311608Z [info     ] On instance['TestColumn2']:    cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-13T16:03:45.311608Z [info     ]     '2018-11-13T00:00:00+00:00' cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-13T16:03:45.342858Z [error    ] Loader failed
2023-02-13T16:03:45.342858Z [error    ] Block run completed.           block_type=ExtractLoadBlocks err=RunnerError('Loader failed') exit_codes={<PluginType.LOADERS: 'loaders'>: 1} set_number=0 success=False
Need help fixing this problem? Visit http://melta.no/ for troubleshooting steps, or to
join our friendly Slack community.

Run invocation could not be completed as block failed: Loader failed
PS C:\development\projects\dev-sql-dataype> meltano invoke  tap-mssql --version    
2023-02-13T16:03:57.120042Z [info     ] Environment 'dev' is active
tap-mssql v0.0.1, Meltano SDK v0.19.0
PS C:\development\projects\dev-sql-dataype> meltano invoke  target-mssql --version 
2023-02-13T16:04:05.476387Z [info     ] Environment 'dev' is active
target-mssql v0.0.1, Meltano SDK v0.19.0