BuzzCutNorman / tap-mssql

Singer Tap for MS SQL built with Meltano Singer SDK.
MIT License
2 stars 9 forks source link

bug: Fix support for MS SQL `TIME` data type #25

Open BuzzCutNorman opened 1 year ago

BuzzCutNorman commented 1 year ago

Currently we leverage super().to_jsonschema_type() to handle type. The function does not support time so it was being sent back as a sting with no format.

BuzzCutNorman commented 1 year ago

The PR to add TIME will be in #23

BuzzCutNorman commented 1 year ago

There is a JSON validation error when it comes to TIME being validated on the target side for normal signer messages.

2023-02-24T19:20:24.795484Z [info     ]     raise error                cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-24T19:20:24.795484Z [info     ] jsonschema.exceptions.ValidationError: '20:20:39.001000' is not a 'time' cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-24T19:20:24.795484Z [info     ]                                cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-24T19:20:24.795484Z [info     ] Failed validating 'format' in schema['properties']['TestColumn1']: cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-24T19:20:24.795484Z [info     ]     {'format': 'time', 'type': ['string', 'null']} cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-24T19:20:24.811111Z [info     ]                                cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-24T19:20:24.811111Z [info     ] On instance['TestColumn1']:    cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-24T19:20:24.811111Z [info     ]     '20:20:39.001000'          cmd_type=elb consumer=True name=target-mssql producer=False stdio=stderr string_id=target-mssql
2023-02-24T19:20:24.826740Z [error    ] Loader failed
2023-02-24T19:20:24.826740Z [error    ] Block run completed.           block_type=ExtractLoadBlocks err=RunnerError('Loader failed') exit_codes={<PluginType.LOADERS: 'loaders'>: 1} set_number=0 success=False

I can resolve the validation issue for batch messages generated via json.dump() by extending the default JSONEncoder class and placing the following code in it.

        # Time in ISO format truncated to the second to pass
        # json-schema validation
        if isinstance(obj, datetime.time):
            return obj.isoformat(timespec='seconds')

In testing I have found that if I change the line for datetime.time in _conform_primitive_property to the following singer messages will also pass validation.

    elif isinstance(elem, datetime.time):
        return elem.isoformat(timespec="seconds")
BuzzCutNorman commented 1 year ago

Meltano was able to track the issue to the JSON Schema 3 Validator FormatChecker which is used by default. If I specify a more recent version of the FormatChecker, 7 or greater, the issue goes away. Meltano is looking into implementing an easy way for developers to specify a JSON Schema Validator and/or FormatChecker.

I am going to put in place a workaround Meltano inspired but leave this case open until I can replace the workaround with the official SDK version.

BuzzCutNorman commented 1 year ago

Just found out that this only happens when 'pyodbc' is used. If pymssql is used as the driver there is not a validation issue since time is returned time up to the seconds.