BuzzCutNorman / tap-mssql

Singer Tap for MS SQL built with Meltano Singer SDK.
MIT License
2 stars 9 forks source link

feat: Support for `xml` data type #34

Closed BuzzCutNorman closed 1 year ago

BuzzCutNorman commented 1 year ago

It's the data type that stores XML data. You can store xml instances in a column, or a variable of xml type.

https://learn.microsoft.com/en-us/sql/t-sql/xml/xml-transact-sql?view=sql-server-ver16 https://docs.sqlalchemy.org/en/14/dialects/mssql.html#sqlalchemy.dialects.mssql.XML

BuzzCutNorman commented 1 year ago

This is the test table I came up with. I used two lines from the StackOverflow tags.xml file to load into TestColumn1.

use [testdata]
go
/*********************************
Create the simple test table
in an MSSQL database
*********************************/
DROP TABLE IF EXISTS [TestTable];
CREATE TABLE TestTable (
    Id int IDENTITY(1,1) PRIMARY KEY,
    TestColumn1  XML
);
go
/*********************************
Insert some test data into the
test table
*********************************/
DECLARE @testXML as XML= 
'<?xml version="1.0" encoding="utf-8"?>
<tags>
  <row Id="1" TagName=".net" Count="319407" ExcerptPostId="3624959" WikiPostId="3607476" />
  <row Id="2" TagName="html" Count="1130469" ExcerptPostId="3673183" WikiPostId="3673182" />
</tags>'

INSERT INTO [testdata].[dbo].[TestTable]
           ([TestColumn1])
     VALUES
           (@testXML)
;
BuzzCutNorman commented 1 year ago

After looking at the JSON Schema documentation I thought I would use the keywords string and contentMediaType to define an XML data type. On the target side this definition will be translated to XML. The documentation mentions using application/xml over text/xml.

https://json-schema.org/understanding-json-schema/reference/non_json_data.html#contentmediatype

{
    "type": ["string"],
    "contentMediaType": "application/xml"
}
BuzzCutNorman commented 1 year ago

While trying to add support for XML I found a couple of things that would need to be added to the SDK to allow for the JSON Scheme definition I chose.

In singer_sdk._singerlib.schema the Class itself needs to have the following variable added to the Schema class. Then the STANDARD_KEYS needs a corresponding entry added as well. Once these are added the keyword contentMediaType will be present in the schema message for the target to consume.

STANDARD_KEYS = [
...
    "contentMediaType",
...
]

@dataclass
class Schema:
...
    contentMediaType: str | None = None

    def to_dict(self) -> dict[str, t.Any]: