Closed BuzzCutNorman closed 1 year ago
I wrote a reproduction script that creates two small test tables to test against. Here is the script I used.
use [testdata]
go
/*********************************
Create the simple test table
in an MSSQL database
*********************************/
DROP TABLE IF EXISTS [MoneyTrouble];
CREATE TABLE MoneyTrouble (
Id int IDENTITY(1,1) PRIMARY KEY,
Item varchar(255) NOT NULL,
Realmt MONEY,
);
DROP TABLE IF EXISTS [SmallMoneyTrouble];
CREATE TABLE SmallMoneyTrouble (
Id int IDENTITY(1,1) PRIMARY KEY,
Item varchar(255) NOT NULL,
Realmt SMALLMONEY,
);
go
/*********************************
Insert some test data into the
test table
*********************************/
INSERT INTO [testdata].[dbo].[MoneyTrouble]
([Item]
,[Realmt])
VALUES
('Park Ticket', 500.00)
,('Churro', 20.00)
,('Lunch', 134.48)
,('Character Signature', NULL)
;
INSERT INTO [testdata].[dbo].[SmallMoneyTrouble]
([Item]
,[Realmt])
VALUES
('Water', 5.89)
,('Sticker', 2.35)
,('Candy', .75)
,('Taking a break', NULL)
;
go
/********************************
Query the new test tables
********************************/
select * from MoneyTrouble;
select * from SmallMoneyTrouble;
After looking at this, I found that the function singer-sdk.typing.to_jsonschema_type() is correctly choosing a JSON Schema data type of string as a default. Unfortunately, the money amount does not have quotes around it in the output messages, since SQLAlchemy sees it as a numeric:
{"Id": 1, "Item": "Park Ticket", "Realmt": 500.0000}
I agree with @cwegener's suggestion via Slack to type this as a JSON number. I found that number doesn't have a concept of scale, so trailing zeros may get stripped: for example, 1.00 and 1.10 will become 1.0 and 1.1. I also saw an instance in which a target stripped off all the decimals and rounded up, so 0.75 became 1.
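The loss of scale is easy to reproduce with Python's standard json module. This is only a minimal sketch of the general effect, not the code path used by the tap or by any particular target; the record is the sample shown earlier in this thread.

```python
import json
from decimal import Decimal

raw = '{"Id": 1, "Item": "Park Ticket", "Realmt": 500.0000}'

# Default parsing turns the JSON number into a float, which carries no scale.
as_float = json.loads(raw)["Realmt"]
print(as_float)  # 500.0

# Asking the parser for Decimal keeps the digits exactly as written.
as_decimal = json.loads(raw, parse_float=Decimal)["Realmt"]
print(as_decimal)  # 500.0000

# Serializing a float drops trailing zeros as well.
print(json.dumps({"Realmt": 1.10}))  # {"Realmt": 1.1}
```

Whether a given target parses numbers as floats or as decimals is up to that target, so the scale may or may not survive the round trip.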
Despite my findings during testing, I still like MONEY and SMALLMONEY turning into a JSON number, since this matches up with how SQLAlchemy deals with the MONEY and SMALLMONEY data types.
Would a fixed scale using "multipleOf": 0.01 (or "multipleOf": 0.0001) work? That's what Rob is doing as well: https://github.com/wintersrd/pipelinewise-tap-mssql/blob/e61e8687a552c8a6c67946b1c90f44745da90cb9/tests/test_tap_mssql.py#L96-L99
On a larger scale, a general decimal precision functionality for number types could probably be provided by the SDK.
On second thought, I don't think multipleOf actually solves for the specific problem you described.
Would a fixed scale using "multipleOf": 0.01 (or "multipleOf": 0.0001) work? That's what Rob is doing as well.
I came across that in a StackOverflow example and gave it a try. The test amount of 134.48 kept blowing it up with a "not a multiple of 0.0001" validation error. It looked like the same error Harshit mentioned they had gotten when trying the wintersrd tap-mssql, so I abandoned adding it.
Yup. I think the more appropriate topic is this one: https://github.com/json-schema-org/json-schema-vocabularies/issues/45
I completely overlooked the fact that multipleOf only has the purpose of checking that the data already has the right scale, which is not the problem you had.
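The validation error on 134.48 reported earlier in the thread is consistent with ordinary binary floating point behavior. The sketch below assumes a naive float-based multiple check; that is an assumption for illustration, not a confirmed description of how the validator in question implements multipleOf.

```python
from decimal import Decimal

amount, step = 134.48, 0.0001

# Naive float check: neither value is exactly representable in binary,
# so the quotient does not come out as a whole number.
quotient = amount / step
float_ok = quotient == int(quotient)

# The same check on exact decimal values.
decimal_ok = Decimal("134.48") % Decimal("0.0001") == 0

print(float_ok, decimal_ok)  # False True
```

Either way, the check only accepts or rejects a value; it never restores a scale that was lost during serialization.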
On second thought, I don't think multipleOf actually solves for the specific problem you described.
We lose some things in the process of going from SQL -> Python -> JSON -> Python -> SQL. In this instance we lose the precision and scale information. I think I traced the target side back to this line in the singer-sdk.typing.to_sql_type() function.
if _jsonschema_type_check(jsonschema_type, ("number",)):
    return cast(sqlalchemy.types.TypeEngine, sqlalchemy.types.DECIMAL())
If we had the precision and scale in the discovery schema, we could add it back in.
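As a rough illustration of that idea, the sketch below carries precision and scale in the schema and uses them on the target side. The "precision" and "scale" keys and both helper functions are hypothetical and exist only for this example; they are not part of singer-sdk or JSON Schema. The values 19 and 4 reflect how SQL Server stores MONEY (SMALLMONEY would be 10 and 4).

```python
from decimal import Decimal

# "precision" and "scale" are hypothetical schema keys used only for this sketch.
money_schema = {"type": ["number", "null"], "precision": 19, "scale": 4}


def sql_type_for(schema: dict) -> str:
    """Return a DECIMAL declaration, using precision and scale when present."""
    precision, scale = schema.get("precision"), schema.get("scale")
    if precision is None or scale is None:
        return "DECIMAL"
    return f"DECIMAL({precision}, {scale})"


def restore_scale(value, schema: dict):
    """Re-apply the declared scale to a value parsed from a JSON number."""
    if value is None or schema.get("scale") is None:
        return value
    # Decimal(1).scaleb(-4) is 0.0001, so quantize pads or rounds to 4 decimals.
    return Decimal(str(value)).quantize(Decimal(1).scaleb(-schema["scale"]))


print(sql_type_for(money_schema))           # DECIMAL(19, 4)
print(restore_scale(134.48, money_schema))  # 134.4800
```

A schema without those keys falls back to a bare DECIMAL, which mirrors the current behavior quoted above.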
Hi Team, we are exploring Meltano as an EL solution. For our first EL pipeline we are trying to replicate data from MSSQL to S3-Parquet. We tried both FULL_TABLE and INCREMENTAL but are getting the same error in both scenarios, as below. The log level is DEBUG and we are unable to identify the root cause. Can you please help and let us know if we are doing anything wrong?
--variant buzzcutnorman
The column type is money in the DB.