Closed sherifnada closed 3 years ago
hi @sherifnada , any estimation to when this issue will be fixed? still scheduled for 2021-05-14?
@davinchia any input on ETA?
I believe I can have something working by Monday.
@royt-via if it's possible do you mind sharing with me one or two JSONB entries that are causing the issue for you? I want to make sure this nails it. Thanks.
hey @davinchia, I'm trying to full-sync a table with some JSONB columns so I wouldn't know if/which specific JSONB object is causing this error. most of them are pretty standard JSONs with some flat fields. for example
{
"department_id": "c1c3d3f9-958b-45be-a6cf-ed7dc59ec73c",
"message": "Hello"
}
@momer do you by any chance have the offending json object? I want to make sure this fix gets it.
@royt-via @momer I've not been able to reproduce this locally so far.
I've set up a Postgres table with JSONB column and successfully synced to a Snowflake database. These are the following JSON forms I'm currently syncing:
INSERT INTO public.jsonb_table VALUES (1, '{"name": "Paint house", "tags": ["Improvements", "Office"], "finished": true}');
INSERT INTO public.jsonb_table VALUES (2, '{"name": "Wash dishes", "tags": ["Clean", "Kitchen"], "finished": false}');
INSERT INTO public.jsonb_table VALUES (3, '{"name": "Cook lunch", "tags": ["Cook", "Kitchen", "Tacos"], "ingredients": ["Tortillas", "Guacamole"], "finished": false}');
INSERT INTO public.jsonb_table VALUES (4, '{"name": "Vacuum", "tags": ["Clean", "Bedroom", "Office"], "finished": false}');
INSERT INTO public.jsonb_table VALUES (5, '{"name": "Hang paintings", "tags": ["Improvements", "Office"], "finished": false}');
INSERT INTO public.jsonb_table VALUES (6, '{
"streams": [
{
"name": "accounts",
"json_schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"ownerId": {
"type": "integer"
},
"name": {
"type": "string"
},
"domain": {
"type": "string"
},
"accountId": {
"type": "string"
},
"customProperties": {
"type": "array",
"items": {
"type": "object",
"properties": {
"label": {
"type": "string"
},
"name": {
"type": "string"
},
"value": {},
"type": {
"type": "string"
}
}
}
},
"deleted": {
"type": "boolean"
},
"createDateTime": {
"type": "integer"
},
"updateDateTime": {
"type": "integer"
},
"targeted": {
"type": "boolean"
}
}
}
},
{
"name": "conversations",
"json_schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"id": {
"type": "integer"
},
"participants": {
"type": "array",
"items": {
"type": "integer"
}
},
"status": {
"type": "string",
"enum": ["open", "closed", "pending", "bulk_sent"]
},
"contactId": {
"type": "integer"
},
"inboxId": {
"type": "integer"
},
"createdAt": {
"type": "integer"
},
"updatedAt": {
"type": "integer"
},
"relatedPlaybookId": {
"type": ["null", "string"]
},
"conversationTags": {
"type": "array",
"items": {
"type": "object",
"properties": {
"color": {
"type": "string",
"description": "HEX value"
},
"name": {
"type": "string"
}
}
}
}
}
}
},
{
"name": "users",
"json_schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"id": {
"type": "integer"
},
"orgId": {
"type": "integer"
},
"name": {
"type": "string"
},
"alias": {
"type": "string"
},
"email": {
"type": "string"
},
"phone": {
"type": "string"
},
"locale": {
"type": "string"
},
"availability": {
"type": "string"
},
"role": {
"type": "string"
},
"timeZone": {
"type": "string"
},
"avatarUrl": {
"type": "string"
},
"verified": {
"type": "boolean"
},
"bot": {
"type": "boolean"
},
"createdAt": {
"type": "integer"
},
"updatedAt": {
"type": "integer"
}
}
}
}
]
}
');
INSERT INTO public.jsonb_table VALUES (7, '[ {"type": "mobile", "phone": "001001"} , {"type": "fix", "phone": "002002"} ]');
INSERT INTO public.jsonb_table VALUES (8, '[ {"type": "mobile", "phone": "001001"} , {"type": "fix", "phone": "00200\\asd\\qwe\\asdf/asdfblahblah2"} ]');
INSERT INTO public.jsonb_table VALUES (9, '{"type": "mobile", "phone": "001001 "}');
INSERT INTO public.jsonb_table VALUES (10, '{"type": "mobile", "phone": "001001 ", "object": {"type": "mobile", "phone": "001001 "} }');
INSERT INTO public.jsonb_table VALUES (11, '{"type": "mobile", "phone": "\\n\\r \r\n \n \r \\036" }');
INSERT INTO public.jsonb_table VALUES (12, '{"type": "mobile", "phone": "\\n\\r \r\n \n \r \\000a036"}');
There are a variety of JSON types to make sure this works. I've also deliberately tried to insert invalid JSON into the jsonb
table. However, Postgres is rejecting (rightfully so) the invalid JSON (e.g. hidden characters etc).
Before I push any fix, I want to make sure I'm solving the right problem. Since I'm unable to reproduce this locally, can I trouble you to run an image I've prepared that has additional logging around JSON objects and specific exception catching around the points I think this error is happening at? The code I've added can be seen this in this PR. To do so please update your source-postgres
and destination-snowflake
images to use the debug-logging
tag.
Since this logging is verbose, you might need to add disk space to the machines the job is being run on.
@royt-via from the provided logs, it looks like your error is originating from a record at the end of the table. I believe running the job on the last 20,000 records in the table will catch our offender. This might save you some time.
Thanks guys!
@davinchia , sure!
I've just ran it and I'm confused because it succeeded (7.44 GB | 24,003,992 records | 1h 6m 4s | Sync
) and I don't see any tables/records on the destination schema.. I double checked all setting.
Here are the logs of this execution.
Thanks @royt-via! From your logs it does seem like the sync succeeded. All the JSONB stuff looks good, so at least that's working well.
I do see a snowflake error right at the end - perhaps that affected things? So select *
is not showing new tabes?
Snowflake has a panel that allows one to see the queries ran previously. Can you navigate there and see what queries were run? There should be a COPY into a tmp table, an INSERT into the final table, a few normalisation queries, and a DELETE for the tmp table.
Thanks @davinchia! I was able to locate the data somewhere else. Opened another ticket for it.
Sounds good. I'm going to close this for now since it seems that the underlying issue is solved. The work to properly load Postgres JSONB columns will be tracked as part of #1006.
Feel free to reopen if you see this again!
Description
We don't make guarantees about properly encoding binary data in sources or destinations. In the case of #3251, sending a JSONB column from Postgres to Snowflake caused it to break. This is likely happening for other database sources/destinations as well but I didn't verify this.
We should come up with a way for sources which could output binary data to do so in a JSON Schema compliant way (potentially base64 encoding). The design should also allow destinations and normalization to write this data to the destination in a way that makes sense. For example, for destinations which support JSONB we should encode this appropriately. Otherwise we should maintain the base64 string encoding. Ideally this can be encoded into the Abstract JDBC abstraction to reduce wasted work but we'll have to apply judgment.
Logs
An example issue can be found in #3251 . Logs: airbyte_issue_2021-05-05.log
Steps to Reproduce
Severity of the bug for you
Medium