airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
14.72k stars 3.78k forks source link

Source Zendesk Support: Negative IDs / integers for Zendesk Support connection #16367

Open marcosmarxm opened 1 year ago

marcosmarxm commented 1 year ago

This Github issue is synchronized with Zendesk:

Ticket ID: #2071 Priority: normal Group: Community Assistance Engineer Assignee: Sajarin

Original ticket description:

  • Is this your first time deploying Airbyte? No
  • OS Version / Instance: Ubuntu / AWS EC2
  • Memory / Disk: m5.2xlarge - 8 CPU, 32 GB Memory
  • Deployment: Docker
  • Airbyte Version: 0.39.14-alpha
  • Source name/version: airbyte/source-zendesk-support:0.2.15
  • Destination name/version: airbyte/destination-s3:0.3.14
  • Step:
  1. Create a connection between Zendesk Support and S3, in a parquet format.
  2. Query the data (in that case, using Athena). It will be possible to see IDs as negative floats when it should be integers. Also, the numbers don’t match the original ID on Zendesk. Example with Zendesk Ticket Comments table:
    image
  • Description: Some additional comments;
    • This doesn’t happen to other sources, apparently only with Zendesk.
    • My assumption is that Zendesk provides some big integers for some columns. When the code converts data to a parquet, it “changes” the values.
[Discourse post]
marcosmarxm commented 1 year ago

Comment made from Zendesk by Sajarin on 2022-09-06 at 17:11:

Hi @israel.mendes, I think your assumption is correct and if that's the case, we can most likely have a community contribution to fix the error. 
 
I escalated the issue to github here: https://github.com/airbytehq/airbyte/issues/16367. Please follow there for updates. 
marcosmarxm commented 1 year ago

Comment made from Zendesk by Marcos Marx on 2022-09-06 at 21:32:

I will follow up on Github then, thanks @sajarin!

[Discourse post]
vitormd commented 1 year ago

Any updates on this issue? The same is happening to our Google Ads ad's ids

hemanthadawale commented 1 year ago

Hi @sajarin and @marcosmarxm , I am facing exact same issue with MySQL Source. The destination is same as [israel.mendes's] which is S3 in parquet format. The bigint columns in MySql tables are converted to incorrect negative values in parquet S3. Can you help me here or guide to any thread?

shrodingers commented 1 year ago

Issue is INT typing on the ids of the zendesk connection, leading to INT overflow on destinations which have a INT typing on 32bits. so any number > MAXINT (2147483647) is being truncated to fit into 32 bits, thus even rewriting the MSB leading to negative numbers being shown. Solution would be to change zendesk id type to string to allow correct arbitrary length numbers. Would gladly submit a pr for that if needed. In the meantime, some trick that works is to recover the correct id using url field and regexp extraction