airbytehq / PyAirbyte

PyAirbyte brings the power of Airbyte to every Python developer.
https://docs.airbyte.com/pyairbyte

hubspot connection bug #342

Closed jiangsong216 closed 4 days ago

jiangsong216 commented 3 weeks ago

pyairbyte version: 0.16.4

Error when using the PostgreSQL cache; there is no problem with the default cache:

image

jiangsong216 commented 3 weeks ago

btw, these streams have the above issue in my test:

companies, contacts, companies_property_history, contacts_form_submissions, contacts_list_memberships, deals, deals_archived, engagements, contacts_property_history, deals_property_history

aaronsteers commented 1 week ago

It looks like the root cause is that Postgres has a max identifier length (and so a max column name length) of 63 bytes: https://til.hashrocket.com/posts/8f87c65a0a-postgresqls-max-identifier-length-is-63-bytes

Past 63 characters, Postgres accepts the column name but silently truncates it. Then, on the next run, PyAirbyte thinks the column is missing and tries to add it; Postgres truncates the name again and reports that the column already exists.
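To make the failure mode concrete, here is a minimal sketch that simulates it in plain Python, with no database involved. The helper names (`pg_truncate`, `missing_columns`) and the sample column name are hypothetical and are not PyAirbyte or Postgres APIs; the only assumption taken from Postgres is the default 63-byte identifier limit.

```python
PG_MAX_IDENTIFIER_BYTES = 63  # Postgres default identifier limit


def pg_truncate(name: str) -> str:
    """Mimic Postgres clipping an identifier to 63 bytes (hypothetical helper)."""
    clipped = name.encode("utf-8")[:PG_MAX_IDENTIFIER_BYTES]
    # Drop any partial multibyte character left at the cut point.
    return clipped.decode("utf-8", errors="ignore")


def missing_columns(expected: list[str], existing: set[str]) -> list[str]:
    """Naive check: compare the untruncated names against what the table has."""
    return [col for col in expected if col not in existing]


# Hypothetical 81-character column name, longer than the limit.
long_name = "properties_" + "x" * 70

# First run: the column is created, but stored under its truncated name.
table_columns = {pg_truncate(long_name)}

# Second run: the full name is not found, so it is scheduled to be added again...
to_add = missing_columns([long_name], table_columns)

# ...but the name Postgres would actually create already exists.
would_collide = pg_truncate(to_add[0]) in table_columns
```

In this simulation `to_add` still contains the long name on the second run, and `would_collide` is true, which corresponds to the "column already exists" error.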

@jiangsong216 - Can you confirm that these long column names are coming from the HubSpot system itself or if they are custom columns which you are able to modify/rename?

If there is no path to rename, I think the best we can do is create a custom Normalizer class for Postgres which applies a max length normalization as well as normal uppercase/lowercase normalization.
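As a rough illustration of that idea, the sketch below lowercases a name and then clips it to 63 bytes, so that the name PyAirbyte compares against matches what Postgres would store. The class name `PostgresNameNormalizerSketch` and its `normalize` method are hypothetical; this is not the actual PyAirbyte normalizer interface, and a real implementation would likely also handle special characters.

```python
class PostgresNameNormalizerSketch:
    """Hypothetical normalizer: lowercase, then clip to the Postgres limit."""

    MAX_IDENTIFIER_BYTES = 63  # Postgres default identifier limit

    @classmethod
    def normalize(cls, name: str) -> str:
        lowered = name.lower()
        clipped = lowered.encode("utf-8")[: cls.MAX_IDENTIFIER_BYTES]
        # Drop any partial multibyte character left at the cut point.
        return clipped.decode("utf-8", errors="ignore")


# Hypothetical long column name (81 characters).
long_name = "Properties_" + "X" * 70
normalized = PostgresNameNormalizerSketch.normalize(long_name)

# Normalizing twice gives the same result, so repeated runs compare equal.
stable = PostgresNameNormalizerSketch.normalize(normalized) == normalized
```

One caveat with plain truncation: two distinct source names that share the same first 63 bytes would normalize to the same column name, so a real normalizer may need a strategy for that case.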

jiangsong216 commented 1 week ago

@aaronsteers Thanks for your reply. These long column names are coming from the HubSpot system itself. btw, you can assign the Postgres Normalizer task to me; I can do this when I have time. Thanks

aaronsteers commented 4 days ago

@jiangsong216 - I have created a PR here:

Should have this ready to merge shortly.