airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Snowflake Destination: need support for multi-byte identifiers #18058

Open amrael opened 2 years ago

amrael commented 2 years ago

Tell us about the problem you're trying to solve

I'm trying to create a connection with the Snowflake destination connector with “Destination Namespace” set to “Custom format” and “Namespace Custom Format” set to something like “airbyte.io コーポレートサイトGA”, which contains multi-byte characters. I expect this to generate a schema named exactly the same as the custom format. Instead, I get a schema named “AIRBYTE_IO__GA”, and it contains only _AIRBYTE_RAW_ tables. To support multi-byte characters in identifiers, every Snowflake identifier should be enclosed in double quotes, as described in “Identifier Requirements” in the Snowflake documentation.

Describe the solution you’d like

The value of Namespace Custom Format should be enclosed in double quotes when the connector passes it to Snowflake as a schema identifier.
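
To illustrate what I mean, here is a minimal sketch of the quoting in Python. It is not the connector's actual code: the function names `quote_identifier` and `create_schema_sql` are hypothetical, and the only Snowflake behavior assumed is the documented rule that a double quote inside a quoted identifier is written as two double quotes.

```python
def quote_identifier(name: str) -> str:
    """Return `name` as a double-quoted identifier.

    Hypothetical helper, not part of Airbyte. Embedded double quotes
    are doubled so the name survives unchanged inside the quotes.
    """
    if not name:
        raise ValueError("identifier must not be empty")
    return '"' + name.replace('"', '""') + '"'


def create_schema_sql(namespace: str) -> str:
    """Build a CREATE SCHEMA statement with the namespace quoted verbatim."""
    return f"CREATE SCHEMA IF NOT EXISTS {quote_identifier(namespace)}"


if __name__ == "__main__":
    # The namespace keeps its dot, space, and Japanese characters.
    print(create_schema_sql("airbyte.io コーポレートサイトGA"))
```

Note that quoted identifiers are case-sensitive in Snowflake, so every later statement that touches the schema would need to quote the name the same way.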

Describe the alternative you’ve considered or used

I tried some variations such as "airbyte.io コーポレートサイトGA", \"airbyte.io コーポレートサイトGA\", \\"airbyte.io コーポレートサイトGA\\", and ""airbyte.io コーポレートサイトGA"", but to no avail. They all ended up with extra underscores. We let our users name the Snowflake schema as they wish via the Airbyte API, so we inevitably have to handle multi-byte characters, Japanese in our case.

Additional context

I already opened a ticket, https://discuss.airbyte.io/t/snowflake-destination-connector-does-not-support-multi-bytes-identifiers/2923 , but @natalyjazzviolin kindly redirected me here.

Are you willing to submit a PR?

No

natalyjazzviolin commented 2 years ago

@amrael triaged! I'll inquire where this could fall on the roadmap :)

amrael commented 2 years ago

Hi @natalyjazzviolin, I'm just checking in. Is there any update on this? Thanks.

amrael commented 2 years ago

Is there any update on this? This issue is critical for us: the schema name produced for a connection with a multi-byte name is unpredictable, so we cannot determine which Snowflake schema belongs to which connection. Normalization also does not seem to work. So this issue is not a feature request, but a bug.

harshithmullapudi commented 2 years ago

Hey, I've asked the team to take a look. They should get back to you soon.

amrael commented 2 years ago

I just opened a PR, and the change is working fine on my end. For now, I'm using a custom connector derived from the Snowflake destination connector. Still, it'd be great if you could merge it into the official repository so that I don't need to reapply the change to newer versions.

amrael commented 2 years ago

@harshithmullapudi @natalyjazzviolin Can you help me get the PR merged? I'm having a problem with normalization when using my custom connector image derived from the Snowflake destination, as I posted in the link below. https://discuss.airbyte.io/t/cant-normalize-with-a-custom-destination-connector-based-on-snowflake/3133