airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Make normalization mapping configurable for custom connectors #7229

Closed: Aaron-K-T-Berry closed this issue 7 months ago.

Aaron-K-T-Berry commented 2 years ago

Tell us about the problem you're trying to solve

I would like to use the basic normalization feature with custom connector images. Even if normalization is enabled on a custom connector like destination-snowflake, the normalization stage fails because no mapping can be found for the custom image name. https://github.com/airbytehq/airbyte/blob/f194f354c7e61868029dc58076c28ac5915297d1/airbyte-workers/src/main/java/io/airbyte/workers/normalization/NormalizationRunnerFactory.java#L18-L30
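For context, the check behaves roughly like a hardcoded lookup keyed on the image name without its tag. This is only a simplified sketch of that pattern, not the actual Airbyte source; the class name, map entries, and value type are illustrative:

```java
import java.util.Map;

// Simplified illustration of the kind of hardcoded mapping described above;
// names, entries, and types are approximations, not the actual Airbyte code.
public class NormalizationMappingSketch {

  // Keyed on the destination image name *without* its tag, which is why a
  // custom image name such as "custom-snowflake-destination" is never found.
  static final Map<String, String> NORMALIZATION_MAPPING = Map.of(
      "airbyte/destination-bigquery", "bigquery",
      "airbyte/destination-postgres", "postgres",
      "airbyte/destination-snowflake", "snowflake");

  static String destinationTypeFor(final String imageName) {
    // Strip the tag (e.g. ":custom-tag") before consulting the map.
    final String imageNameWithoutTag =
        imageName.contains(":") ? imageName.split(":")[0] : imageName;
    final String type = NORMALIZATION_MAPPING.get(imageNameWithoutTag);
    if (type == null) {
      throw new IllegalStateException(
          "Requested normalization for " + imageName + ", but it is not included in the mapping.");
    }
    return type;
  }
}
```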

Describe the solution you’d like

It would be nice if this normalization image mapping were configurable, either through spec.json or through the UI when adding a custom connector.

Describe the alternative you’ve considered or used

You can currently work around the issue and use basic normalization with a custom connector image by tagging your local Docker image with a name that appears in the normalization mapping mentioned above.

docker tag custom-snowflake-destination:local airbyte/destination-snowflake:custom-tag

You can then use this image as a custom connector, and basic normalization should run as expected.
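If the mapping works roughly like the sketch above, the workaround succeeds because the tag is stripped before the lookup, so airbyte/destination-snowflake:custom-tag resolves to the same entry as the stock image. A hypothetical check against the sketch (again, not actual Airbyte code):

```java
// Hypothetical usage of the sketch above: the ":custom-tag" suffix is dropped
// by the lookup, so the retagged custom image still resolves.
public class WorkaroundCheck {
  public static void main(final String[] args) {
    // Prints "snowflake".
    System.out.println(
        NormalizationMappingSketch.destinationTypeFor("airbyte/destination-snowflake:custom-tag"));
  }
}
```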

Additional context

The spec.json appears to enable normalization for a custom connector, but syncs will always fail because of the normalization map check: https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-snowflake/src/main/resources/spec.json#L4

Are you willing to submit a PR?

Unfortunately, my memory of Java is not too good.

Dracyr commented 2 years ago

👍 from me as well, but for more than completely custom connectors: we run our workloads in a locked-down Kubernetes cluster where all images are served by our Harbor registry. We can't connect to Docker Hub, so we use an image mirror/proxy, which changes the image names from airbyte/postgres-destination to <harbor_server_name>/<proxy_project_name>/airbyte/postgres-destination.

This also "breaks" the mapping.

cjwooo commented 2 years ago

👍 from me. We have a custom destination that writes records to Postgres using the Hasura GraphQL Mutations API. We want to enable custom DBT transformations against that underlying Postgres and can provide the necessary database credentials to do so, but we are blocked because our destination is not part of the normalization mapping.

cjwooo commented 1 year ago

A lot of the normalization code related to this has been removed or refactored recently. Does this hardcoded mapping still exist in Airbyte versions >= 0.40.26?

marcosmarxm commented 7 months ago

Normalization is deprecated and will be removed from the codebase soon.