airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
16.18k stars 4.14k forks source link

Error in generated sql query: syntax error at or near "as" #3418

Open Aleksandr-yask opened 3 years ago

Aleksandr-yask commented 3 years ago

Expected Behavior

Correct data transfer

Current Behavior

When i try to transfer data from mysql to postgresql I got error: syntax error at or near "as"

Logs

2021-05-14 14:17:47 INFO (/tmp/workspace/17/0) LineGobbler(voidCall):69 -   syntax error at or near "as"
2021-05-14 14:17:47 INFO (/tmp/workspace/17/0) LineGobbler(voidCall):69 -   LINE 30:  as 
2021-05-14 14:17:47 INFO (/tmp/workspace/17/0) LineGobbler(voidCall):69 -             ^
2021-05-14 14:17:47 INFO (/tmp/workspace/17/0) LineGobbler(voidCall):69 -   compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_tables/bireport_db/_airbyte_raw_customergroup_cc6.sql
2021-05-14 14:17:47 INFO (/tmp/workspace/17/0) LineGobbler(voidCall):69 - 

Generated query in _airbyte_raw_customergroup_cc6.sql file:


with __dbt__CTE___airbyte_raw_customergroup_ab1_ee6 as (

-- SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema
select
    _airbyte_emitted_at
from "postgres".bireport_db._airbyte_raw__airbyte_raw_customergroup
-- _airbyte_raw_customergroup
),  __dbt__CTE___airbyte_raw_customergroup_ab2_ee6 as (

-- SQL model to cast each column to its adequate SQL type converted from the JSON schema type
select
    _airbyte_emitted_at
from __dbt__CTE___airbyte_raw_customergroup_ab1_ee6
-- _airbyte_raw_customergroup
),  __dbt__CTE___airbyte_raw_customergroup_ab3_ee6 as (

-- SQL model to build a hash column based on the values of this record
select
    *,
    md5(cast(

 as 
    varchar
)) as _airbyte__airbyte_raw_customergroup_hashid
from __dbt__CTE___airbyte_raw_customergroup_ab2_ee6
-- _airbyte_raw_customergroup
)-- Final base SQL model
select
    _airbyte_emitted_at,
    _airbyte__airbyte_raw_customergroup_hashid
from __dbt__CTE___airbyte_raw_customergroup_ab3_ee6
-- _airbyte_raw_customergroup from "postgres".bireport_db._airbyte_raw__airbyte_raw_customergroup

as you can see field is not specified in md5(cast()). This problem occurs on multiple tables

Steps to Reproduce

My mysql table:

-- auto-generated definition
create table CustomerGroup
(
    id        bigint unsigned not null
        primary key,
    name      varchar(255)    not null,
    createdAt datetime        not null,
    updatedAt datetime        not null,
    constraint name_idx
        unique (name)
);

Severity of the bug for you

High

Airbyte Version

0.22.3-alpha

Connector Version (if applicable)

Postgres: 0.3.2 Mysql: 0.3.0

┆Issue is synchronized with this Asana task by Unito

ChristopheDuong commented 3 years ago

Can you please share the content of your destination_catalog.json?

   docker run -it --rm -v airbyte_workspace:/data busybox find /data/17/0/normalize/destination_catalog.json -type f -exec cat "{}" \;

as described here: https://docs.airbyte.io/tutorials/browsing-output-logs#reading-the-content-of-the-catalog-json-file

Aleksandr-yask commented 3 years ago

Can you please share the content of your destination_catalog.json?

   docker run -it --rm -v airbyte_workspace:/data busybox find /data/17/0/normalize/destination_catalog.json -type f -exec cat "{}" \;

as described here: https://docs.airbyte.io/tutorials/browsing-output-logs#reading-the-content-of-the-catalog-json-file

I'm not sure what I have the right to share the full structure of the database, so i will only send the structure of one table

{
            "stream":{
                "name":"CustomerGroup",
                "json_schema":{
                    "type":"object",
                    "properties":{
                        "id":{
                            "type":"number"
                        },
                        "name":{
                            "type":"string"
                        },
                        "createdAt":{
                            "type":"string"
                        },
                        "updatedAt":{
                            "type":"string"
                        }
                    }
                },
                "supported_sync_modes":[
                    "full_refresh",
                    "incremental"
                ],
                "default_cursor_field":[

                ],
                "source_defined_primary_key":[
                    [
                        "id"
                    ]
                ],
                "namespace":"bireport_db"
            },
            "sync_mode":"full_refresh",
            "cursor_field":[

            ],
            "destination_sync_mode":"append",
            "primary_key":[
                [
                    "id"
                ]
            ]
        },
...
    {
            "stream":{
                "name":"_airbyte_raw_CustomerGroup",
                "json_schema":{
                    "type":"object",
                    "properties":{
                        "_airbyte_data":{
                            "type":"string"
                        },
                        "_airbyte_ab_id":{
                            "type":"string"
                        },
                        "_airbyte_emitted_at":{
                            "type":"string"
                        }
                    }
                },
                "supported_sync_modes":[
                    "full_refresh",
                    "incremental"
                ],
                "default_cursor_field":[

                ],
                "source_defined_primary_key":[
                    [
                        "_airbyte_ab_id"
                    ]
                ],
                "namespace":"bireport_db"
            },
            "sync_mode":"full_refresh",
            "cursor_field":[

            ],
            "destination_sync_mode":"append",
            "primary_key":[
                [
                    "_airbyte_ab_id"
                ]
            ]
        },

If you need something else, please let me know.

ChristopheDuong commented 3 years ago

One of your stream is _airbyte_raw_CustomerGroup why did you select that table for replication too ? Shouldn't you select only CustomerGroup?

Aleksandr-yask commented 3 years ago

I have never created such a table. Perhaps it was created through airbyte and I selected all the source tables. But I'm sure that in some launches I selected only the CustomerGroup table and another one (related to this one)

ChristopheDuong commented 3 years ago

I'll close this issue since the generated SQL failing to execute is actually failing on replicating the internal airbyte table _airbyte_raw_CustomerGroup (created by airbyte destination connectors before normalization can transform them)

These internal (destination) tables should not be selected for replications as source streams.

If you still face any issue while replicating "the CustomerGroup table and another one (related to this one)", you can open a new issue while attaching the corresponding logs/infos.

Thanks!

Aleksandr-yask commented 3 years ago

I cannot agree with the closure of the task. The program shouldn't break if I add such tables. Especially if this table was created by the program (I did not choose this destination base). But that's up to you.

michel-tricot commented 3 years ago

@ChristopheDuong shouldn't we do something cleaner here?

ChristopheDuong commented 3 years ago

I cannot agree with the closure of the task. The program shouldn't break if I add such tables. Especially if this table was created by the program (I did not choose this destination base). But that's up to you.

I was closing it to break it up as follow:

  1. In this other PR: https://github.com/airbytehq/airbyte/pull/3470 the exception should be handled and won't cause normalization to fail anymore. However, the _airbyte_raw stream will simply be ignored for the moment.

  2. So, in the meantime, I opened a new issue: https://github.com/airbytehq/airbyte-internal-issues/issues/91 to handle properly the chaining of replication to two destinations.

With either options, this should avoid getting into the same exceptions that you've encountered here!

In the meantime, a quick way around it, as I was replying earlier, is to avoid using the raw tables as a source with the airbyte version you currently have...