mclate closed this issue 2 years ago
Hi @mclate, could you please check whether the page field is also null for some records in your raw data table? I'd like to understand if it's a normalization-related problem or a replication problem.
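A check along these lines could be sketched as below. It assumes the raw table is _AIRBYTE_RAW_DATA, that the record JSON sits in a variant column named _AIRBYTE_DATA, and that the JSON key is the lowercase page; adjust any of these names if your setup differs.

```sql
-- Sketch only: table, column and key names are assumptions, adjust as needed.
select count(*) as raw_rows_with_null_page
from _AIRBYTE_RAW_DATA
where _AIRBYTE_DATA:page is null           -- key missing from the JSON
   or is_null_value(_AIRBYTE_DATA:page);   -- key present with a JSON null
```

A non-zero count would suggest the nulls were already present at replication time. A zero count would point towards normalization instead.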
Hi @alafanechere .
I work with @mclate so can hopefully answer that.
I just checked, but it seems there are no results in the raw table that match the _AIRBYTE_AB_ID field.
I tried
select * from _AIRBYTE_RAW_DATA
where _AIRBYTE_AB_ID = 'c581acde-47ae-4b4a-9cbb-f83618af47b7'
and got 0 results.
When running
select * from data
where _AIRBYTE_AB_ID = 'c581acde-47ae-4b4a-9cbb-f83618af47b7'
I do get a result, although it is one of the broken results.
This discrepancy between raw data and normalized data is not a good sign. Do you mind resetting the data for this connection and running a full refresh again?
@alafanechere I shall set that off now, will take a while.
What would you like me to check once it's done?
I'd like to check if you still face the same problem. Your raw table should not be missing records that are present in the normalized tables.
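To check this across the whole table rather than for a single ID, an anti-join along these lines may help. It is a sketch that reuses the table names from the queries earlier in this thread (data and _AIRBYTE_RAW_DATA) and assumes both live in schemas reachable from the current session; qualify the names with their schemas if they do not.

```sql
-- Sketch: count normalized rows that have no matching raw record.
-- The expected result after a clean reset and full refresh is 0.
select count(*) as normalized_rows_missing_from_raw
from data d
left join _AIRBYTE_RAW_DATA r
  on r._AIRBYTE_AB_ID = d._AIRBYTE_AB_ID
where r._AIRBYTE_AB_ID is null;
```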
Hi @alafanechere, it looks sorted. The strange thing is that @mclate and I did this several times the other day, and each time we had the same result.
One thing I did differently, although I don't know if this would affect anything: I reset the data, had a look in Snowflake and saw that there was a schema airbyte_{schema-name} as well as the schema without airbyte in front of it.
I dropped this schema and the newly created one, hit refresh again and saw only the target schema and not the airbyte_ one.
I then did a sync and it seems to have worked.
Environment
0.35.12-alpha
AWS EC2
airbyte/source-mysql 0.4.13 (tried many versions, starting from 0.3.2 up until 0.4.23, didn't try 0.5.x)
airbyte/destination-snowflake 0.4.5
Current Behavior
We have been running sync jobs between MySQL (AWS RDS) and Snowflake for quite some time now. One very strange thing we've noticed is that in the destination dataset, rows for some months have all fields resulting in null. To give you a brief example, here is the query that was executed in RDS and in Snowflake after a full sync:
select count(distinct d.id) from data d where page is null
The RDS result is 0, while Snowflake gave us 442134 (among a total of ~25M records).
Below I pasted examples of a couple of rows as they are in RDS and in Snowflake. The first two lines are incorrectly imported, while the last two are there as an example of correctly imported ones:
RDS:
Snowflake:
The most confusing part is that this happens only for a couple of months of data. All other months are OK (the data itself is quite consistent in the source).
In the Snowflake configuration we were initially using the S3 COPY method. However, the last full sync was done without it, and it still contains incorrect records.
Expected Behavior
All fields should be exported correctly.
Logs
At this point I'm not sure what logs would be relevant for this issue. Let me know what additional details would be helpful.
cc @mewis