airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
16.05k stars 4.11k forks source link

Source Oracle: DATE type: parse to date and/or time instead of String #20924

Open naxels opened 1 year ago

naxels commented 1 year ago

Tell us about the problem you're trying to solve

I'm trying to get data from Oracle into PostgreSQL, which works fine, except that all DATE type's are converted to String, immediately making it much harder on the PostgreSQL side to work with date/times.

Describe the solution you’d like

I see in the documentation, https://docs.airbyte.com/integrations/sources/oracle#data-type-mapping that Oracle date, timestamp etc is converted to string, but no clear explanation as to why that is happening?

Ideally, these source types would be converted to correct date and/or time (or timestamp) columns on the destination side.

Describe the alternative you’ve considered or used

None, I'm considering doing manual normalisation but that would just complicate things.

Are you willing to submit a PR?

If I know where to look / what to reason about, I might be able to assist.

ggam commented 1 year ago

I have this same problem.

~It seems like the code resides on https://github.com/airbytehq/airbyte/blob/5d6867d1d66e325f4a19aa9d7c1c39d30b15603e/airbyte-integrations/bases/base-normalization/normalization/transform_catalog/stream_processor.py#L524-L584~

~I may try to have a look at implement it but I have no Python experience so I would need help writing new unit tests for it.~

The affected code is on the Oracle source connector.

Harm01 commented 1 year ago

@naxels @ggam Did you ever found a solution to this problem? I have the same challenge here.

I have little experience in writing python code. But if there is anything I can do to assist, Let me know.

naxels commented 1 year ago

Unfortunately i have not, but am still very interested in having one!

ggam commented 1 year ago

Well... my workaround was to use a custom dbt transformation after Airbyte basic normalization. So it basically stores the JSON raw data, transforms to the normalized form and then again I transform it to the final table.

It's not performance wise efficient, but gives me complete flexibility to replicate the exact data mapping.

On Sat, Oct 7, 2023 at 12:46 PM Harm @.***> wrote:

@naxels https://github.com/naxels @ggam https://github.com/ggam Did you ever found a solution to this problem? I have the same challenge here.

I have little experience in writing python code. But if there is anything I can do to assist, Let me know.

— Reply to this email directly, view it on GitHub https://github.com/airbytehq/airbyte/issues/20924#issuecomment-1751679363, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQC44AEI6ATMJZ54GYL5XDX6EXIFAVCNFSM6AAAAAATMHTAWOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONJRGY3TSMZWGM . You are receiving this because you were mentioned.Message ID: @.***>