apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Data too long when using S3 to GCS #39708

Closed Lee2532 closed 4 months ago

Lee2532 commented 4 months ago

Description

When a large amount of S3 data is moved using S3 to GCS, a "data too long for column" error occurs when the related values are stored in XCom at the end of the transfer.

Use case/motivation

I think we can prevent this error by adding logic that shortens the list so that only part of the data is shown.
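A minimal sketch of what such shortening logic might look like, written as plain Python rather than as a change to any Airflow operator. The function name `truncate_for_xcom`, the `max_items` parameter, and the sample file names are all hypothetical and are used only for illustration.

```python
from typing import List, Sequence


def truncate_for_xcom(keys: Sequence[str], max_items: int = 100) -> List[str]:
    """Keep at most `max_items` entries and append a marker for the rest.

    Hypothetical helper: not part of Airflow or any provider package.
    """
    if max_items < 0:
        raise ValueError("max_items must be non-negative")
    if len(keys) <= max_items:
        # Short lists are returned unchanged (as a new list).
        return list(keys)
    omitted = len(keys) - max_items
    # Keep the first entries and record how many were left out.
    return list(keys[:max_items]) + [f"... ({omitted} more omitted)"]


# Made-up object keys standing in for the files moved by a transfer.
uploaded = [f"data/part-{i:05d}.csv" for i in range(1000)]
short = truncate_for_xcom(uploaded, max_items=3)
print(short)
```

With a cap like this, the stored value stays small regardless of how many files were moved, at the cost of no longer listing every file.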

Related issues

No response


raphaelauv commented 4 months ago

XCom is not for data; it's for metadata (an XCom backend is just a workaround; in all cases Airflow should not see your data).
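As a rough illustration of the metadata-versus-data distinction, the sketch below compares the serialized size of a full file list with that of a small summary. The helper `summarize_transfer`, the `gs://example-bucket/` destination, and the file names are assumptions made up for this example; nothing here is Airflow API.

```python
import json
from typing import Dict, Sequence, Union


def summarize_transfer(
    keys: Sequence[str], dest_prefix: str
) -> Dict[str, Union[int, str]]:
    """Build a small, fixed-size summary instead of returning every key.

    Hypothetical helper: not part of Airflow or any provider package.
    """
    return {"file_count": len(keys), "dest_prefix": dest_prefix}


# Made-up object keys standing in for a large transfer.
keys = [f"data/part-{i:05d}.csv" for i in range(1000)]
summary = summarize_transfer(keys, "gs://example-bucket/data/")

# Compare the JSON-encoded sizes of the full list and the summary.
full_size = len(json.dumps(list(keys)).encode("utf-8"))
summary_size = len(json.dumps(summary).encode("utf-8"))
print(f"full list: {full_size} bytes, summary: {summary_size} bytes")
```

The summary's size does not grow with the number of files, which is the property that matters when the value ends up in a fixed-size database column.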

Airflow is a scheduling tool, not an ETL tool.

Most Airflow operators are simple Python helpers, not efficient data-transfer tools.