aws-samples / amazon-mwaa-examples

Amazon Managed Workflows for Apache Airflow (MWAA) Examples repository contains example DAGs, requirements.txt, plugins, and CloudFormation templates focused on Amazon MWAA.
MIT No Attribution

start-stop-mwaa-environment - mwaa_import_data.py - variable.csv - fails for field larger than field limit #74

Open mvitale-kensu opened 6 months ago

mvitale-kensu commented 6 months ago

Hello guys,

In our case the resume step fails because the mwaa_import_data DAG fails while importing variable.csv. This is the error:

[2024-05-16, 08:01:16 UTC] {{taskinstance.py:1937}} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/operators/python.py", line 192, in execute
    return_value = self.execute_callable()
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/operators/python.py", line 209, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/dags/mwaa_import_data.py", line 146, in importVariable
    for row in reader:
_csv.Error: field larger than field limit (131072)

Just FYI, I've fixed it by adding these few lines of code to mwaa_import_data.py:

import sys
import csv
maxInt = sys.maxsize

while True:
    # Decrease maxInt by a factor of 10
    # for as long as OverflowError occurs.
    try:
        csv.field_size_limit(maxInt)
        break
    except OverflowError:
        maxInt = int(maxInt/10)

The fix comes from this Stack Overflow post: https://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072

I am not sure if this is the correct way to handle it, but from what I've seen it works fine for us.
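For context on why the loop matters: `csv.field_size_limit()` stores the limit in a C long, so passing `sys.maxsize` directly can raise `OverflowError` on platforms where a C long is 32 bits (e.g. Windows); the loop falls back to smaller values until one fits. A minimal, self-contained sketch of the workaround (the `raise_csv_field_limit` helper name is my own, not from mwaa_import_data.py):

```python
import csv
import io
import sys


def raise_csv_field_limit():
    """Raise the csv module's field size limit as high as the platform allows.

    csv.field_size_limit() stores the limit in a C long, so sys.maxsize
    raises OverflowError where C long is 32 bits; divide by 10 until it fits.
    """
    max_int = sys.maxsize
    while True:
        try:
            csv.field_size_limit(max_int)
            return max_int
        except OverflowError:
            max_int = int(max_int / 10)


# A field longer than the default 131072-byte limit now parses cleanly.
limit = raise_csv_field_limit()
big_field = "x" * 200_000
rows = list(csv.reader(io.StringIO(f'key,"{big_field}"\n')))
```

On Linux (which MWAA workers run), the first `csv.field_size_limit(max_int)` call typically succeeds and the loop exits immediately; the fallback only kicks in on 32-bit-long platforms.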

We are on Airflow 2.7.2, and I am using the latest code from this project's main branch.

crupakheti commented 6 months ago

Thank you @mvitale-kensu for reporting this issue. Rather than the loop you proposed, did the simpler solution from the same SO post not work for you:

import sys
import csv

csv.field_size_limit(sys.maxsize)
...

In any case, we do need to include a fix for this issue. Just curious whether you tried the solution above.

mvitale-kensu commented 6 months ago

Hey @crupakheti, yep, the solution proposed in my first message fixed the issue for us.