frictionlessdata / frictionless-py

Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data
https://framework.frictionlessdata.io
MIT License

maximum recursion depth exceeded for field_update method when applying transformation #1621

Open zac215 opened 7 months ago

zac215 commented 7 months ago

Overview

I use a regex to rename fields, but when the pipeline contains steps for more than 87 fields I get this error, which I don't understand:

frictionless.exception.FrictionlessException: [step-error] Step is not valid: "field_update" raises "maximum recursion depth exceeded in instancecheck"

Here is my code:

from re import search
from frictionless import Resource, steps, formats, Schema, Pipeline
source = Resource(path="reponse_3.csv")
schema = Schema.describe("reponse_3.csv")
field_names = schema.field_names
def remove_number(field_name):
    # Drop a "<number>. " prefix from the field name, if present
    match = search(r'\d+\.\s(.+)', field_name)
    if match:
        return match.group(1)
    return field_name
# Build one field_update step per field; the error appears with more than 87 steps
step_list = []
for field_name in field_names:
    step_list.append(steps.field_update(name=field_name, descriptor={"name": remove_number(field_name)}))
pipeline = Pipeline(steps=step_list)
target = source.transform(pipeline)
target.write('reponse_3.csv', control=formats.CsvControl(delimiter=';'), encoding='utf-8')

Thanks
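
For readers following along, here is a minimal sketch of what the renaming helper does on its own, independent of frictionless. The sample header names are hypothetical and are not taken from reponse_3.csv. Note that `search` is unanchored, so any text before the number is dropped as well.

```python
from re import search

def remove_number(field_name):
    # Return the text after the first "<digits>. " occurrence, or the name unchanged
    match = search(r'\d+\.\s(.+)', field_name)
    return match.group(1) if match else field_name

# Hypothetical header names, used only for illustration
samples = ["1. Name", "12. Date of birth", "Comment", "Section 3. Age"]
renamed = [remove_number(name) for name in samples]
print(renamed)
```

If two original names differ only in their number (for example "1. Name" and "2. Name"), both map to the same new name, so it may be worth checking for duplicates before renaming.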

ebsumanta commented 1 month ago

I am also facing the same issue, and it is making things difficult in a production scenario.

zac215 commented 1 month ago

I am also facing the same issue, and it is making things difficult in a production scenario.

I finally got the expected result without transformation methods or a pipeline. I just inferred the schema, edited it, and then rewrote the data according to the new schema. Here is the code:

from re import search
from frictionless import Resource, formats
def remove_number(field_name):
    # Drop a "<number>. " prefix from the field name, if present
    match = search(r'\d+\.\s(.+)', field_name)
    if match:
        return match.group(1)
    return field_name
source = Resource(path="reponse_3.csv")
# Infer the schema, then rename each field directly on it
source.infer()
schema = source.schema
for field_name in schema.field_names:
    schema.update_field(name=field_name, descriptor={'name': remove_number(field_name)})
source.schema = schema
# Write the data out under the renamed schema
source.write('reponse_4b.csv', control=formats.CsvControl(delimiter=';'), encoding='utf-8')
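
If only the header row needs to change, the same renaming could also be sketched with the standard library alone. This is an illustrative alternative, not something frictionless provides or recommends. The helper name `rename_header`, the comma-delimited input, and the in-memory sample data are all assumptions.

```python
import csv
import io
from re import search

def remove_number(field_name):
    # Return the text after the first "<digits>. " occurrence, or the name unchanged
    match = search(r'\d+\.\s(.+)', field_name)
    return match.group(1) if match else field_name

def rename_header(src, dst, out_delimiter=";"):
    """Copy CSV rows from src to dst, renaming only the header row.

    Hypothetical helper; assumes the input is comma-delimited.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, delimiter=out_delimiter, lineterminator="\n")
    new_header = [remove_number(name) for name in next(reader)]
    # Refuse to write a file whose renamed columns would collide
    if len(set(new_header)) != len(new_header):
        raise ValueError("renaming would create duplicate field names")
    writer.writerow(new_header)
    writer.writerows(reader)

# In-memory stand-ins for the real files, with made-up data
src = io.StringIO("1. Name,2. Age\nAda,36\n")
dst = io.StringIO()
rename_header(src, dst)
print(dst.getvalue())
```

With real files, `src` and `dst` would be handles opened with `newline=""` and an explicit encoding. This approach skips schema inference and type handling entirely, so it only fits the case where the data rows should pass through unchanged.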