dlt-hub / dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
https://dlthub.com/docs
Apache License 2.0

implement include and exclude filters in Schema as data item transformations #64

Open rudolfix opened 2 years ago

rudolfix commented 2 years ago

Row filtering can be done in an item transform that can optionally be added to any resource. The implementation in Schema is slow and only solves problems it has created itself :)
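The idea is that filtering becomes an ordinary per-item step that a resource opts into, rather than logic baked into Schema. Below is a minimal plain-Python sketch of that shape. `users`, `with_transform` and `drop_password` are hypothetical names used only for illustration. In dlt itself such a step would presumably be attached to a resource with a method such as `add_map`, but check the current docs rather than relying on this sketch.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Item = Dict[str, Any]


def users() -> Iterator[Item]:
    # hypothetical stand-in for a resource that yields data items
    yield {"id": 1, "name": "a", "password": "x"}
    yield {"id": 2, "name": "b", "password": "y"}


def with_transform(items: Iterable[Item], transform: Callable[[Item], Item]) -> Iterator[Item]:
    # hypothetical helper: applies the transform lazily to each item as it flows through
    for item in items:
        yield transform(item)


def drop_password(item: Item) -> Item:
    # hypothetical filter: returns a copy of the item without the "password" key
    return {k: v for k, v in item.items() if k != "password"}


rows = list(with_transform(users(), drop_password))
```

Because the transform is just a function over one item, resources that do not need filtering pay nothing for it.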

Here's a GPT-4 prompt that writes a correct function:

Write me a function in Python that takes a nested dictionary as input. The dictionary can contain dictionaries, lists and basic types as values; the keys are strings. The function takes two more arguments: a list of exclude regexes and a list of include regexes. The regexes match paths in the dictionary. The paths are similar to JSON path, but the separator is "__". If a given element has a path matching an exclude regex, it is removed from the dictionary. However, if any of the nested (child) elements of that element matches an include regex, the element should stay, but its other nested elements should be removed.
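A minimal sketch of such a function, written by hand under stated assumptions; it is not dlt's implementation, and the normalize tests mentioned below remain the real acceptance criteria. Assumptions: `filter_nested` is a hypothetical name; regexes use `re.search` semantics (anchor them with `^`/`$`); list items share their list's path, with no index segment; and an include match re-admits an element even under an excluded parent, while deeper exclude matches still apply.

```python
from __future__ import annotations

import re
from typing import Any, Iterable

SEPARATOR = "__"   # path separator described in the prompt
_DROP = object()   # sentinel meaning "remove this element"


def _any_match(path: str, patterns: list[re.Pattern[str]]) -> bool:
    # assumption: re.search semantics, so patterns should be anchored as needed
    return any(p.search(path) for p in patterns)


def _filter_node(value: Any, path: str, excluded: bool,
                 excludes: list[re.Pattern[str]],
                 includes: list[re.Pattern[str]]) -> Any:
    if path:  # the root dict has an empty path and is never filtered itself
        if _any_match(path, includes):
            excluded = False  # an include re-admits this element; deeper excludes still apply
        elif _any_match(path, excludes):
            excluded = True   # excluded; descendants inherit this flag

    if isinstance(value, dict):
        kept = {}
        for key, child in value.items():
            child_path = f"{path}{SEPARATOR}{key}" if path else key
            result = _filter_node(child, child_path, excluded, excludes, includes)
            if result is not _DROP:
                kept[key] = result
        # an excluded dict survives only as a container for included descendants
        return _DROP if excluded and not kept else kept

    if isinstance(value, list):
        # assumption: list items share the list's path (no index segment)
        kept_items = []
        for item in value:
            result = _filter_node(item, path, excluded, excludes, includes)
            if result is not _DROP:
                kept_items.append(result)
        return _DROP if excluded and not kept_items else kept_items

    return _DROP if excluded else value  # basic (scalar) value


def filter_nested(data: dict[str, Any], exclude: Iterable[str],
                  include: Iterable[str]) -> dict[str, Any]:
    """Return a filtered copy of `data`; the input dictionary is not modified."""
    excludes = [re.compile(p) for p in exclude]
    includes = [re.compile(p) for p in include]
    return _filter_node(data, "", False, excludes, includes)


if __name__ == "__main__":
    row = {"id": 1,
           "user": {"name": "a", "email": "e", "address": {"city": "B", "zip": "1"}},
           "tags": ["x", "y"]}
    print(filter_nested(row, exclude=[r"^user$"], include=[r"^user__address__city$"]))
    # {'id': 1, 'user': {'address': {'city': 'B'}}, 'tags': ['x', 'y']}
```

Returning a sentinel instead of mutating in place keeps the input row untouched and lets an excluded parent disappear entirely when none of its children were re-admitted.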

Testing: the normalize tests use various advanced filtering modes. They must pass with the new function.

amentee commented 10 months ago

@rudolfix I am a complete beginner in Python, but I would like to try implementing this function. Could you tell me which file I should look at? I am assuming it is https://github.com/dlt-hub/dlt/blob/master/tests/common/schema/test_filtering.py, but please correct me if I'm wrong, and let me know which function in that file I should work on to implement this requirement.