datahq / dataflows

DataFlows is a simple, intuitive, lightweight framework for building data processing flows in Python.
https://dataflows.org
MIT License

Adding foreign keys #83

Closed anuveyatsu closed 5 years ago

anuveyatsu commented 5 years ago

It could work just like the set_primary_key processor, although adding foreign keys is probably less common. At the moment, the only option I can see is to use the update_resource processor and pass it the entire schema of a resource. Is there a way to get the generated schema so that I could just add a new key to it (e.g., foreignKeys)?

anuveyatsu commented 5 years ago

After a bit of research, it turns out this can be done easily with a custom processor that edits the datapackage descriptor directly:

from dataflows import Flow, load

...

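# Custom processor: a step whose only parameter is named `package` receives the package wrapper
def add_foreign_keys(package):
    # Declare a foreign key on the first resource: its Timestamp field references Timestamp in the 'time' resource
    package.pkg.descriptor['resources'][0]['schema']['foreignKeys'] = [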
        {
            'fields': 'Timestamp',
            'reference': {
                'resource': 'time',
                'fields': 'Timestamp'
            }
        }
    ]
    # Must yield the modified datapackage
    yield package.pkg
    # And its resources
    yield from package

Flow(
    load(...),
    add_foreign_keys,
    ...
).process()
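
The snippet above hard-codes the first resource and overwrites any foreignKeys already present. If that is too rigid, a more reusable variant might look like the sketch below. It is only a sketch: append_foreign_key and add_foreign_key are hypothetical helper names (not part of dataflows), and it assumes each resource descriptor has a name. The returned processor has the same shape as add_foreign_keys above, so it should be usable as a Flow step in the same way.

```python
def append_foreign_key(descriptor, resource_name, fields, ref_resource, ref_fields):
    """Append one foreign key to the named resource's schema, in place.

    Hypothetical helper: works on a plain datapackage descriptor dict.
    """
    for resource in descriptor['resources']:
        if resource.get('name') == resource_name:
            schema = resource.setdefault('schema', {})
            # Keep any foreign keys that are already declared
            schema.setdefault('foreignKeys', []).append({
                'fields': fields,
                'reference': {'resource': ref_resource, 'fields': ref_fields},
            })
            return descriptor
    raise KeyError(f'no resource named {resource_name!r}')


def add_foreign_key(resource_name, fields, ref_resource, ref_fields):
    """Build a processor with the same shape as add_foreign_keys above (hypothetical name)."""
    def processor(package):
        append_foreign_key(package.pkg.descriptor, resource_name,
                           fields, ref_resource, ref_fields)
        # Yield the modified datapackage, then its resources
        yield package.pkg
        yield from package
    return processor


# Stand-alone check on a plain descriptor dict (made-up resource names, no dataflows needed)
descriptor = {
    'resources': [
        {'name': 'readings', 'schema': {'fields': [{'name': 'Timestamp', 'type': 'datetime'}]}},
        {'name': 'time', 'schema': {'fields': [{'name': 'Timestamp', 'type': 'datetime'}]}},
    ]
}
append_foreign_key(descriptor, 'readings', 'Timestamp', 'time', 'Timestamp')
```

In a flow, the step would then be written as add_foreign_key('readings', 'Timestamp', 'time', 'Timestamp') in place of add_foreign_keys, where 'readings' stands for whatever name the loaded resource actually has.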