datahq / dataflows

DataFlows is a simple, intuitive lightweight framework for building data processing flows in python.
https://dataflows.org
MIT License
194 stars 39 forks source link

allow generators to pass a schema on the first row #55

Closed OriHoch closed 5 years ago

OriHoch commented 5 years ago

I think this will be useful

from dataflows import Flow, printer

def generator():
  yield {"__dataflows_schema": True,
            "name": "my-resource", 
            "path": "my-resource.csv", 
            "schema": {"fields": [{"name": "i", "type": "string"]}}
  for i in [1,'two', 'three']:
    yield {"i": i}

Flow(generator(), printer()).process()
akariv commented 5 years ago

This could be easily implemented as a wrapper function:

def with_schema(generator, schema):
   def func(package):
       descriptor = dict(name='gen', path='gen.csv', schema=schema)
       package.pkg.add_resource(descriptor)
       yield package.pkg
       yield from package
       yield generator
   return func

(didn't run this, so some tweaking might be necessary)

akariv commented 5 years ago

Fixed via https://github.com/akariv/dataflows-web/commit/bc6db9b3eceb9803f885f7de81c8c19bdba3d10c

akariv commented 5 years ago

http://www.dataflows.org/cookbook.html#patterns.metadata-with-data