datahq / dataflows

DataFlows is a simple, intuitive, lightweight framework for building data processing flows in Python.
https://dataflows.org
MIT License

How would I add an "id" column #79

anuveyatsu closed 5 years ago

anuveyatsu commented 5 years ago

Any suggestions for getting from this:

a,b,c
a1,b1,c1
a2,b2,c2

to this:

id,a,b,c
1,a1,b1,c1
2,a2,b2,c2

zelima commented 5 years ago

@anuveyatsu how about this:

from dataflows import Flow, printer, set_type, add_computed_field

# Sample rows: a1..a10, b1..b10, c1..c10
data = [{'a': 'a%s' % (x+1), 'b': 'b%s' % (x+1), 'c':'c%s' % (x+1)} for x in range(10)]

def add_id(rows):
    # Row processor: overwrite the placeholder 'id' with a 1-based counter
    for counter, row in enumerate(rows, start=1):
        row['id'] = counter
        yield row

Flow(
  data,
  # Add the 'id' field to the schema with a placeholder value
  add_computed_field([dict(target='id', operation='constant', with_='dummy')]),
  add_id,
  set_type('id', type='integer'),
  printer()
).process()

Output:

res_1:
  #  a           b           c                    id
     (string)    (string)    (string)      (integer)
---  ----------  ----------  ----------  -----------
  1  a1          b1          c1                    1
  2  a2          b2          c2                    2
  3  a3          b3          c3                    3
  4  a4          b4          c4                    4
  5  a5          b5          c5                    5
  6  a6          b6          c6                    6
  7  a7          b7          c7                    7
  8  a8          b8          c8                    8
  9  a9          b9          c9                    9
 10  a10         b10         c10                  10
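
Note that in the output above the id column comes last, while the question asked for it first. As a rough sketch of the same row-generator idea in plain Python (standard library only, no dataflows), the id can be put first by building a new dict per row. The name `add_id_first` and the inline sample CSV are made up for illustration, and this does not show how dataflows itself orders schema fields:

```python
import csv
import io

def add_id_first(rows, start=1):
    """Yield a copy of each row with a sequential 'id' key placed first."""
    for row_id, row in enumerate(rows, start=start):
        # Dicts preserve insertion order, so 'id' becomes the first key.
        yield {'id': row_id, **row}

# Hypothetical sample input matching the question.
source = "a,b,c\na1,b1,c1\na2,b2,c2\n"
rows = list(add_id_first(csv.DictReader(io.StringIO(source))))

# Write the rows back out as CSV with 'id' as the first column.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=['id', 'a', 'b', 'c'], lineterminator='\n')
writer.writeheader()
writer.writerows(rows)
result = out.getvalue()
print(result)
```

Building a new dict instead of mutating the row in place is what lets id land in front; assigning `row['id'] = ...` on an existing dict always appends the key at the end.
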

anuveyatsu commented 5 years ago

@zelima perfect! Thanks!