datahq / dataflows

DataFlows is a simple, intuitive, lightweight framework for building data processing flows in Python.
https://dataflows.org
MIT License

head / tail processors #43

Open OriHoch opened 5 years ago

OriHoch commented 5 years ago
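from collections import deque


def head(num_rows=10):
    # Yield only the first num_rows rows, then stop reading.

    def step(rows):
        for rownum, row in enumerate(rows):
            if rownum >= num_rows:
                break
            yield row

    return step


def tail(num_rows=10):
    # Read every row, keeping only the last num_rows in a bounded deque.

    def step(rows):
        for row in deque(rows, maxlen=num_rows):
            yield row

    return step

akariv commented 5 years ago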

Note that in head you should consume the entire rows iterator rather than break out of it.
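One way to read this suggestion is sketched below: yield the first num_rows rows, then keep iterating and discard the rest so that the upstream iterator is always exhausted. This is only an illustration using plain Python iterators, not code from the dataflows repository; the source, preview and leftover names are made up for the demo.

```python
def head(num_rows=10):
    """Yield the first num_rows rows, then drain the rest without yielding."""

    def step(rows):
        for rownum, row in enumerate(rows):
            if rownum < num_rows:
                yield row
            # Rows past the limit are still read and then discarded,
            # so the upstream iterator is always fully consumed.

    return step


# Hypothetical stand-in for a resource's row iterator.
source = iter(range(100))
preview = list(head(3)(source))
leftover = list(source)  # nothing is left once head has finished
```

The trade-off is that the whole resource is still read, so this variant does not save time when previewing a large resource.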

OriHoch commented 5 years ago

I wanted to use the head processor to preview rows from a large resource. It works fine; maybe add a warning in the docs about this processor? Not sure.

We could have limit_rows in the load processor, but I think it's still useful to have a general head processor.

akariv commented 5 years ago

If you have 2 resources and run the current head processor on the first one, the second won't necessarily pass (and you could end up with a deadlock of sorts).
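The underlying effect can be seen with plain generators. The sketch below does not model how dataflows schedules resources internally; it only shows that a head step which breaks leaves its upstream iterator partly unread. counting_source and head_with_break are hypothetical names used for the demo.

```python
def counting_source(n, pulled):
    # Hypothetical row source that records every row it hands out.
    for i in range(n):
        pulled.append(i)
        yield i


def head_with_break(num_rows=10):
    # Same logic as the head processor proposed in this issue.
    def step(rows):
        for rownum, row in enumerate(rows):
            if rownum >= num_rows:
                break
            yield row

    return step


pulled = []
source = counting_source(1000, pulled)
preview = list(head_with_break(2)(source))

# One extra row is pulled to detect the limit, then reading stops.
pulled_by_head = len(pulled)

# Everything after that is still sitting unread in the source.
remaining = sum(1 for _ in source)
```

Here head pulls three rows (two yielded, one dropped when the limit is hit) and 997 rows stay unread. Anything downstream that expects the first resource to be exhausted before moving on to the next one would be left waiting.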