cgarciae / pypeln

Concurrent data pipelines in Python
https://cgarciae.github.io/pypeln
MIT License

[Bug] Map function running multiple times when using Stage object as result generator #109

Open joejonespushsecurity opened 1 year ago

joejonespushsecurity commented 1 year ago

Describe the bug Not sure whether this is a bug or just our misunderstanding of how this is meant to work. When using the map function over a list of objects, we expected the mapped function to be executed only once per item, no matter how many times we iterate over the returned Stage object or convert it to a list, as we do below.

Minimal code to reproduce Small snippet that contains a minimal amount of code.

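import pypeln as pl
from datetime import datetime

# Note: this name shadows the built-in map.
def map(count):
    # Returns a set, so the element order in the printed output may vary.
    return {count, datetime.now()}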

res = pl.task.map(map, [1,2])

print(res)
print(list(res))
print(list(res))

The output we get from the above is

Stage(process_fn=Map(f=<function map at 0x1006ed3a0>), workers=1, maxsize=0, total_sources=1, timeout=0, dependencies=[Stage(process_fn=FromIterable(iterable=[1, 2], maxsize=0), workers=1, maxsize=0, total_sources=1, timeout=0, dependencies=[], on_start=None, on_done=None, f_args=[])], on_start=None, on_done=None, f_args=['count'])
[{1, datetime.datetime(2023, 9, 5, 11, 5, 44, 348977)}, {datetime.datetime(2023, 9, 5, 11, 5, 44, 349063), 2}]
[{1, datetime.datetime(2023, 9, 5, 11, 5, 44, 349627)}, {2, datetime.datetime(2023, 9, 5, 11, 5, 44, 349685)}]

As you can see, new datetime objects are generated each time we build a list from the Stage object. Is that the intended behavior?
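For comparison, here is a minimal sketch of how a plain Python object behaves when its `__iter__` starts the work afresh on every call. This is not pypeln code: `LazyMap` and `stamp` are hypothetical stand-ins, and the sketch only assumes that a Stage re-runs its pipeline on each iteration, which is what the output above suggests.

```python
from datetime import datetime


class LazyMap:
    """Hypothetical stand-in for a lazy stage; not part of pypeln."""

    def __init__(self, f, iterable):
        self.f = f
        self.iterable = iterable
        self.calls = 0  # total number of times f has been invoked

    def __iter__(self):
        # Every call to __iter__ starts a fresh pass over the source,
        # so f runs again for each element.
        for item in self.iterable:
            self.calls += 1
            yield self.f(item)


def stamp(count):
    return (count, datetime.now())


res = LazyMap(stamp, [1, 2])
first = list(res)
second = list(res)
print(res.calls)  # 4: two elements, two passes
```

If Stage works along these lines, each `list(res)` is a separate run of the pipeline rather than a replay of stored results.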

Expected behavior We expected the results to be cached within the Stage object and the map function to run only once per item, so that converting the Stage to a list returns the same result every time. The same thing happens if we iterate over the Stage object in a for loop.
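The caching we had in mind could be sketched roughly as follows. `CachedIterable` and `Recompute` are hypothetical names used only for illustration; neither is part of pypeln, and this is one possible approach under the assumption that the wrapped object recomputes its results on every iteration.

```python
class CachedIterable:
    """Hypothetical helper: consume the source once, then replay the results."""

    def __init__(self, source):
        self._source = source
        self._cache = None

    def __iter__(self):
        if self._cache is None:
            # First iteration: run the underlying source and store the results.
            self._cache = list(self._source)
        # Later iterations replay the stored results without re-running the source.
        return iter(self._cache)


calls = []  # records every time the underlying work is performed


class Recompute:
    """Hypothetical re-iterable source that redoes its work on each pass."""

    def __iter__(self):
        for n in [1, 2]:
            calls.append(n)
            yield n * 10


cached = CachedIterable(Recompute())
first = list(cached)
second = list(cached)
print(first == second, len(calls))
```

In our snippet the simplest equivalent is probably to call `list(res)` once, keep the resulting list in a variable, and reuse that list instead of the Stage.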

Library Info Please provide OS info and pypeln version.

import pypeln
print(pypeln.__version__)

Pypeln version 0.4.9

Screenshots N/A

Additional context N/A