Lazy data structures are hard. We should document strategies for handling multiple reduces, like hubbing and caching.
Hubbing and caching are different ways to deal with a lazy data structure that gets consumed multiple times; each lets you decide what should happen when a source is reduced more than once.
Normally a second reduce just re-reduces the entire source. So if the source is a file, it opens another file descriptor and reads the whole thing again.
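For concreteness, here's a minimal sketch of that default behavior. The `Reducible` shape and the `fileSource` helper are made up for illustration; the real library's API may look different:

```ts
import { createReadStream } from "node:fs";

// Hypothetical "reducible" shape: a function that, given a step function and
// an initial accumulator, reduces the source and resolves with the result.
type Reducible<T> = <A>(step: (acc: A, item: T) => A, initial: A) => Promise<A>;

// Every call to the returned function opens a fresh read stream -- a new
// file descriptor -- and reads the whole file again from the start.
function fileSource(path: string): Reducible<Buffer> {
  return (step, initial) =>
    new Promise((resolve, reject) => {
      let acc = initial;
      const stream = createReadStream(path);
      stream.on("data", (chunk) => { acc = step(acc, chunk as Buffer); });
      stream.on("end", () => resolve(acc));
      stream.on("error", reject);
    });
}

// Reducing twice means two file descriptors and two full passes.
async function demo() {
  const source = fileSource("example.txt");
  const bytes1 = await source((n, chunk) => n + chunk.length, 0);
  const bytes2 = await source((n, chunk) => n + chunk.length, 0);
  console.log(bytes1, bytes2); // same total, but the file was read twice
}
demo().catch(console.error);
```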
If you use hub then multiple consumers can reduce that source (the file) and it only uses one file descriptor. This is like a splitter, or like how multi-pipe works in node streams.
The downside of hub is that a consumer who starts reducing later misses the part of the file that was already emitted.
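A hub might look roughly like the sketch below: one read stream starts on the first subscription, and each chunk is fanned out to whoever is subscribed when it arrives, so a late subscriber only sees what comes after it joins. The `hub` and `Consumer` names are hypothetical, not a real API:

```ts
import { createReadStream } from "node:fs";

// Illustrative consumer shape: gets each chunk as it arrives, plus an
// end-of-source notification.
type Consumer = { step: (chunk: Buffer) => void; end: () => void };

function hub(path: string) {
  const consumers = new Set<Consumer>();
  let started = false;

  return {
    subscribe(consumer: Consumer) {
      consumers.add(consumer);
      if (started) return; // the single read is already in flight
      started = true;
      // One read stream -- one file descriptor -- no matter how many
      // consumers subscribe. Each chunk goes to everyone subscribed at
      // the moment it arrives. (Error handling omitted for brevity.)
      const stream = createReadStream(path);
      stream.on("data", (chunk) => {
        for (const c of consumers) c.step(chunk as Buffer);
      });
      stream.on("end", () => {
        for (const c of consumers) c.end();
      });
    },
  };
}

// Both consumers subscribe before any data is emitted, so they share the
// single read; one added mid-stream would miss the earlier chunks.
const shared = hub("example.txt");
shared.subscribe({ step: (c) => console.log("a got", c.length), end: () => {} });
shared.subscribe({ step: (c) => console.log("b got", c.length), end: () => {} });
```

Note this sketch ignores backpressure; a real hub would also need to decide whether to pause the stream when one consumer is slow.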
An alternative is cache, which stores the results as they are produced; if you reduce multiple times, later reduces just read from the cache.
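A cache could be sketched like this: the first reduce reads the file and records every chunk, and later reduces replay from memory, so even a late consumer sees the whole file. Again, `cachedFileSource` is an illustrative name, not a real API:

```ts
import { createReadStream } from "node:fs";

function cachedFileSource(path: string) {
  let cached: Promise<Buffer[]> | null = null;

  function readOnce(): Promise<Buffer[]> {
    // Caching the promise means concurrent reduces share one read
    // (one file descriptor) instead of racing to open the file twice.
    cached ??= new Promise<Buffer[]>((resolve, reject) => {
      const chunks: Buffer[] = [];
      const stream = createReadStream(path);
      stream.on("data", (chunk) => chunks.push(chunk as Buffer));
      stream.on("end", () => resolve(chunks));
      stream.on("error", reject);
    });
    return cached;
  }

  // Later reduces never touch the file; they replay the recorded chunks.
  return async <A>(step: (acc: A, chunk: Buffer) => A, initial: A): Promise<A> =>
    (await readOnce()).reduce(step, initial);
}

// const source = cachedFileSource("example.txt");
// await source((n, c) => n + c.length, 0); // first call reads the file
// await source((n, c) => n + c.length, 0); // replays from memory
```

The trade-off versus hub is memory: the cache holds every chunk of the file for as long as the source is alive.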