dpc / pariter

Parallel iterator processing library for Rust
https://docs.rs/dpc-pariter

Questions and cheers #10

Open v1gnesh opened 1 year ago

v1gnesh commented 1 year ago

Hello,

Thank you for sharing your excellent work, and also for writing about it in your blog. Thoroughly enjoyed reading the two posts about pariter.

I have a few questions/requests:

1 - When you have time, can you write about the profile module you've built into pariter? How it works, how to use it, etc.

2 - When using .chunks(10) in your example, I guess this is like increasing the width of the pipe. Does this mean that the lookup_orders function is now doing parallel iteration within itself too? If so, in which threadpool?

3 - Say there's an iter like the lines() iter, on which I use parallel_map(). Could you show how the output can be sinked into a file at the end of the pipeline? Would it be better to do it like this - .chunks(100).for_each(|line| writefn(line))

dpc commented 1 year ago

Hi. I finally cleared out all my GitHub notifications. Sorry for the late response, but such is my reality. :D

  1. I don't think I will get to it anytime soon. https://docs.rs/pariter/latest/pariter/profile/struct.TotalTimeProfiler.html is the best reference available. In a nutshell: it allows you to track how long a given iterator pipeline step waits for elements from the previous step ("ingress"), or for the next step to take over the result ("egress"). This way it's possible to tell which step is the bottleneck and could use some optimization.
  2. Yes, chunks will effectively increase the width of the pipeline (by making each item contain multiple actual items). I don't understand the lookup_orders reference.
  3. You can do it any way you'd do it with normal iterators. I would probably just do for line in ... { write!(file, "{line}")?; } or something like that.
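To make the "ingress"/"egress" idea from point 1 concrete, here is a minimal single-threaded sketch using only the standard library. It is not pariter's profile API: `TimedStep` and its fields are hypothetical names made up for illustration, and the real profilers work across threads. The sketch only shows what the two measurements mean for one pipeline step.

```rust
use std::time::{Duration, Instant};

/// Hypothetical wrapper (NOT part of pariter) that records how long one
/// pipeline step waits on its neighbours.
struct TimedStep<I> {
    inner: I,
    /// Total time spent waiting for the previous step to produce an item.
    ingress: Duration,
    /// Total time between handing an item out and being asked for the next one.
    egress: Duration,
    handed_off_at: Option<Instant>,
}

impl<I> TimedStep<I> {
    fn new(inner: I) -> Self {
        Self {
            inner,
            ingress: Duration::ZERO,
            egress: Duration::ZERO,
            handed_off_at: None,
        }
    }
}

impl<I: Iterator> Iterator for TimedStep<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        // Time since the last item was handed out: the consumer was busy.
        if let Some(t) = self.handed_off_at.take() {
            self.egress += t.elapsed();
        }
        // Time spent inside the upstream `next()`: the producer was busy.
        let start = Instant::now();
        let item = self.inner.next();
        self.ingress += start.elapsed();
        if item.is_some() {
            self.handed_off_at = Some(Instant::now());
        }
        item
    }
}
```

If a step's `ingress` total dominates, the step before it is likely the slow one; if `egress` dominates, the step after it is. Drive the wrapper with `while let Some(x) = step.next() { ... }` so the totals can still be read from `step` afterwards.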
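The suggestion in point 3 can be sketched roughly as follows, using only the standard library. `write_lines` is a hypothetical helper name, and the sketch assumes the items no longer carry their trailing newline (as with `BufRead::lines()`), so it uses `writeln!` rather than `write!`. Any iterator can be passed in, including the output end of a parallel pipeline.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical helper: drains `lines` into `sink`, one item per line,
/// and returns how many lines were written.
fn write_lines<I, W>(lines: I, sink: W) -> io::Result<usize>
where
    I: IntoIterator,
    I::Item: AsRef<str>,
    W: Write,
{
    // Buffer the writes so each line does not become its own syscall.
    let mut out = BufWriter::new(sink);
    let mut count = 0;
    for line in lines {
        writeln!(out, "{}", line.as_ref())?;
        count += 1;
    }
    // Flush explicitly so a write error is reported instead of being
    // silently dropped when the BufWriter goes out of scope.
    out.flush()?;
    Ok(count)
}
```

For a real file, pass `std::fs::File::create("out.txt")?` as the sink. If the pipeline ends in chunks, each item is a batch of lines, so flatten it first (for example with `.flatten()`) or loop over each batch before writing.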