Open-NET-Libraries / Open.ChannelExtensions

A set of extensions for optimizing/simplifying System.Threading.Channels usage.
https://open-net-libraries.github.io/Open.ChannelExtensions/api/Open.ChannelExtensions.Extensions.html#methods
MIT License

Multi channel multi transformation - how to do it using ChannelExtensions #24

Closed askids closed 1 year ago

askids commented 2 years ago

Hi,

I didn't see a Discussions option, so I'm opening this as an issue. I'm primarily looking for suggestions on how to do this with ChannelExtensions.

Currently I am using TPL with multiple BufferBlocks, and the code was written almost 7 years ago. I am now trying to convert it to use channels, and I came across this library, which seemed to simplify the task. But I am not clear on how to use it, as I need multiple intermediate channels to process the whole pipeline.

  1. I have a data source that I initially split into ranges. A range could be 1K, 2K, 5K, etc., depending on config.
  2. Then, for each range, I extract the data, which could run to 10K-100K records.
  3. I then want to split that extract into smaller chunks (say 500, 1000, or 2000 records) and push them into another channel.
  4. These smaller chunks can be picked up in batches and transformed before being written to the final channel.
  5. From the final channel, I may want to write to a file, write to another target table, or make another API call to push the data into a different system.

The main thing I cannot figure out is that 1 entry on the 1st channel can generate 10K-100K records for the 2nd channel. When I use PipeAsync, how do I code for this, given that it seems to assume 1 input item on the source channel is transformed into 1 output item? I don't want to write 100K rows as a single item on the 2nd channel. So the primary question is: how do I achieve this? Any suggestions would be very helpful.

I was referring to the examples from this repo and the link below: https://blog.maartenballiauw.be/post/2020/08/26/producer-consumer-pipelines-with-system-threading-channels.html

Thanks!

electricessence commented 2 years ago

Hi. Sorry for the late response. I'm a big fan of DataFlow if that's what you were using. It has some feature advantages that Channels don't innately have.

I have yet to actually have a scenario like yours, so this is very interesting. You may have to break from the extensions at some point and do some manual wiring. So, if I get you right, you might have 3 channels, each with slightly different behavior, where something is 'routing' to them based on their size?
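
For reference, here is a minimal sketch of what that manual wiring could look like using only System.Threading.Channels (no Open.ChannelExtensions calls), assuming .NET 6 or later for Enumerable.Chunk. It is not this library's recommended approach, just one way to turn 1 item on the first channel into many items on the second. ExtractRange, the channel capacities, and the range and chunk sizes are all hypothetical stand-ins for the real extract and config.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-in for the real extract: one range yields many records.
static IEnumerable<int> ExtractRange((int Start, int Count) range)
    => Enumerable.Range(range.Start, range.Count);

// Bounded channels give back-pressure, so a fast producer cannot flood memory.
// The capacities (4 and 16) are arbitrary illustrative values.
var ranges = Channel.CreateBounded<(int Start, int Count)>(4);
var chunks = Channel.CreateBounded<int[]>(16);

// Stage 1: publish the ranges (3 ranges of 10,000 records each).
var produce = Task.Run(async () =>
{
    for (var start = 0; start < 30_000; start += 10_000)
        await ranges.Writer.WriteAsync((start, 10_000));
    ranges.Writer.Complete();
});

// Stage 2: one-to-many. Each range is extracted and written to the
// second channel as many chunks of up to 1,000 records, never as one big item.
var split = Task.Run(async () =>
{
    await foreach (var range in ranges.Reader.ReadAllAsync())
        foreach (var chunk in ExtractRange(range).Chunk(1_000))
            await chunks.Writer.WriteAsync(chunk);
    chunks.Writer.Complete();
});

// Stage 3: consume the chunks. A real pipeline would transform each chunk
// here and write it to a file, table, or API; this sketch only counts them.
var chunkCount = 0;
var recordCount = 0;
await foreach (var chunk in chunks.Reader.ReadAllAsync())
{
    chunkCount++;
    recordCount += chunk.Length;
}

await Task.WhenAll(produce, split);
Console.WriteLine($"{chunkCount} chunks, {recordCount} records");
```

With these sizes the sketch reports 30 chunks and 30000 records. Error handling is left out for brevity: if a stage throws, its writer is never completed and the downstream reader would wait forever, so real code would likely pass the exception to Writer.Complete(exception) in a try/catch.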

electricessence commented 2 years ago

Keep in mind that if you are using DataFlow for the same thing, there's really nothing wrong with that. You might gain a small memory benefit from ValueTasks, but overall, DataFlow is really smart and does tend to have a smooth performance profile.

electricessence commented 1 year ago

Will reopen if more detail is provided.