Resize broadway pipeline

dashbitco / broadway

Concurrent and multi-stage data ingestion and data processing with Elixir

https://elixir-broadway.org

Apache License 2.0

2.42k stars 160 forks source link

Resize broadway pipeline #228

Open fcevado opened 3 years ago

fcevado commented 3 years ago

Discussing on broadway_rabbitmq @josevalim commented about this possibility.

I was thinking that restarting everything could be avoided by changing BatchProcessorSupervisor and ProcessorSupervisor to a :simple_one_for_one supervisor(or a DynamicSupervisor), but I guess it would require broadway callbacks to be stateful. At least it's what I'm getting from broadway architecture docs

The idea is that if any process fails, we want to restart the rest of the tree. Since Broadway callbacks are stateless, we can handle errors and provide reports without crashing processes.

josevalim commented 3 years ago

DynamicSupervisor is not needed. You can add children to any supervisor while they are running. It won't be as fast as a dynamic one - but this is not like an operation you would do every second.

The biggest issue with adding this feature is actually making sure the new entries supervise to their producers and their children supervise to them - and how to handle failures if anything goes wrong in this process.

There is another approach which is to restart the rest of the tree except the producer. This will be necessary whenever there is a PartitionDispatcher, since the number of partitions are specified upfront.

fcevado commented 3 years ago

You can add children to any supervisor while they are running.

I was thinking of a scenario where you want to scale down. A :one_for_all supervisor would always require to restart the entire tree when doing that, right?

josevalim commented 3 years ago

@fcevado no, you can still stop children in a regular supervisor without restarting. The restart is always about unexpected failures.

guigaoliveira commented 3 years ago

I'm not sure I understand correctly, but changing broadway configuration dynamically can be interesting to optimize pipeline according to metrics and then have best possible typology while it's running. Or change values according to specific scenarios (usually based on some metric), not necessarily optimizing, but increasing / decreasing parameters. If it is possible not only change concurrency of processors, but also other parameters such as batch_size, batch_timeout, etc.

PranavRam commented 3 years ago

I'm not sure I understand correctly, but changing broadway configuration dynamically can be interesting to optimize pipeline according to metrics and then have best possible typology while it's running. Or change values according to specific scenarios (usually based on some metric), not necessarily optimizing, but increasing / decreasing parameters. If it is possible not only change concurrency of processors, but also other parameters such as batch_size, batch_timeout, etc.

This would be really useful. Are you thinking something as simple as updating the config (producer + processor concurrency and min_demand, max_demand, etc) either sync or async?

josevalim commented 3 years ago

The only way to do that at the moment would be by restarting the pipeline. This is mostly OK except for Kafka which doesn't cope well with constant reconections.

fcevado commented 3 years ago

@josevalim I was thinking, just increasing the amount of childs of the ProcessorSupervisor and restarting the BatchProcessorSupervisor wouldn't do it? Given the genstage subscription structure, just the BatchProcessor needs to know about all the Processors, is that right? I guess it's not needed to mess up with the the producer section of the supervision tree.

josevalim commented 3 years ago

Not quite. You still need to go through the rest of the pipeline and tell them to subscribe to the new processors. Sizing down has its own challenges as you need to subscribe, drain, and then terminate. To be clear, it is all doable, but it is a sizeable amount of work and testing.