atheriel / longears

The RabbitMQ client for R
https://atheriel.github.io/longears/

Prefetch Size #6

Closed PvanHengel closed 3 years ago

PvanHengel commented 4 years ago

Hi - Since R is single-threaded by default, and we often use queues to farm out batch or long-running work, I think it makes a lot of sense to allow the prefetch count to be changed from 50, which seems very high, especially as the unit of work grows. I totally understand the desire to hide many of the lower-level API's details, which is great, but there should be a way to change this with an optional flag.

https://github.com/atheriel/longears/blob/268398f5936bf4ab4e64612a8bf9f3111ec6a7da/src/connection.h#L11
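For example, something along these lines, following the README's connect/consume/listen pattern. The `prefetch_count` argument here is the hypothetical flag I'm asking for, and `run_long_job()` is just a placeholder:

```r
library(longears)

conn <- amqp_connect()  # localhost / default credentials

# Hypothetical optional flag: take only one message at a time, since this
# single-threaded R process can only work on one message anyway.
amqp_consume(conn, "jobs", function(msg) {
  run_long_job(msg)  # placeholder for the long-running unit of work
}, prefetch_count = 1)

amqp_listen(conn, timeout = 60)
```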

atheriel commented 4 years ago

Just to clarify, are you asking that it be possible to change the default (e.g. with a compile-time -DDEFAULT_PREFETCH_COUNT=1 flag), or that it be possible to change the count on a per-consumer basis?

Also, I'm very interested to hear what kind of workload you're running that this is causing issues for.
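If it's the former, the route I have in mind would look roughly like this, assuming the DEFAULT_PREFETCH_COUNT #define in src/connection.h were wrapped in an #ifndef guard so that the flag can take effect:

```r
# Hypothetical compile-time override: put the flag in ~/.R/Makevars so it is
# picked up when the package's C sources are compiled, e.g.
#
#   CPPFLAGS = -DDEFAULT_PREFETCH_COUNT=1
#
# then reinstall the package from source:
remotes::install_github("atheriel/longears")
```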

atheriel commented 4 years ago

To provide some additional context: this package uses 50 as the default because that's the number I've seen used in other clients.

My general understanding of RabbitMQ best practice at present is that (1) you never want to use the default unbounded prefetch count/size unless your consumers are stupidly fast; (2) for very consistent workloads you can carefully optimize throughput by picking "just-right" prefetch size and count values; otherwise, (3) you probably want something close to 50 or exactly 1 for true round-robin.

Using R makes (1) irrelevant and (2) suspect, so I'm guessing that if you want something other than 50, you want exactly 1.
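A rough, broker-free illustration of why long-running jobs change the picture. This is a toy model of the dispatch behaviour in base R, not the package's code: with a deep backlog, a prefetch of 50 deals each of two consumers its share of the messages up front, while a prefetch of 1 hands the next message to whichever consumer frees up first.

```r
set.seed(1)
jobs <- rexp(100)  # 100 queued jobs with variable runtimes

# Prefetch 50, two consumers: the broker deals all 100 messages out
# round-robin up front, so each consumer is stuck with a fixed half.
makespan_50 <- max(sum(jobs[c(TRUE, FALSE)]), sum(jobs[c(FALSE, TRUE)]))

# Prefetch 1: the next message goes to whichever consumer finishes first.
busy <- c(0, 0)
for (j in jobs) {
  i <- which.min(busy)    # next free consumer...
  busy[i] <- busy[i] + j  # ...takes the next job
}
makespan_1 <- max(busy)

c(prefetch_50 = makespan_50, prefetch_1 = makespan_1)
```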

hampos commented 3 years ago

@atheriel Hey there and great job on the library.

Would it be possible to support a prefetch count of exactly 1 on a per-consumer basis? That makes the most sense for long-running work: we don't want a process to prefetch messages that then sit waiting because the process is already busy, and prefetching also removes the ability to scale by spawning more processes on demand when messages pile up.
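Combined with the consumer sketch above saved as worker.R, the scale-out pattern we're describing might look like this. The passive declare and message_count lookup assume the queue's declare-ok backlog report is exposed; the names are illustrative, not the package's current API:

```r
# supervisor.R: spawn another single-job worker when messages pile up.
library(longears)

conn <- amqp_connect()
q <- amqp_declare_queue(conn, "jobs", passive = TRUE)  # inspect, don't modify
if (q$message_count > 10) {
  # Each worker holds at most one unacknowledged message, so a fresh process
  # starts draining the backlog immediately.
  system("Rscript worker.R", wait = FALSE)
}
amqp_disconnect(conn)
```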