Open aljoscha opened 1 year ago
Just a quick note on priority: I believe the default topic refresh interval here is 5m, so even at the largest cluster replica size (32 threads? 64 threads?) I doubt this will be a problem in production! The way this has showed up to date is in CI, where you can use the unsafe TOPIC METADATA REFRESH INTERVAL MS
option to dial the refresh interval way down. AFAIU that options is meant to be permanently unsafe, because it's a bit in the weeds and has the footgun that it will cause storaged
to fall over if you set it too low for the number of workers in the process.
agreed! just wanted to write it down so we have it somewhere
Currently, each timely worker in the source ingestion pipeline periodically queries the external system to discover new "partitions" (the term is heavily influence by Kafka, but other systems, such as Kinesis shards, have a similar concept and we call them partitions internally). This is problematic when we scale to very large worker counts, as we are essentially DDoS'ing the external system.
Instead, we could add another operator to the ingest pipeline whose sole purpose is to discover new partitions. This operator would only be active on a single timely worker. It would send these partitions on to the
SourceReader
operators for reading.