Sunoyon opened this issue 10 years ago (status: Open)
Finally, 5 spouts of my topology can fetch records concurrently. Basically, I had the wrong idea about partition keys and shards in a Kinesis stream.
The following link says: "From a design standpoint, to ensure that all your shards are well utilized, the number of shards (specified by the setShardCount() method of CreateStreamRequest) should be substantially less than the number of unique partition keys, and the amount of data flowing to a single partition key should be substantially less than the capacity of a shard."
So I uploaded data with 30 partition keys, and now records are available in 9 shards, which means 9 spouts can fetch records concurrently. The current rate is quite good (5000 records/sec).
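To illustrate why the number of distinct partition keys matters, here is a minimal sketch of the key-to-shard mapping. Kinesis hashes each partition key with MD5 into a 128-bit hash key, and each shard owns a range of that hash space. The sketch assumes a stream of 10 shards with equal-width hash ranges, and the key names ("key-0" to "key-29") and the class name are made up for illustration; a real stream's ranges come from its shard descriptions.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeSet;

public class ShardMappingSketch {
    // Kinesis hash keys span the 128-bit range [0, 2^128 - 1].
    private static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    /** MD5 of the partition key, read as an unsigned 128-bit integer. */
    public static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Index of the shard owning the key, ASSUMING shardCount equal-width hash ranges. */
    public static int shardIndex(String partitionKey, int shardCount) {
        BigInteger rangeWidth = HASH_SPACE.divide(BigInteger.valueOf(shardCount));
        int index = hashKey(partitionKey).divide(rangeWidth).intValue();
        // The last shard absorbs the remainder left by integer division.
        return Math.min(index, shardCount - 1);
    }

    public static void main(String[] args) {
        int shardCount = 10; // assumed stream size
        TreeSet<Integer> used = new TreeSet<>();
        for (int i = 0; i < 30; i++) {
            used.add(shardIndex("key-" + i, shardCount)); // hypothetical key names
        }
        System.out.println(used.size() + " of " + shardCount
                + " shards receive data: " + used);
    }
}
```

With only a handful of keys, some shards can easily receive nothing, and a spout reading an empty shard has nothing to fetch. Using many more keys than shards makes it much more likely that every shard gets data.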
Could you please tell me whether my understanding of partition keys and shards is correct?
Thanks, Sunoyon
Hi Sunoyon,
Yes, your approach above of picking partition keys such that lots of them map to the same shard is correct. Regarding your earlier mention of getting 10,000 records per second, you'll want to provision the appropriate number of shards. Please see http://docs.aws.amazon.com/kinesis/latest/dev/service-sizes-and-limits.html for information about current Amazon Kinesis sizes and limits.
Please see http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html for information on how to request a limit increase in case you'd like to use more shards than the current default limit.
Sincerely, Gaurav
Hi Gaurav, thanks for your help. Currently I am using a stream of 10 shards. I can always get data using 10 spouts (though I also tried with 100 spouts). The page you linked says, "Each shard can support up to 5 read transactions per second up to a maximum total of 2 MB of data read per second." My query is:
Thanks, Sunoyon
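For reference, the quoted per-shard limits can be turned into a rough read budget for a 10-shard stream and the 10,000 records/sec target mentioned earlier. This is only a back-of-the-envelope sketch: it assumes records are spread evenly across shards and that 1 MB means 1,048,576 bytes, and the class and method names are hypothetical.

```java
public class ShardReadBudget {
    // Limits quoted from the Kinesis documentation in this thread.
    static final int READS_PER_SHARD_PER_SEC = 5;
    static final double MB_PER_SHARD_PER_SEC = 2.0;

    /** Total read transactions per second across all shards. */
    public static int maxReadsPerSecond(int shards) {
        return shards * READS_PER_SHARD_PER_SEC;
    }

    /** Total read bandwidth in MB per second across all shards. */
    public static double maxMegabytesPerSecond(int shards) {
        return shards * MB_PER_SHARD_PER_SEC;
    }

    /** Records each read must return, on average, to sustain the target rate. */
    public static double recordsPerReadNeeded(int shards, int targetRecordsPerSec) {
        return (double) targetRecordsPerSec / maxReadsPerSecond(shards);
    }

    /** Largest average record size in bytes that fits the bandwidth limit
     *  (ASSUMES 1 MB = 1,048,576 bytes). */
    public static double maxAvgRecordBytes(int shards, int targetRecordsPerSec) {
        return maxMegabytesPerSecond(shards) * 1024 * 1024 / targetRecordsPerSec;
    }

    public static void main(String[] args) {
        int shards = 10;
        int target = 10_000;
        System.out.println("Max reads/sec:        " + maxReadsPerSecond(shards));
        System.out.println("Max MB/sec:           " + maxMegabytesPerSecond(shards));
        System.out.println("Records per read:     " + recordsPerReadNeeded(shards, target));
        System.out.println("Max avg record bytes: " + maxAvgRecordBytes(shards, target));
    }
}
```

Under these assumptions, 10 shards allow at most 50 reads and 20 MB per second in total, so reaching 10,000 records/sec needs about 200 records per read and an average record size of roughly 2 KB or less. Because the limits are per shard, adding more readers than shards does not raise this budget.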
Hi Sunoyon,
I'd recommend starting a thread on our forum (https://forums.aws.amazon.com/forum.jspa?forumID=169). We're better able to answer application design questions via the forums.
Thank you, Gaurav
Hi, I've used the following configuration. However, the throughput is not as expected.

Kinesis side:
Number of shards: 5
Around 500,000 rows in the Kinesis stream.

My topology:
    builder.setSpout("kinesis_spout", spout, 10);
    BoltDeclarer bolt = builder.setBolt("redis", new KinesisSplitLog(), 50).setNumTasks(200);
    bolt.shuffleGrouping("kinesis_spout");

and the configuration:
    conf.setNumAckers(0);
    conf.setDebug(false);
    conf.setNumWorkers(5);
    conf.setMaxSpoutPending(5000);

I used TRIM_HORIZON as the initial position in the stream.
Result:
My question is:
Thanks in advance, Sunoyon
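As a rough check on the configuration above, the sketch below works out how much of the declared parallelism can actually be in use. It assumes each shard is read by at most one spout task at a time, which matches the behaviour reported earlier in this thread but is not confirmed here. The class and method names are made up for illustration and are not part of Storm or the Kinesis spout.

```java
public class TopologyParallelismSketch {
    /** Spout tasks that can read at once, ASSUMING at most one reader per shard. */
    public static int activeSpoutTasks(int spoutTasks, int shards) {
        return Math.min(spoutTasks, shards);
    }

    /** Tasks run by each executor when tasks are spread evenly (rounded up). */
    public static int tasksPerExecutor(int numTasks, int executors) {
        return (numTasks + executors - 1) / executors;
    }

    public static void main(String[] args) {
        int shards = 5;           // "Number of shards: 5"
        int spoutTasks = 10;      // builder.setSpout("kinesis_spout", spout, 10)
        int boltExecutors = 50;   // builder.setBolt("redis", ..., 50)
        int boltTasks = 200;      // .setNumTasks(200)

        int active = activeSpoutTasks(spoutTasks, shards);
        System.out.println("Spout tasks able to read:  " + active);
        System.out.println("Spout tasks left idle:     " + (spoutTasks - active));
        System.out.println("Bolt tasks per executor:   "
                + tasksPerExecutor(boltTasks, boltExecutors));
    }
}
```

Under that assumption, only 5 of the 10 spout tasks would have a shard to read, so the read side would be bounded by the 5 shards rather than by the spout parallelism. On the bolt side, 200 tasks over 50 executors means each executor runs 4 tasks in a single thread, so the extra tasks add no concurrency by themselves.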