Hi,
I have to use a Kafka stream for different purposes, but I don't want to use a different Kafka consumer for each purpose. Is there a way to achieve this? Will a Spark cluster help me with this?
Please help. Thanks,
I did not understand the question completely. Do you mean that you want to consume the Kafka stream but not process it using Spark? In that case you can use the Kafka API directly for your purpose.
No, I want to use Spark to consume the Kafka stream, but I have to use the same Kafka stream for multiple projects.
That shouldn't be a problem, as long as your other projects use different consumer groups.
That's the thing: I don't want to use different consumer groups. I want a single consumer that consumes the stream and then duplicates it (or something similar), and then I want each of my applications to use one copy of that stream.
There are a few disadvantages to doing this. If your different applications consume from the same stream, it will be difficult for you to handle failures, replay, re-processing, etc. Say applications A, B, and C process from the same channel. Now suppose you have modified your business logic in app B and want to replay the whole stream: A and C will process the same messages again.
Or say some failure happened in C and you want to start from offset X again: B and A would reprocess the same messages.
What is the constraint against using individual streams? Resources? This receiver can now even run on a single core for a given topic, and you can control the parallelism for each stream if you specify the settings that way. Say application A needs more parallelism: you give it more receivers, while B does not need more receivers.
Thanks Dibbhatt,
Actually, my requirement is to use only the last half an hour's data from the stream, and, most importantly, the data should be exactly the same for all the applications. So if I find an error in application A and change the offset, I want the same change to take place for B and C. That is why I want to use a single stream of data and then create three streams from it, with the same data in all three, for applications A, B, and C.
I'm new to Spark, so I'm not sure if I can achieve this with it. Please let me know if Spark can help me here.
Thanks & Regards,
Well, within the same stream, for each RDD generated you can apply the different logic for A, B, and C to the same RDD.
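A minimal sketch of that suggestion in Scala, folding in the "last half an hour" requirement via a windowed DStream. `processA`/`processB`/`processC` are hypothetical placeholders for each application's logic, and the shared-RDD pattern is a generic Spark Streaming idiom, not something specific to this library:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.Minutes
import org.apache.spark.streaming.dstream.DStream

object SharedStreamLogic {
  // Hypothetical per-application logic; every call sees exactly the same RDD.
  def processA[T](rdd: RDD[T]): Unit = println(s"A: ${rdd.count()} records")
  def processB[T](rdd: RDD[T]): Unit = println(s"B: ${rdd.count()} records")
  def processC[T](rdd: RDD[T]): Unit = println(s"C: ${rdd.count()} records")

  def attach[T](stream: DStream[T]): Unit = {
    // Trailing 30-minute view of the stream, re-evaluated every batch
    // (assumes the batch interval divides 30 minutes).
    val lastHalfHour = stream.window(Minutes(30))
    lastHalfHour.foreachRDD { rdd =>
      rdd.cache() // materialize once so A, B, and C reuse the same data
      processA(rdd)
      processB(rdd)
      processC(rdd)
      rdd.unpersist()
    }
  }
}
```

The catch, as the next comment points out, is that all three logics live in one driver, so redeploying any one of them restarts the whole job.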
In that case the only problem is that if I make some changes to, say, application A, I will have to redeploy it, which will also interrupt the execution of applications B and C. I am looking for a solution in which I replicate the stream in, say, application A, and then copies of the same stream can be used in applications B and C.
You want to launch different StreamingContexts from different driver programs and still consume from the same stream? I am not sure that is possible. I do not see an issue with having three different DStreams via three ReceiverLaunchers in three driver applications. As all three will consume from the same Kafka topic, you process the SAME data, and you get your multi-tenancy kind of feature, where one stream won't impact another.
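A sketch of that setup: three separate driver programs, each launching its own receiver for the same topic via this library's `ReceiverLauncher`. Host names and property values here are hypothetical, and the property keys follow the README of this era, so they may differ across versions:

```scala
import java.util.Properties
import consumer.kafka.ReceiverLauncher
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Driver for application A; B and C run the same code with their own
// consumer id (and their own receiver count).
object AppADriver {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("AppA"), Seconds(30))

    val props = new Properties()
    props.put("zookeeper.hosts", "zkhost")   // hypothetical Zookeeper host
    props.put("zookeeper.port", "2181")
    props.put("kafka.topic", "shared-topic") // same topic for A, B, and C
    props.put("kafka.consumer.id", "app-a")  // unique id per application

    // App A needs more parallelism, so it gets more receivers.
    val numberOfReceivers = 3
    val stream = ReceiverLauncher.launch(ssc, props, numberOfReceivers,
      StorageLevel.MEMORY_ONLY)

    stream.foreachRDD(rdd => println(s"AppA batch: ${rdd.count()} messages"))
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Since each application owns its receiver, redeploying A never touches B or C, and offsets can be rewound per application.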
Thanks Dibbhatt,
I also tried to find other options for the same, but it seems the most convenient way to do this is to use different streams.
Thanks for your explanation.