Closed IvanBiv closed 4 years ago
I haven't, to be honest
I see that my friend @kkrugler recently gave a talk on the subject so it's definitely worth exploring
https://sf-2018.flink-forward.org/kb_sessions/building-a-scalable-focused-web-crawler-with-flink/
@IvanBiv what would the benefits be? why not go for Apache Beam which is more generic?
@jnioche thanks for the link.
For me, Flink is better than Storm for two reasons: 1) it is easy to deploy on a Docker cluster (Kubernetes). I could not run a Storm cluster on Docker Swarm, and searching for Kubernetes options showed that there is no good solution there either. 2) community
Yep, Apache Beam could be a better fit as middleware between the user's processing topology and the underlying processing platform.
Julien, you have put a lot of work into StormCrawler, so it is worth considering the long-term prospects of the processing platform.
Thanks @IvanBiv
I don't really see deployment on Docker as a reason to move away from Storm. As for the community, there's nothing wrong with the Apache Storm one. It is certainly not the largest, that's true, but the project is alive and doing well.
> Julien, you have put a lot of work into StormCrawler, so it is worth considering the long-term prospects of the processing platform.
Sure, but it is also because I invested loads of time in Storm that I won't dump it without very good reasons. There are loads of competing frameworks for stream processing and new ones emerging all the time but as things stand I am happy with Storm. That does not mean that I am not open minded and will never consider anything else though, it's just that I'd need more compelling arguments.
I'd be curious to hear what @kkrugler thinks.
> I could not run a Storm cluster on Docker Swarm, and searching for Kubernetes options showed that there is no good solution there either.
Really? There are solutions maintained by both the Storm and Kubernetes teams/projects. Wouldn't the time be better invested in improving these than in porting a crawler? Especially given that there is already flink-crawler.
I agree with @sebastian-nagel and @jnioche: at the moment there is no good (enough) reason to rewrite everything in Apache Flink. A lot of effort has already been put into creating and maintaining storm-crawler.
What's more, difficulty deploying this project in a specific environment does not justify the investment of a full rewrite into a different streaming framework, IMHO.
I've been following this discussion, and thought I'd chime in with a few thoughts:
To be honest, I do worry a bit about Storm, as I've watched the community of streaming users transition to Spark, Samza, Flink, Heron, Kafka Streams and other options over time. One metric I use is tracking activity on the user mailing list for a project. Here's that graph for Storm, from the Apache mail archives:
and the same result from Flink:
and also for Spark:
All projects go through a maturity phase where the level of user activity drops off, so I don't think Storm is dead, but I do think in a year it could be time to revisit this discussion.
See the Sematext trends for Flink, Storm and Samza (I excluded Spark because a lot of its activity would not be about its streaming capabilities):
https://sematext.com/opensee/report/project/trend?q=Flink,Storm,Samza
Not actionable, closing for now. Feel free to reopen if relevant.
@jnioche have you thought about migrating this SDK to the Apache Flink platform? I think Flink is better than Storm. What do you think?