Open dalejin2014 opened 8 years ago
@dalejin2014 We'd love to have native stream processing libraries in different languages and having really good Kafka clients is the basis for that. That said, we don't have a timeline for adding this yet.
@dalejin2014: As @ewencp mentioned we don't have a timeline yet. The reason for this is that we first want to ensure we have a strong foundation in the form of the Java implementation of Kafka Streams before venturing into non-JVM languages.
That said, of course I took a note of your request. :-)
Do you mind sharing some information about your use case where you'd use Kafka Streams from Python?
We are interested in developing a commenting feature similar to Google Docs. The use case is as follows:
So we are thinking about using Kafka Streaming since it provides us:
Is there an easy way to port the features from the Java client?
Thanks for sharing the background info @dalejin2014.
Is there an easy way to port the features from the Java client?
It's not super-hard but also not trivial. Also, one would need to continuously maintain any such Kafka Streams libraries for other languages with the same commitment and high quality as the current Kafka Streams library for Java, so "porting" is not a one-off effort but an ongoing time investment. Hence our current decision to focus our efforts first on the Java implementation of Kafka Streams.
+100 :)
Kafka Streams for Python would be so amazing. I'm currently evaluating stream processing frameworks and I like what I've been reading about Kafka Streams. My use case is essentially this: I'm laying down the infrastructure to enable realtime analytics and processing of log/event data. The primary users of this data are data scientists who would be standing up their own Kafka Streams apps, mostly for doing transformations, joins, partitioning and windowed analytics. I think Kafka Streams fits this use case nicely since the Streams library eliminates a lot of the boilerplate code involved in configuring Kafka consumers and producers but leaves developers the freedom and flexibility to do lots of cool stuff with the data in each Kafka topic. The only catch is that not many of the data scientists are well versed in Java--our language of choice is Python for almost everything. As much as I like Kafka and as excited as I am about Kafka Streams, getting the data scientists on board with writing Java will be an uphill battle.
With that said, have there been any developments with regard to supporting a Python-based Kafka Streams library?
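For readers unfamiliar with the term, the "windowed analytics" mentioned above can be illustrated with a minimal, in-memory sketch in plain Python. This is not the Kafka Streams API and does not talk to Kafka at all; the event shape `(timestamp_seconds, key)` and the function name `tumbling_window_counts` are assumptions made up for illustration. It only shows the core idea of a tumbling (fixed, non-overlapping) window count per key, and it ignores concerns a real stream processor has to handle, such as late-arriving events, state persistence and partitioning.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape for illustration: (timestamp_seconds, key).
Event = Tuple[float, str]


def tumbling_window_counts(
    events: Iterable[Event], window_size_s: float
) -> Dict[Tuple[float, str], int]:
    """Count events per key in fixed, non-overlapping (tumbling) windows.

    The returned dict maps (window_start, key) to the number of events
    whose timestamp falls in [window_start, window_start + window_size_s).
    """
    if window_size_s <= 0:
        raise ValueError("window_size_s must be positive")
    counts: Dict[Tuple[float, str], int] = defaultdict(int)
    for timestamp, key in events:
        # Align the timestamp down to the start of its window.
        window_start = (timestamp // window_size_s) * window_size_s
        counts[(window_start, key)] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [(0.5, "login"), (3.0, "login"), (61.0, "login"), (62.0, "error")]
    for (start, key), n in sorted(tumbling_window_counts(sample, 60.0).items()):
        print(f"window starting at {start:.0f}s, key={key}: {n}")
```

In a real deployment the events would come from a Kafka consumer and the counts would typically be written back to another topic; the grouping logic is the part a streams library takes care of for you.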
@zzbennett I hear you, Elizabeth. :-)
Unfortunately our short-term roadmap does not include work on a Python library of Kafka Streams. (We'd definitely welcome contributors though!) Same situation for e.g. kafka-python, a community project.
I'm kinda hesitant to suggest this, but perhaps it would be worth a try to experiment with Jython? IIRC some Ruby users have been experimenting with Kafka Streams' Java library via JRuby. FWIW, there are a few community/external projects already working on various "wrappers" (in a broad sense) for Kafka's Streams and Connect APIs, but they haven't been released yet; I don't remember off the top of my head whether a Python-based one was among them.
Thanks for your reply @miguno and thanks for the suggestions. Jython might be a good option for prototyping. I may actually be able to drum up support for Scala based Streams apps, which would work a bit better with the Java libraries.
As far as contributing, I may even end up putting together a Python port of Kafka Streams for our use cases. Eventually, with the help of some collaborators in the Kafka Python community, we'd hopefully be able to contribute something upstream. But I suppose we can cross that bridge when we get there. At any rate, thanks again for the help!
@zzbennett Somebody in my group was talking about working on this also. If you create a repo with issues laying out the work and then solicit help, you may find yourself with some contributors reasonably soon.
@murphyke that would be super. I actually just created a repo last weekend to start working on it (https://github.com/python-kafka-streams/python-kafka-streams). I haven't committed any work or created any tickets yet, but hopefully I'll get a chance to do that in the next couple of days. Feel free to send people over there if they are itching to work on it. Once a little momentum gets built up I'll post to some user groups to solicit help.
@zzbennett I'd love to contribute to the python-kafka-streams repo.
I would love to work on this, as well as love the idea itself :)
Wondering if someone has some initial design which I can start working with?
so... what's best practice? use Jython?
Jython is one option, yes. And some users are actually running Jython-based Kafka Streams applications in production.
Also: There's an upcoming, community-driven Python implementation of Kafka Streams (a first MVP = not all features are already implemented) that will be presented at EuroPython later this month.
The code @miguno is referring to is now on GitHub: https://github.com/wintoncode/winton-kafka-streams
Check it out and get involved with the project!
no updates for a month on winton, I hope they continue their good project
seems dead unfortunately
Would be great to have a bit of help from Confluent on this, given Python is the most wanted language in 2017 according to Stack Overflow.
@pouledodue: I'd suggest to bring this up at https://github.com/wintoncode/winton-kafka-streams -- the last commit in that project was actually 5 days ago.
+1 on this. Question for the community about renaming the project to a more "standard name": https://github.com/wintoncode/winton-kafka-streams/issues/8
At this point I decided to learn the Java ecosystem instead of using a half-baked Python solution.
Are there any developments on this request? I was so excited about Kafka, but with no streaming API implementation in Python I am unsure now.
@g-rd, as of today we are still tracking interest but it doesn't currently have a place on the roadmap.
@g-rd you may look into Apache Pulsar
@edenhill I have looked at it already, but it looks to me like this project is either perfect with no further development needed or simply not being developed. I suspect it is not being actively developed. I am now looking at Apache Pulsar, and I think Pulsar is a better fit for me.
Check out a Kafka Streams inspired Python Stream Processing library we just open sourced: https://robinhood.engineering/faust-stream-processing-for-python-a66d3a51212d
It's been over a year. Any further comment on whether Kafka Streams will be available?
We do not have any immediate plans to create a non-Java Kafka Streams implementation. Either look into using KSQL or https://github.com/wintoncode/winton-kafka-streams
There is Faust, a Python library developed by Robinhood that focuses on event processing and stream processing from a source such as a Kafka topic: https://github.com/robinhood/faust
There seem to be viable alternatives to an officially supported implementation.
There seem to be viable alternatives to an officially supported implementation.
This is true. However, with non-officially supported APIs there is always the risk that they will stop being maintained. The last commit in the popular winton-kafka-streams was 1.5 years ago.
We do not have any immediate plans to create a non-Java Kafka Streams implementation.
Has there been any change in this regard? @edenhill
There is Faust, a Python library developed by Robinhood that focuses on event processing and stream processing from a source such as a Kafka topic: https://github.com/robinhood/faust
SASL authentication (which you will certainly use with a Confluent Kafka cluster) has been broken since 1.9. I have been trying for 4 months now to get at least a comment from a programmer at Robinhood, without any success.
Why not use Python on GraalVM? It's getting better :)
For almost a year, I have been playing with the idea of developing a functional abstraction that allows using the Kafka client API, including Kafka Streams, from Python/JS/R via GraalVM. Then you wouldn't be dependent on separate solutions like Faust, which most probably will not be able to always keep up with the latest developments (and will probably not offer optimal performance and feature-richness).
BTW if anybody would like to join me to start developing this abstraction layer on top of the Java-based Kafka Streams API which enables the use of it via GraalVM in Python/JS/R/C etc. - I'd be happy :)
Any news on adding a Kafka Streams library?
There is Faust, a Python library developed by Robinhood that focuses on event processing and stream processing from a source such as a Kafka topic: https://github.com/robinhood/faust
Unfortunately the project looks to be unmaintained for months now :(
It's alive - check out the fork: https://github.com/faust-streaming/faust
Yeah I saw the fork that's being updated which is nice to see, I meant the official project isn't being maintained anymore by the folks at Robinhood. My worry is the same will happen to that fork (or any future forks of Faust for that matter) as now it's a huge project being maintained only by a few people from the open-source community.
Has Kafka Streams for Python made it onto the roadmap yet?
We have no plans to implement Kafka Streams for Python.
This is a big shame. My team had recently started discussing the possibility of migrating from our microservices architecture to an event-sourced one. We've all been pretty excited about the idea as we have a lot of microservices and analysis services that work together - bringing all the data together could simplify a lot. But we're a Python shop and the more I've been looking into the libraries available, the less appealing this idea has become. We've floated the idea of adopting Java just for this reason, so we could make use of Kafka Streaming, but so much of our system would have to be translated to Java to make this feasible. It just doesn't seem worth it.
You could look into nats.io. I have moved to using it in my projects and I have really started to like it. It has all the features I needed, including tables, dedup, key-value store, etc. It's also very lightweight and simple to use. A great alternative for stream processing with Python.
We are interested in using Kafka streaming. Is it on the roadmap for the confluent-kafka-python library?