Open erichare opened 1 year ago
- Is it feasible that a native Python client be developed, or are there any potential show-stopping limitations (i.e., the C++ client can handle some task that Python isn't well-suited for)
Everything is "feasible", though not necessarily the best option.
- Given that it is feasible, is the investment of effort worthwhile? Or is wrapping functionality from the C++ client a better approach at least in the short-medium term?
- Any other thoughts along these lines?
The main question to ask is: which problem would that be solving? What kind of improvement would that be compared to the current client?
That is what would have to be weighted against.
Definitely agree with your point here @merlimat . I don’t have a clear enough answer to that yet, but off hand my feeling would be that it would open up development of the client to a universe of python developers that may have a particularly vested interest in improvements being made.
But yes, the trade off between whether it would lead to any tangible improvements relative to the time investment is definitely questionable, which was sort of why I thought it would be good to begin a dialogue about it.
Appreciate the feedback!
It could be more friendly to Python developers because C++ is a complicated language for most developers that do not have C++ experiences. It's welcome to write another pure Python implemented client, like the native Go client.
However, from my perspective, it's a little different with Golang. Golang compiles its native binaries, while Python interpreter (I mean CPython here) actually works by loading Python scripts or C dynamic libraries . Writing fully-native Python client might bring some unnecessary performance overhead. In addition, a fully-native Python client is only more friendly for Python developers, not Python users.
To echo the above comment, I just raised https://github.com/apache/pulsar-client-python/issues/115 and was going to see if I could open a PR for it myself. Without a native python client, the barrier to entry on contributing is quite high.
On top of this, there's also the value lost for developers when using the library itself. Things like type hints and the internal options are obfuscated.
For a concrete example of this, all the imports from _pulsar
like ConsumerType
have no autocomplete and no python documentation from the code itself. Even the API doc (https://pulsar.apache.org/api/python/2.4.0/) in the subscribe
section simply has a default value of _pulsar.ConsumerType.Exclusive
but no details on what other ConsumerTypes there are.
In fact, to figure out what was available, I ended up searching the C++ client repo for "ConsumerType" and ending up at https://github.com/apache/pulsar-client-cpp/blob/4a864f20530e230ecee7e4ab09430bd65fb9789c/include/pulsar/ConsumerType.h#L28
It has the information I wanted, but in terms of a smooth developer experience, having to manually search C++ code to figure out what I can pass in to my python signature is less than ideal. Happy to be part of the solution here though, and if we migrate to Python I'm very happy to contribute as much as possible.
in the subscribe section simply has a default value of _pulsar.ConsumerType.Exclusive but no details on what other ConsumerTypes there are.
It's an issue with the Python API docs generation for the _pulsar
module, which is a C++ wrapper written based on the PyBind11. (BTW, 2.4.0 is too old) It seems currently the Python docs generation tool does not generate the docs for the _pulsar
module. /cc @tisonkun
On the other hand, PyBind11 support adding the Python API docs via the C++ code. Currently the docs were not good. For example, I typed help(_pulsar.ConsumerType)
and see:
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| Exclusive = <ConsumerType.Exclusive: 0>
|
| Failover = <ConsumerType.Failover: 2>
|
| KeyShared = <ConsumerType.KeyShared: 3>
|
| Shared = <ConsumerType.Shared: 1>
|
| ----------------------------------------------------------------------
| Static methods inherited from pybind11_builtins.pybind11_object:
|
| __new__(*args, **kwargs) from pybind11_builtins.pybind11_type
| Create and return a new object. See help(type) for accurate signature.
We can enhance the API docs later.
@BewareMyPower This can be a known issue tracked at https://github.com/twisted/pydoctor/issues/658 which I don't have time to investigate at that time.
I workaround it with comment in https://github.com/twisted/pydoctor/issues/658. Although, it may require a preinstalled _pulsar
module. Perhaps we can find a better way in API docs automatic generation.
a fully-native Python client is only more friendly for Python developers, not Python users.
I disagree. A lot of issues with python binary extensions in general boil down to them being implemented in non-Python languages. Those issues fall into three major categories:
pip install
. pulsar-client
doesn't do this because it doesn't offer a source wheel, meaning that those users are just left out in the cold. But even if a source wheel were offered, many users (not developers, users) would then struggle with compilation/installation issues. This has been true for very nearly every single binary-extension-based Python project I've ever worked with--they're just never quite as seamless to install for all developers on all platforms as native code. This even extends to binary-based modules with really robust packaging support, like numpy
; the issue rate is just plain higher.pulsar-client
, since the C++ driver at the core of this package resembles an operating system or async runtime more than it resembles a network driver. This results in a very large volume of pretty bad bugs over time, including bugs around garbage collection, GIL deadlocks invoking Python loggers, issues with Python callbacks supplied to e.g. send_async
blocking for longer than expected ... the list goes on. Those aren't code quality issues with either the C++ or the Python code; they're simply issues with the interoperation between two highly complex language runtimes. While fixes have been found for most of them, the rate and severity of those issues really damages my ability to recommend this library to other Python users.python main.py
single-threaded invocation. And those aren't terribly weird contexts; many of those are very common in the python community! And while things like pybind11 can try their best to hide this complexity, a lot of it isn't fully hide-able: consider fork safety, for example; general purpose tools to make threaded network clients fork-safe are pretty rare and limited.@zbentley Your points generally make sense to me. I have some more additional points.
I think point 2 and 3 are still within the scope of the development. Though it makes it hard for Python users to debug issues.
@BewareMyPower perhaps, though issues related to 2 and 3 largely occurred for me while trying to use Pulsar from ordinary python code, rather than trying to develop features on the client itself.
I think part of the issue is that GIL-safe code in this repo is very hard to write; it's easy to be GIL-safe when the handoff between C++ and Python is "unidirectional"--that is, Python code calls into C++, which returns when finished back into Python. But it's much harder when things are bidirectional, and the compiled code can later call back into Python, which the Pulsar client does quite frequently.
I'm rather new to this project but, I thought it might be worthwhile to gauge the community's thoughts on implementing a fully-native Python client. Those of you who have a more intimate understanding of the internals, my questions would be:
Also, if there is a better venue for starting this conversation (i.e., the main pulsar repo, the C++ client repo, or the dev mailing list) let me know and i'll close this and start it there. I just had the thought that maybe those who have worked with the Python client specifically might have a clearer understanding of the pros and cons of going in that direction.