Closed spark-louissalin closed 3 years ago
Not sure if this is the same bug, but I'm seeing a segfault when running pulsar-client 2.5 with Python 3.7.7 on Mac OS 10.13.6. This is the stack trace I get:
```
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 _pulsar.cpython-37m-darwin.so 0x0000000104e2fd9d unsigned long boost::asio::detail::kqueue_reactor::cancel_timer<boost::asio::time_traits<boost::posix_time::ptime> >(boost::asio::detail::timer_queue<boost::asio::time_traits<boost::posix_time::ptime> >&, boost::asio::detail::timer_queue<boost::asio::time_traits<boost::posix_time::ptime> >::per_timer_data&, unsigned long) + 29
1 _pulsar.cpython-37m-darwin.so 0x0000000104e2fc64 boost::asio::detail::deadline_timer_service<boost::asio::time_traits<boost::posix_time::ptime> >::cancel(boost::asio::detail::deadline_timer_service<boost::asio::time_traits<boost::posix_time::ptime> >::implementation_type&, boost::system::error_code&) + 52
2 _pulsar.cpython-37m-darwin.so 0x0000000104e2d7d4 boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::executor>::cancel() + 68
3 _pulsar.cpython-37m-darwin.so 0x0000000104f84e6b pulsar::ProducerStatsImpl::~ProducerStatsImpl() + 59
4 _pulsar.cpython-37m-darwin.so 0x0000000104f3986d pulsar::ProducerImpl::~ProducerImpl() + 1021
5 _pulsar.cpython-37m-darwin.so 0x0000000104de4acc boost::python::objects::value_holder<pulsar::Producer>::~value_holder() + 60
6 _pulsar.cpython-37m-darwin.so 0x0000000104fe1274 boost::python::objects::instance_dealloc(_object*) + 52
```
I am also affected by this (pulsar-client 2.8.0.post0, broker 2.8.0 standalone, Python 3.9.6 on MacOS 10.11). The following script reliably reproduces the segfault:
```python
import pulsar

def get_producer():
    cl = pulsar.Client(
        'pulsar://localhost:6650',
        operation_timeout_seconds=10,
    )
    return cl.create_producer(
        'foobar',
    )

producer = get_producer()
producer.send(
    b'test_payload',
)
```
@BewareMyPower I suggest updating this issue's description to reflect the affected platforms, e.g. "Pulsar client segfaults Python if any Producer instance outlives the garbage collection of any Client instance" or similar.
This problem stems from the fact that in the C++ library, you need to keep the `Client` instance alive. The `Producer` only keeps a `weak_ptr` to the `Client` to prevent a cyclic dependency.
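The lifetime problem can be illustrated with a rough Python analogy (these classes are stand-ins for illustration, not the actual C++ or wrapper code): a weak reference, like the C++ `weak_ptr`, does not keep its target alive, so the client can be collected while the producer still holds a (now dangling) reference to it.

```python
import gc
import weakref

# Stand-in classes modeling the C++ ownership: the Producer holds only a
# weak reference to the Client, mirroring the weak_ptr in the C++ library.
class Client:
    pass

class Producer:
    def __init__(self, client):
        # weak_ptr analogue: this does NOT keep `client` alive
        self._client_ref = weakref.ref(client)

client = Client()
producer = Producer(client)
assert producer._client_ref() is client  # client is still reachable here

# Drop the last strong reference, as happens when get_producer() returns
# in the repro script above:
del client
gc.collect()
assert producer._client_ref() is None  # the weak reference now dangles
```

In the C++ case, the dangling `weak_ptr` leads to the destructor-time crash shown in the stack trace rather than a clean `None`.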
In the boost::python wrapping, there's no easy way to keep a reference to the C++ `Client` instance. We'd need to create a custom wrapper class instead of the automated wrapping done at https://github.com/apache/pulsar/blob/f784379323bdbd60db2825f3e6e63730d290a8f9/pulsar-client-cpp/python/src/producer.cc#L84
@merlimat I suggest solving it in Python: have producer objects, when created, store a Python-side reference to the Client that created them. Python's cyclic garbage collector should be able to handle that cleanup appropriately, in Python 3 at least. The cyclic GC works "all right", not perfectly, in Python 2, but is much improved in 3. I think this can be done without changes in the native code.
EDIT: this might be what you're proposing above, in which case I'm sorry for the duplication.
That's not quite idiomatic, but the typical Python conventions for this kind of thing (using context managers for producers/consumers) would require a larger refactor of the driver and would break backward compatibility.
Though I guess that raises a larger question: for consumers of the C++ API, it might be desirable in some cases to provide a producer object without providing the ability to create additional producers/consumers. If that's the case, it may make sense to alter the C++ code to use `shared_ptr` and allow producers to keep clients alive, allowing cases where a client is referenced only inside a producer object and not anywhere else. Since I don't code to the C++ API, I am not sure whether that's reasonable.
> Though I guess that raises a larger question: for consumers of the C++ API, it might be desirable in some cases to provide a producer object without providing the ability to create additional producers/consumers. If that's the case, it may make sense to alter the C++ code to use `shared_ptr` and allow producers to keep clients alive, allowing cases where a client is referenced only inside a producer object and not anywhere else. Since I don't code to the C++ API, I am not sure whether that's reasonable.
Typically it's not a big deal, in the sense that you're expected to keep a single instance of the C++ Pulsar client for the duration of your application. You can still pass producer instances to sub-components of that application.
> @merlimat I suggest solving it in Python: have producer objects, when created, store a Python-side reference to the Client that created them. Python's cyclic garbage collector should be able to handle that cleanup appropriately, in Python 3 at least. The cyclic GC works "all right", not perfectly, in Python 2, but is much improved in 3. I think this can be done without changes in the native code.
> EDIT: this might be what you're proposing above, in which case I'm sorry for the duplication.
I was actually thinking of fixing that in the C++/Python wrapper, but your suggestion is much easier to implement!
> The cyclic GC works "all right", not perfectly, in Python 2, but is much improved in 3. I think this can be done without changes in the native code.
It wouldn't even need to be a cyclic dependency. There are 2 layers of wrappers:
pure-python object --> boost-python generated wrapper --> C++ class
We just need to retain a reference to the client boost-python object from the pure-python producer object.
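A minimal sketch of that fix, using illustrative stand-in classes rather than the actual wrapper code: the pure-python producer stores a strong reference to the client object that created it, so the client cannot be garbage-collected while any producer is still reachable.

```python
import gc
import weakref

# Illustrative stand-ins (an assumption for this sketch, not the real
# pulsar wrapper classes): Client creates Producers, and each Producer
# keeps a strong reference back to its Client.
class Client:
    def create_producer(self, topic):
        return Producer(self, topic)

class Producer:
    def __init__(self, client, topic):
        self._client = client  # strong reference keeps the client alive
        self._topic = topic

def get_producer():
    client = Client()
    return client.create_producer('foobar')

producer = get_producer()

# Even though get_producer() dropped its local `client` variable, the
# producer's strong reference keeps the client alive:
probe = weakref.ref(producer._client)
gc.collect()
assert probe() is not None
```

Once the producer itself becomes unreachable, both objects are collected together, which is the ordering the C++ layer needs.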
**Describe the bug** Repeatedly opening and closing a client and producer will eventually cause a segmentation fault 11, and the Python interpreter will crash.
**To Reproduce** Within a Python 3.7.6 interpreter, using the pulsar-client 2.5 pip library, repeat the following until a segfault happens:
For example:
**Expected behavior** I should be able to run these 3 lines forever without trouble.