apache / pulsar-client-cpp

Apache Pulsar C++ client library
https://pulsar.apache.org/
Apache License 2.0

Reduce the Boost dependencies and replace Boost.Asio with Asio #367

Closed BewareMyPower closed 10 months ago

BewareMyPower commented 11 months ago

The Boost headers are huge. Take 1.83 for example:

$ du -sh boost/
178M    boost/

However, the core component we actually need is Boost.Asio or Asio:

$ du -sh boost/asio/
7.0M    boost/asio/

Here is the list of files that include a Boost header:

lib/UtilAllocator.h:22:#include <boost/aligned_storage.hpp>
lib/UtilAllocator.h:23:#include <boost/noncopyable.hpp>
lib/HTTPLookupService.cc:23:#include <boost/property_tree/json_parser.hpp>
lib/HTTPLookupService.cc:24:#include <boost/property_tree/ptree.hpp>
lib/HandlerBase.h:23:#include <boost/asio/deadline_timer.hpp>
lib/UnAckedMessageTrackerEnabled.h:21:#include <boost/asio/deadline_timer.hpp>
lib/Int64SerDes.h:23:#include <boost/asio.hpp>  // for ntohl
lib/ConnectionPool.cc:21:#include <boost/asio/ip/tcp.hpp>
lib/ConnectionPool.cc:22:#include <boost/asio/ssl.hpp>
lib/RoundRobinMessageRouter.cc:21:#include <boost/random/mersenne_twister.hpp>
lib/RoundRobinMessageRouter.cc:22:#include <boost/random/uniform_int_distribution.hpp>
lib/Schema.cc:22:#include <boost/property_tree/json_parser.hpp>
lib/Schema.cc:23:#include <boost/property_tree/ptree.hpp>
lib/AckGroupingTrackerEnabled.h:25:#include <boost/asio/deadline_timer.hpp>
lib/RoundRobinMessageRouter.h:26:#include <boost/date_time/posix_time/posix_time.hpp>
lib/BoostHash.h:24:#include <boost/functional/hash.hpp>
lib/auth/AuthOauth2.cc:21:#include <boost/property_tree/json_parser.hpp>
lib/auth/AuthOauth2.cc:22:#include <boost/property_tree/ptree.hpp>
lib/auth/AuthAthenz.cc:21:#include <boost/property_tree/json_parser.hpp>
lib/auth/AuthAthenz.cc:22:#include <boost/property_tree/ptree.hpp>
lib/auth/AuthToken.cc:21:#include <boost/algorithm/string/predicate.hpp>
lib/auth/AuthBasic.cc:22:#include <boost/archive/iterators/base64_from_binary.hpp>
lib/auth/AuthBasic.cc:23:#include <boost/archive/iterators/transform_width.hpp>
lib/auth/AuthBasic.cc:24:#include <boost/property_tree/json_parser.hpp>
lib/auth/AuthBasic.cc:25:#include <boost/property_tree/ptree.hpp>
lib/auth/athenz/ZTSClient.cc:38:#include <boost/property_tree/json_parser.hpp>
lib/auth/athenz/ZTSClient.cc:39:#include <boost/property_tree/ptree.hpp>
lib/auth/athenz/ZTSClient.cc:47:#include <boost/xpressive/xpressive.hpp>
lib/auth/athenz/ZTSClient.cc:53:#include <boost/archive/iterators/base64_from_binary.hpp>
lib/auth/athenz/ZTSClient.cc:54:#include <boost/archive/iterators/transform_width.hpp>
lib/auth/athenz/ZTSClient.cc:58:#include <boost/regex.hpp>
lib/Url.cc:25:#include <boost/regex.hpp>
lib/OpSendMsg.h:26:#include <boost/date_time/posix_time/ptime.hpp>
lib/NegativeAcksTracker.h:26:#include <boost/asio/deadline_timer.hpp>
lib/MessageIdImpl.h:22:#include <boost/functional/hash.hpp>
lib/Commands.h:28:#include <boost/optional.hpp>
lib/SynchronizedHashMap.h:21:#include <boost/optional.hpp>
lib/TimeUtils.h:24:#include <boost/date_time/posix_time/posix_time.hpp>
lib/Authentication.cc:22:#include <boost/algorithm/string.hpp>
lib/ProducerImpl.cc:23:#include <boost/date_time/posix_time/posix_time.hpp>
lib/Murmur3_32Hash.cc:26:#include <boost/version.hpp>
lib/Murmur3_32Hash.cc:28:#include <boost/predef.h>
lib/Murmur3_32Hash.cc:30:#include <boost/detail/endian.hpp>
lib/UnboundedBlockingQueue.h:22:#include <boost/circular_buffer.hpp>
lib/Backoff.h:23:#include <boost/date_time/posix_time/posix_time.hpp>
lib/Backoff.h:24:#include <boost/random/mersenne_twister.hpp>
lib/ClientConnection.cc:23:#include <boost/optional.hpp>
lib/ProducerImpl.h:23:#include <boost/optional.hpp>
lib/Backoff.cc:24:#include <boost/random/uniform_int_distribution.hpp>
lib/ClientConnection.h:26:#include <boost/any.hpp>
lib/ClientConnection.h:27:#include <boost/asio/bind_executor.hpp>
lib/ClientConnection.h:28:#include <boost/asio/deadline_timer.hpp>
lib/ClientConnection.h:29:#include <boost/asio/io_service.hpp>
lib/ClientConnection.h:30:#include <boost/asio/ip/tcp.hpp>
lib/ClientConnection.h:31:#include <boost/asio/ssl/stream.hpp>
lib/ClientConnection.h:32:#include <boost/asio/strand.hpp>
lib/ClientConnection.h:33:#include <boost/optional.hpp>
lib/MessageAndCallbackBatch.h:26:#include <boost/noncopyable.hpp>
lib/BrokerConsumerStatsImpl.cc:21:#include <boost/date_time/local_time/local_time.hpp>
lib/BatchMessageContainerBase.h:27:#include <boost/noncopyable.hpp>
lib/MessageCrypto.h:30:#include <boost/date_time/posix_time/ptime.hpp>
lib/MessageCrypto.h:31:#include <boost/scoped_array.hpp>
lib/MessageCrypto.cc:22:#include <boost/date_time/posix_time/posix_time.hpp>
lib/SharedBuffer.h:25:#include <boost/asio/buffer.hpp>
lib/SharedBuffer.h:26:#include <boost/asio/detail/socket_ops.hpp>
lib/ConsumerImpl.h:24:#include <boost/optional.hpp>
lib/ClientImpl.cc:47:#include <boost/regex.hpp>
lib/BlockingQueue.h:22:#include <boost/circular_buffer.hpp>
lib/TopicName.cc:21:#include <boost/algorithm/string.hpp>
lib/ProducerConfigurationImpl.h:24:#include <boost/optional.hpp>
lib/checksum/crc32c_sse42.cc:18:#include <boost/version.hpp>
lib/checksum/crc32c_sse42.cc:20:#include <boost/predef.h>
lib/PatternMultiTopicsConsumerImpl.h:31:#include <boost/regex.hpp>
lib/PartitionedProducerImpl.h:23:#include <boost/asio/deadline_timer.hpp>
lib/stats/ProducerStatsImpl.h:26:#include <boost/serialization/array_wrapper.hpp>
lib/stats/ProducerStatsImpl.h:28:#include <boost/accumulators/accumulators.hpp>
lib/stats/ProducerStatsImpl.h:29:#include <boost/accumulators/framework/accumulator_set.hpp>
lib/stats/ProducerStatsImpl.h:30:#include <boost/accumulators/framework/features.hpp>
lib/stats/ProducerStatsImpl.h:31:#include <boost/accumulators/statistics.hpp>
lib/stats/ProducerStatsImpl.h:32:#include <boost/accumulators/statistics/extended_p_square.hpp>
lib/stats/ProducerStatsImpl.h:33:#include <boost/asio/deadline_timer.hpp>
lib/stats/ProducerStatsImpl.h:34:#include <boost/date_time/local_time/local_time.hpp>
lib/stats/ProducerStatsBase.h:25:#include <boost/date_time/posix_time/posix_time.hpp>
lib/stats/ConsumerStatsImpl.h:23:#include <boost/asio/deadline_timer.hpp>
lib/ExecutorService.h:25:#include <boost/asio/deadline_timer.hpp>
lib/ExecutorService.h:26:#include <boost/asio/io_service.hpp>
lib/ExecutorService.h:27:#include <boost/asio/ip/tcp.hpp>
lib/ExecutorService.h:28:#include <boost/asio/ssl.hpp>
lib/Base64Utils.h:21:#include <boost/archive/iterators/base64_from_binary.hpp>
lib/Base64Utils.h:22:#include <boost/archive/iterators/binary_from_base64.hpp>
lib/Base64Utils.h:23:#include <boost/archive/iterators/transform_width.hpp>
lib/BrokerConsumerStatsImpl.h:25:#include <boost/date_time/posix_time/posix_time.hpp>
lib/SimpleLogger.h:24:#include <boost/date_time/posix_time/posix_time.hpp>
lib/SimpleLogger.h:25:#include <boost/format.hpp>
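
Some entries in this list have close standard-library counterparts in C++11. The sketch below shows three of them: deleted copy operations in place of boost::noncopyable, <random> in place of boost/random, and std::unique_ptr<T[]> in place of boost::scoped_array. The names NonCopyable, JitterSource, and makeBuffer are made up for illustration and are not taken from the client code. std::optional and std::any are not shown because they require C++17.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <random>
#include <type_traits>

// Stand-in for boost::noncopyable: delete the copy operations explicitly.
class NonCopyable {
   public:
    NonCopyable() = default;
    NonCopyable(const NonCopyable&) = delete;
    NonCopyable& operator=(const NonCopyable&) = delete;
};

// Stand-in for boost::random::mt19937 plus
// boost::random::uniform_int_distribution.
// JitterSource is a hypothetical class, not part of the client.
class JitterSource : NonCopyable {
   public:
    explicit JitterSource(std::uint32_t seed) : rng_(seed) {}

    // Returns a value uniformly distributed in the closed range [0, maxMs].
    int nextJitterMs(int maxMs) {
        std::uniform_int_distribution<int> dist(0, maxMs);
        return dist(rng_);
    }

   private:
    std::mt19937 rng_;
};

// Stand-in for boost::scoped_array<unsigned char>: a zero-initialized buffer
// that is released automatically when the unique_ptr goes out of scope.
inline std::unique_ptr<unsigned char[]> makeBuffer(std::size_t size) {
    return std::unique_ptr<unsigned char[]>(new unsigned char[size]());
}
```

The heavier entries (property_tree, accumulators, xpressive) have no such drop-in replacement, so this only covers the easy cases.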

oversearch commented 11 months ago

I personally don't mind a boost dependency on the project. It has a lot of great stuff that you wouldn't want to re-implement yourself, and a large number of C++ projects are already using it anyway. That header list you posted has some nontrivial stuff that won't be in the STL anytime soon or ever.

That said, I will note that ASIO is updated a bit more frequently, and with more serious changes than most Boost libraries. This makes the independent ASIO library more attractive.

At my company, we use Pulsar from two separate C++ code bases, one of which is unfortunately tied down to an old Boost version (we're trying to fix that, but it's a huge code base). In order to get Pulsar to build under this and get consistent behavior/performance, I started maintaining a custom fork of Pulsar with boost::asio renamed to just asio::, and we use the independent ASIO library in both. It works very well and the changes were pretty simple (you need to fix the "error code" type as well): about a day's worth of work.

One thing you could consider doing is just using a preprocessor definition to allow the ASIO namespace to be customized, so users could choose which library they wanted (heavy boost users on a greenfield version will no doubt find that convenient). Just my two cents as a heavy commercial Pulsar user.

BewareMyPower commented 11 months ago

> That header list you posted has some nontrivial stuff that won't be in the STL anytime soon or ever.

That makes sense to me. This issue is just an idea I had while trying to develop a completely new Pulsar C++ client based on C++20 coroutines. I managed dependencies via Vcpkg, which splits Boost into many ports named boost-xxx. Currently this project also maintains the list of necessary Boost ports: https://github.com/apache/pulsar-client-cpp/blob/27cba3e7d154f97e01911cd5de2cc0d4eaf2ef50/vcpkg.json#L6-L17

However, boost-predef is pulled in only for the BOOST_ARCH_X86_64 macro, and boost-algorithm only for string splitting:

$ find lib -name "*.cc" | xargs grep -n "boost::algorithm"
lib/Authentication.cc:66:        boost::algorithm::split(params, authParamsString, boost::is_any_of(","));
lib/Authentication.cc:69:            boost::algorithm::split(kv, params[i], boost::is_any_of(":"));
lib/TopicName.cc:60:        boost::algorithm::split(pathTokens, topicNameCopy_, boost::algorithm::is_any_of("/"));
lib/TopicName.cc:100:    boost::algorithm::split(pathTokens, topicNameCopy, boost::algorithm::is_any_of("/"));

IMO, introducing a dependency just for a single function is not good practice, though it's not an issue for users who already use Boost in their projects.
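
As a rough illustration of how small the replacements could be, here is a minimal sketch using only the standard library. splitAnyOf and PULSAR_SKETCH_ARCH_X86_64 are hypothetical names, not existing code in the client. The helper is intended to mirror boost::algorithm::split with is_any_of in its default mode, where empty tokens are kept.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `input` on any character found in `delimiters`.
// Empty tokens are kept, so "a//b" split on "/" yields "a", "", "b".
inline std::vector<std::string> splitAnyOf(const std::string& input, const std::string& delimiters) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type end = input.find_first_of(delimiters, start);
        if (end == std::string::npos) {
            // No more delimiters: the rest of the string is the last token.
            tokens.emplace_back(input.substr(start));
            break;
        }
        tokens.emplace_back(input.substr(start, end - start));
        start = end + 1;
    }
    return tokens;
}

// Compiler-provided macros as a stand-in for BOOST_ARCH_X86_64.
// __x86_64__ is set by GCC and Clang, _M_X64 by MSVC.
#if defined(__x86_64__) || defined(_M_X64)
#define PULSAR_SKETCH_ARCH_X86_64 1
#else
#define PULSAR_SKETCH_ARCH_X86_64 0
#endif
```

With this, a call such as boost::algorithm::split(kv, params[i], boost::is_any_of(":")) would become kv = splitAnyOf(params[i], ":").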

> That said, I will note that ASIO is updated a bit more frequently, and with more serious changes than most Boost libraries. This makes the independent ASIO library more attractive.

Yeah that's the main point I'm concerned about. If we still need to depend on Boost, we can keep both Boost and the independent Asio.

> One thing you could consider doing is just using a preprocessor definition to allow the ASIO namespace to be customized

Yeah, it's easy:

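#ifdef USE_BOOST_ASIO
#include <boost/asio.hpp>
// Alias Boost.Asio so the rest of the code can write asio:: either way.
namespace asio = boost::asio;
using error_code = boost::system::error_code;
#else
#include <asio.hpp>
// Standalone Asio ships its own error_code type.
using error_code = asio::error_code;
#endif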

merlimat commented 11 months ago

Another option is to package a small subset of Boost. Boost has an option for selecting a list of components and creating a smaller distribution that contains them together with all of their dependencies.

merlimat commented 11 months ago

> That header list you posted has some nontrivial stuff that won't be in the STL anytime soon or ever.

@oversearch Yes, we already went through a process of elimination for the boost:thread/smart-pointers/regex/... a couple of years ago :)

BewareMyPower commented 11 months ago

Another reason to replace Boost.Asio with Asio: to keep backward compatibility, Boost.Asio can pull in many unnecessary dependencies when installed with Vcpkg, which increases the build time significantly. See

My current idea:

BewareMyPower commented 10 months ago

Here is the time taken per port when I used vcpkg to install the dependencies on Linux:

Elapsed time to handle boost-config:arm64-linux: 2.2 s
Elapsed time to handle boost-static-assert:arm64-linux: 2 s
Elapsed time to handle boost-type-traits:arm64-linux: 2.2 s
Elapsed time to handle boost-preprocessor:arm64-linux: 2.3 s
Elapsed time to handle boost-typeof:arm64-linux: 3.3 s
Elapsed time to handle boost-assert:arm64-linux: 2 s
Elapsed time to handle boost-throw-exception:arm64-linux: 2.1 s
Elapsed time to handle boost-move:arm64-linux: 2.2 s
Elapsed time to handle boost-core:arm64-linux: 2.1 s
Elapsed time to handle boost-smart-ptr:arm64-linux: 2.2 s
Elapsed time to handle vcpkg-cmake:arm64-linux: 18.3 ms
Elapsed time to handle boost-io:arm64-linux: 2.9 s
Elapsed time to handle boost-utility:arm64-linux: 3.2 s
Elapsed time to handle boost-mp11:arm64-linux: 2.1 s
Elapsed time to handle boost-describe:arm64-linux: 2 s
Elapsed time to handle boost-container-hash:arm64-linux: 2 s
Elapsed time to handle boost-type-index:arm64-linux: 2.1 s
Elapsed time to handle boost-predef:arm64-linux: 2.1 s
Elapsed time to handle boost-mpl:arm64-linux: 2.7 s
Elapsed time to handle boost-integer:arm64-linux: 2 s
Elapsed time to handle boost-detail:arm64-linux: 2.1 s
Elapsed time to handle boost-bind:arm64-linux: 2.1 s
Elapsed time to handle boost-variant:arm64-linux: 2 s
Elapsed time to handle boost-tuple:arm64-linux: 1.9 s
Elapsed time to handle boost-unordered:arm64-linux: 2.7 s
Elapsed time to handle boost-winapi:arm64-linux: 2 s
Elapsed time to handle boost-variant2:arm64-linux: 1.9 s
Elapsed time to handle vcpkg-cmake-get-vars:arm64-linux: 27.4 ms
Elapsed time to handle boost-modular-build-helper:arm64-linux: 20.1 ms
Elapsed time to handle boost-build:arm64-linux: 14 s
Elapsed time to handle boost-system:arm64-linux: 2.6 s
Elapsed time to handle boost-optional:arm64-linux: 2.1 s
Elapsed time to handle boost-concept-check:arm64-linux: 1.9 s
Elapsed time to handle boost-regex:arm64-linux: 8.2 s
Elapsed time to handle boost-function-types:arm64-linux: 2 s
Elapsed time to handle boost-function:arm64-linux: 2 s
Elapsed time to handle boost-functional:arm64-linux: 2.2 s
Elapsed time to handle boost-fusion:arm64-linux: 2.9 s
Elapsed time to handle boost-conversion:arm64-linux: 2 s
Elapsed time to handle boost-iterator:arm64-linux: 3.9 s
Elapsed time to handle boost-array:arm64-linux: 2 s
Elapsed time to handle boost-range:arm64-linux: 2.2 s
Elapsed time to handle boost-numeric-conversion:arm64-linux: 2.2 s
Elapsed time to handle boost-intrusive:arm64-linux: 2.2 s
Elapsed time to handle boost-container:arm64-linux: 4 s
Elapsed time to handle boost-lexical-cast:arm64-linux: 2 s
Elapsed time to handle boost-exception:arm64-linux: 2.6 s
Elapsed time to handle boost-tokenizer:arm64-linux: 2.5 s
Elapsed time to handle boost-algorithm:arm64-linux: 3.1 s
Elapsed time to handle boost-date-time:arm64-linux: 2.9 s
Elapsed time to handle boost-rational:arm64-linux: 1.9 s
Elapsed time to handle boost-ratio:arm64-linux: 2.1 s
Elapsed time to handle boost-chrono:arm64-linux: 3.9 s
Elapsed time to handle boost-align:arm64-linux: 2.1 s
Elapsed time to handle boost-atomic:arm64-linux: 3.2 s
Elapsed time to handle boost-thread:arm64-linux: 8 s
Elapsed time to handle boost-proto:arm64-linux: 2.4 s
Elapsed time to handle boost-pool:arm64-linux: 2.4 s
Elapsed time to handle boost-phoenix:arm64-linux: 2.5 s
Elapsed time to handle boost-endian:arm64-linux: 2 s
Elapsed time to handle boost-spirit:arm64-linux: 2.9 s
Elapsed time to handle boost-serialization:arm64-linux: 15 s
Elapsed time to handle boost-logic:arm64-linux: 2.1 s
Elapsed time to handle boost-interval:arm64-linux: 2.2 s
Elapsed time to handle boost-ublas:arm64-linux: 2.4 s
Elapsed time to handle boost-parameter:arm64-linux: 2.2 s
Elapsed time to handle boost-circular-buffer:arm64-linux: 3.2 s
Elapsed time to handle boost-accumulators:arm64-linux: 2.2 s
Elapsed time to handle boost-any:arm64-linux: 2.1 s
Elapsed time to handle boost-context:arm64-linux: 3.2 s
Elapsed time to handle boost-coroutine:arm64-linux: 3.4 s
Elapsed time to handle boost-asio:arm64-linux: 2.7 s
Elapsed time to handle boost-multi-index:arm64-linux: 2.2 s
Elapsed time to handle boost-format:arm64-linux: 2.2 s
Elapsed time to handle boost-property-tree:arm64-linux: 2.2 s
Elapsed time to handle boost-dynamic-bitset:arm64-linux: 2.1 s
Elapsed time to handle boost-random:arm64-linux: 3.8 s
Elapsed time to handle boost-xpressive:arm64-linux: 2.2 s
Elapsed time to handle vcpkg-cmake-config:arm64-linux: 21 ms
Elapsed time to handle openssl:arm64-linux: 51 s
Elapsed time to handle zlib:arm64-linux: 3.5 s
Elapsed time to handle curl:arm64-linux: 18 s
Elapsed time to handle protobuf:arm64-linux: 1.3 min
Elapsed time to handle snappy:arm64-linux: 6 s
Elapsed time to handle zstd:arm64-linux: 11 s

BewareMyPower commented 10 months ago

I decided to close this issue with https://github.com/apache/pulsar-client-cpp/pull/382

Some Boost components are pulled in indirectly by two components, boost-accumulators and boost-property-tree, which cannot be removed easily.

I've considered replacing boost-property-tree with JsonCpp, but JSON operations are not important in this library, so it wouldn't make a difference.

> It has a lot of great stuff that you wouldn't want to re-implement yourself, and a large number of C++ projects are already using it anyway.

And I also agree with this opinion.