logkafka sends log file contents to Kafka 0.8 line by line, treating each line of a file as one Kafka message.
See the FAQ if you want to deploy it in a production environment.
The main differences from flume, fluentd and logstash are:
Management of log collecting configs and state:
flume, fluentd and logstash keep log file configs and state locally: each starts a local server to manage them.
logkafka keeps log file configs and state in zookeeper: it watches the zookeeper node for config changes, records file positions in a local position file, and pushes position info to zookeeper periodically.
Order of log collecting
flume, fluentd and logstash all have an INPUT type 'tail'; they collect all files simultaneously, without considering the chronological order of log files.
logkafka collects files chronologically.
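The chronological ordering can be illustrated with plain shell: log files rotated with a strftime-style suffix such as access_log.%Y%m%d have names whose lexical order matches their time order, so sorting by name yields time order. This is only a sketch of the idea, not logkafka's actual implementation, and the file names below are made up:

```shell
# Sketch: strftime-suffixed log names sort chronologically as strings.
# The directory and file names are hypothetical.
dir=$(mktemp -d)
touch "$dir/access_log.20150103" "$dir/access_log.20150101" "$dir/access_log.20150102"
ls "$dir" | sort
# prints:
# access_log.20150101
# access_log.20150102
# access_log.20150103
```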
librdkafka
libzookeeper_mt
libuv
libpcre2
PHP 5.3 and above (with zookeeper extension)
Two methods, choose accordingly.
Install librdkafka (>0.8.6), libzookeeper_mt, libuv (>v1.6.0) and libpcre2 (>10.20) manually, then
cmake -H. -B_build -DCMAKE_INSTALL_PREFIX=_install
cd _build
make -j4
make install
Just let cmake handle the dependencies (cmake version >= 3.0.2).
cmake -H. -B_build -DCMAKE_INSTALL_PREFIX=_install \
-DINSTALL_LIBRDKAFKA=ON \
-DINSTALL_LIBZOOKEEPER_MT=ON \
-DINSTALL_LIBUV=ON \
-DINSTALL_LIBPCRE2=ON
cd _build
make -j4
make install
If the installation of any of these libs fails, install it manually and set the corresponding option -DINSTALL_LIBXXX=OFF.
Note: If you already have Kafka and Zookeeper installed, you can start from step 2 and replace the Zookeeper connection string with your own in the following steps; the default is 127.0.0.1:2181.
Deploy Kafka and Zookeeper on the local host
tools/grid bootstrap
Start logkafka
Customize _install/conf/logkafka.conf to your needs
zookeeper.connect = 127.0.0.1:2181
pos.path = ../data/pos.myClusterName
line.max.bytes = 1048576
...
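For reference, the line.max.bytes value shown above caps the size of a single collected line at 1 MiB, i.e. 1024 * 1024 bytes:

```shell
# 1 MiB in bytes, matching line.max.bytes = 1048576
echo $(( 1024 * 1024 ))   # prints 1048576
```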
Run in the foreground
_install/bin/logkafka -f _install/conf/logkafka.conf -e _install/conf/easylogging.conf
Or as a daemon
_install/bin/logkafka --daemon -f _install/conf/logkafka.conf -e _install/conf/easylogging.conf
Configs Management
Use UI or command line tools.
3.1 UI (with kafka-manager)
We added logkafka as a kafka-manager extension. You need to install and start kafka-manager and add a cluster with logkafka enabled; then you can manage logkafka through the 'Logkafka' menu.
3.2 Command line tools
We use a PHP script (tools/log_config.php) to create/delete/list collecting configurations in zookeeper nodes.
If you do not know how to install php zookeeper module, check this.
How to create configs
Example:
Collect the Apache access log on host "test.qihoo.net" into Kafka brokers with the zk connection string "127.0.0.1:2181". The topic is "apache_access_log".
php tools/log_config.php --create \
--zookeeper_connect=127.0.0.1:2181 \
--logkafka_id=test.qihoo.net \
--log_path=/usr/local/apache2/logs/access_log.%Y%m%d \
--topic=apache_access_log
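The %Y%m%d part of log_path is a strftime-style date pattern, so the matched file name changes daily. You can preview what the pattern expands to today with the date command (a quick illustration, not part of logkafka itself):

```shell
# strftime %Y%m%d expands to an 8-digit date, e.g. 20150701
date +%Y%m%d
```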
How to delete configs
php tools/log_config.php --delete \
--zookeeper_connect=127.0.0.1:2181 \
--logkafka_id=test.qihoo.net \
--log_path=/usr/local/apache2/logs/access_log.%Y%m%d
How to list configs and monitor sending progress
php tools/log_config.php --list --zookeeper_connect=127.0.0.1:2181
shows
logkafka_id: test.qihoo.net
log_path: /usr/local/apache2/logs/access_log.%Y%m%d
Array
(
[conf] => Array
(
[logkafka_id] => test.qihoo.net
[log_path] => /usr/local/apache2/logs/access_log.%Y%m%d
[topic] => apache_access_log
[partition] => -1
[key] =>
[required_acks] => 1
[compression_codec] => none
[batchsize] => 1000
[message_timeout_ms] => 0
[follow_last] => 1
[valid] => 1
)
)
For more details about configuration management, see php tools/log_config.php --help.
We tested with 2 brokers and 2 partitions:
Item | Value |
---|---|
rtt min/avg/max/mdev | 0.478/0.665/1.004/0.139 ms |
message average size | 1000 bytes |
batchsize | 1000 |
required_acks | 1 |
compression_codec | none |
message_timeout_ms | 0 |
peak rates | 20.5 Mb/s |
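As a rough sanity check on the table above, and assuming "Mb/s" means megabits per second (an assumption; the source does not say), the peak rate corresponds to roughly 2,500 messages per second at the 1000-byte average message size:

```shell
# 20.5 Mb/s = 20,500,000 bits/s; divide by 8 for bytes/s, then by the
# 1000-byte average message size (assumes Mb means megabits)
echo $(( 20500000 / 8 / 1000 ))   # prints 2562
```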
The most significant third-party packages are:
confuse
easylogging
tclap
rapidjson
Thanks to the creators of these packages.
Compile with unit tests and the Debug build type
cmake -H. -B_build -DCMAKE_INSTALL_PREFIX=_install \
-Dtest=ON \
-DCMAKE_BUILD_TYPE=Debug
cd _build
make
make logkafka_coverage # run unittest
Code that does not conform to this rule should be fixed before committing; you can use cpplint to check the modified files.