Define the plugin doc rules: extract the Javadoc from plugin sources, parse it, and generate markdown (a parsing sketch follows the links below). References:
https://tomassetti.me/extracting-javadoc-documentation-source-files-using-javaparser/
https://github.com/javaparser/javaparser/issues/325
https://dzone.com/articles/extracting-javadoc-documentation-from-source-files
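A minimal Scala sketch of the JavaParser approach described in the links above. The markdown layout printed here is an illustration, not the actual PluginDocCommand output, and StaticJavaParser assumes a recent JavaParser release:

import java.io.File
import scala.collection.JavaConverters._
import com.github.javaparser.StaticJavaParser
import com.github.javaparser.ast.body.ClassOrInterfaceDeclaration

object JavadocToMarkdownSketch {
  def main(args: Array[String]): Unit = {
    // Parse a single plugin source file given on the command line
    val cu = StaticJavaParser.parse(new File(args(0)))
    // Emit one markdown section per documented class or interface
    cu.findAll(classOf[ClassOrInterfaceDeclaration]).asScala.foreach { cls =>
      if (cls.getJavadoc.isPresent) {
        val doc = cls.getJavadoc.get()
        println(s"## ${cls.getNameAsString}")
        println()
        println(doc.getDescription.toText) // the free-text part of the Javadoc
      }
    }
  }
}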
Documentation contents:
input: related to Spark Streaming receivers, see https://spark.apache.org/docs/latest/streaming-custom-receivers.html (a minimal Receiver sketch follows this list)
filter: related to ServiceLoader, see #41
output
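For reference, a minimal custom receiver in the style of the linked Spark guide; Waterdrop's socket input presumably wraps something equivalent (the class and names here are illustrative):

import java.io.{BufferedReader, InputStreamReader}
import java.net.Socket
import java.nio.charset.StandardCharsets
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class SocketLineReceiver(host: String, port: Int)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    // Start a background thread so that onStart() returns immediately
    new Thread("Socket Receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  def onStop(): Unit = {} // the receive thread stops itself via isStopped()

  private def receive(): Unit = {
    try {
      val socket = new Socket(host, port)
      val reader = new BufferedReader(
        new InputStreamReader(socket.getInputStream, StandardCharsets.UTF_8))
      var line = reader.readLine()
      while (!isStopped && line != null) {
        store(line) // hand each received line to Spark
        line = reader.readLine()
      }
      reader.close()
      socket.close()
      restart("Trying to connect again")
    } catch {
      case e: java.io.IOException => restart("Error receiving data", e)
    }
  }
}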
Add an introduction to the internals to the documentation.
A Quick Example:
No code, compilation, or packaging required; simpler than the official Spark Quick Example.
Configure Waterdrop:
spark {
  # Waterdrop's streaming batch duration, in seconds
  spark.streaming.batchDuration = 5
  # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
  spark.master = "local[2]"
  spark.app.name = "Waterdrop-1"
  spark.ui.port = 13000
}

input {
  socket {}
}

filter {
}

output {
  stdout {}
}
Start a netcat server to send data:
nc -l -p 9999
on Windows: nc64 -l -p 9999
Start the Waterdrop receiving program:
sbt "-Dconfig.path=C:\Users\Administrator\Desktop\softwares\waterdrop\config\ConfigExample.conf" "run-main org.interestinglab.waterdrop.WaterdropMain"
Type into the nc session:
Hello World
Waterdrop's log prints:
+-----------+
|raw_message|
+-----------+
|Hello World|
+-----------+
Reference:
https://spark.apache.org/docs/latest/streaming-programming-guide.html#a-quick-example
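For contrast, the official Quick Example on the referenced page requires writing, compiling, and submitting a Spark application along these lines (adapted from that page, which feeds it with nc -lk 9999) before anything runs:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))
    // Connect to the netcat server and count words per batch
    val lines = ssc.socketTextStream("localhost", 9999)
    val wordCounts = lines.flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}

Waterdrop replaces all of this with the socket/stdout configuration above.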
Core data structure: Event
Functionality:
Features:
Related concepts: field, value, field references
Special fields: raw_message, "root"
Implementation: a Spark SQL Row (see the sketch below)
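A hypothetical sketch of the Event-as-Row idea: each Event is one Row, each field is a named column, and raw_message carries the unparsed input line (the session setup here is illustrative):

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object EventRowSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[2]").appName("event-sketch").getOrCreate()
    // One Event: a Row whose raw_message field holds the original line
    val schema = StructType(Seq(StructField("raw_message", StringType, nullable = true)))
    val rows = spark.sparkContext.parallelize(Seq(Row("Hello World")))
    val events = spark.createDataFrame(rows, schema)
    events.show() // prints the same |raw_message| table as the quick example log
    spark.stop()
  }
}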
Make it clear that, beyond the filter plugins listed in the documentation, every Spark UDF can also be used as a filter inside SQL; this enables a great deal!
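Continuing the sketch above, a hedged illustration of that point: any registered UDF becomes callable from the SQL that a filter evaluates (the view and UDF names here are made up):

// Inside EventRowSketch, after building `events`:
// register an arbitrary Scala function as a Spark UDF...
spark.udf.register("strlen", (s: String) => if (s == null) 0 else s.length)
events.createOrReplaceTempView("events")
// ...and call it from plain SQL, exactly as a SQL-based filter could
spark.sql("SELECT raw_message, strlen(raw_message) AS len FROM events WHERE strlen(raw_message) > 0").show()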
Plugin documentation workflow:
(1) Suppose your plugin is a filter plugin named Drop: create Drop.docs under docs/zh-cn/configuration/filter-plugins.
(2) Write the plugin documentation following the docs syntax rules.
(3) Run PluginDocCommand to generate the corresponding markdown document.
(4) Add the corresponding link in docs/zh-cn/configuration/_sidebar.md.
(5) To check the generated documentation locally, install docsify first, then cd docs, run ./start-doc.sh, and open localhost:3000.
(6) Commit all changes in git; once merged into the master branch, the documentation is visible online.
Plugin documentation locations:
docs/zh-cn/configuration/input-plugins
docs/zh-cn/configuration/filter-plugins
docs/zh-cn/configuration/output-plugins
Example of generating the corresponding markdown:
sbt "run-main org.interestinglab.waterdrop.docutils.PluginDocCommand /Users/yixia/IdeaProjects/waterdrop/docs/zh-cn/configuration/filter-plugins/Drop.docs true"
Compare Waterdrop with Spark, Logstash, and similar tools.
A chapter describing performance, covering: (1) Spark's performance, (2) the Spark optimizations Waterdrop leverages, (3) Waterdrop's performance.
A configuration example: fake -> split -> stdout, mysql
# fake -> split -> stdout, mysql
spark {
  # Waterdrop's streaming batch duration, in seconds
  spark.streaming.batchDuration = 5
  # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
  spark.master = "local[2]"
  spark.app.name = "Waterdrop-1"
  spark.ui.port = 13000
  // spark.executor.instances = 60
  // spark.executor.cores = 2
  // spark.executor.memory = "4g"
  // spark.streaming.blockInterval = "1000ms"
  // spark.streaming.kafka.maxRatePerPartition = 30000
  // spark.streaming.kafka.maxRetries = 2
  // spark.driver.extraJavaOptions = "-Dconfig.file=/data/slot6/waterdrop/application.conf"
}

input {
  fake {
    rate = 1
  }
}

filter {
  split {
    fields = ["name", "age"]
    delimiter = ","
    // target_field = "wrapped"
  }
}

output {
  stdout {}

  mysql {
    url = "jdbc:mysql://localhost:3306/data"
    user = "root"
    password = "123456"
    table = "sample_data_table"
  }

  // textfile {
  //   save_mode = "ignore"
  //   serializer = "orc"
  //   path = "file:///Users/yixia/work/waterdrop-data3"
  // }
}
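A hedged sketch of what the mysql output above presumably does per batch: Spark's built-in JDBC writer, with url/user/password/table mirroring the config. Here `df` stands for the batch's filtered DataFrame, and the driver class is an assumption for MySQL Connector/J 5.x:

import java.util.Properties

// df: the DataFrame produced by the filter chain for the current batch
val props = new Properties()
props.setProperty("user", "root")
props.setProperty("password", "123456")
props.setProperty("driver", "com.mysql.jdbc.Driver") // assumption: Connector/J 5.x
df.write.mode("append").jdbc("jdbc:mysql://localhost:3306/data", "sample_data_table", props)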
For the socket example, Waterdrop should be contrasted sharply with the official Spark socket example (see the comparison snippet under the quick example above).
Grok plugin testing tool: https://grokdebug.herokuapp.com/
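For instance, the standard pattern %{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration} applied to the line 55.3.244.1 GET /index.html 15824 0.043 extracts the fields client, method, request, bytes, and duration (this sample comes from the Logstash grok documentation).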
Chinese documentation completeness:
Configuration
[x] Common configuration (garyelephant)
[x] Input plugins
[ ] Filter plugins
    Geoip
[x] Output plugins
Performance and tuning
English documentation completeness: