apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

Chinese/English documentation #21

Closed garyelephant closed 6 years ago

garyelephant commented 7 years ago

Chinese documentation progress:




English documentation progress:

garyelephant commented 7 years ago

Generate the Markdown documentation from the plugin Javadoc.

Steps: define the plugin doc rules, extract the Javadoc, parse the Javadoc, generate Markdown.

https://tomassetti.me/extracting-javadoc-documentation-source-files-using-javaparser/

https://github.com/javaparser/javaparser/issues/325

https://dzone.com/articles/extracting-javadoc-documentation-from-source-files

https://github.com/antlr/grammars-v4/tree/master/javadoc
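
The extract, parse, and generate steps above could look roughly like the following. This is a minimal sketch, not the approach the project settled on: it uses a regular expression instead of JavaParser or an ANTLR grammar, handles only a Javadoc block placed directly before a class declaration, and the names `parse_javadoc` and `render_markdown` are hypothetical.

```python
import re

# A Javadoc block immediately followed by a class declaration.
# The tempered dot keeps the match inside a single comment.
CLASS_DOC_RE = re.compile(
    r"/\*\*((?:(?!\*/).)*)\*/\s*(?:(?:public|final|abstract)\s+)*class\s+(\w+)",
    re.DOTALL,
)


def parse_javadoc(java_source):
    """Return (class_name, description, tags) for the first documented class, or None."""
    match = CLASS_DOC_RE.search(java_source)
    if match is None:
        return None
    body, class_name = match.groups()
    description, tags = [], []
    for raw in body.splitlines():
        # Strip the leading " * " decoration of each comment line.
        line = re.sub(r"^\s*\*?\s?", "", raw).rstrip()
        if line.startswith("@"):
            name, _, text = line[1:].partition(" ")
            tags.append((name, text.strip()))
        elif line:
            description.append(line)
    return class_name, " ".join(description), tags


def render_markdown(class_name, description, tags):
    """Render the parsed Javadoc as a small Markdown section."""
    lines = [f"## {class_name}", "", description]
    if tags:
        lines.append("")
        lines.extend(f"- `@{name}` {text}" for name, text in tags)
    return "\n".join(lines) + "\n"
```

A real implementation would still need the plugin doc rules mentioned above, for example which tags describe plugin options; that part is left open here.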

garyelephant commented 7 years ago

Documentation content:

Plugin development guide

garyelephant commented 7 years ago

Add an introduction to the internals to the documentation.

garyelephant commented 7 years ago

A quick example:

No code, compilation, or packaging is needed, which makes this simpler than the official Quick Example.

Configure Waterdrop:

spark {
  # Waterdrop defined streaming batch duration in seconds
  spark.streaming.batchDuration = 5

  # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
  spark.master = "local[2]"
  spark.app.name = "Waterdrop-1"
  spark.ui.port = 13000
}

input {
  socket {}
}
filter {
}

output {
  stdout {}
}

Start a netcat server to send data:

nc -l -p 9999

On Windows: nc64 -l -p 9999

Start Waterdrop to receive the data:

sbt "-Dconfig.path=C:\Users\Administrator\Desktop\softwares\waterdrop\config\ConfigExample.conf" "run-main org.interestinglab.waterdrop.WaterdropMain"

Type the following in the nc terminal:

Hello World

The Waterdrop log prints:

+-----------+
|raw_message|
+-----------+
|Hello World|
+-----------+

Reference:

https://spark.apache.org/docs/latest/streaming-programming-guide.html#a-quick-example
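
Where netcat is not available, the sending side of this example could be replaced by a few lines of Python. This is a sketch under assumptions: the function name `serve_lines` and its `ready` parameter are hypothetical, and port 9999 is taken from the nc command above rather than from any documented default of the socket input.

```python
import socket
import threading


def serve_lines(lines, host="127.0.0.1", port=9999, ready=None):
    """Accept one TCP client and send it each line, newline-terminated."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(1)
        if ready is not None:
            # Optional threading.Event so a caller can wait until the port is open.
            ready.set()
        conn, _addr = server.accept()
        with conn:
            for line in lines:
                conn.sendall((line + "\n").encode("utf-8"))
```

Calling `serve_lines(["Hello World"])` before starting Waterdrop plays the role of typing Hello World into nc: the call blocks until a client connects, sends the line, and then closes the connection.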

garyelephant commented 7 years ago

Core data structure: Event

Purpose:

Features:

Related concepts: field, value, field references

Special fields: raw_message, "root"

Implementation: SparkSQL Row

garyelephant commented 7 years ago

State clearly that, besides the filter plugins listed in the documentation, every Spark UDF can also be used as a filter in SQL, so a lot can be done.

garyelephant commented 7 years ago

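Workflow for writing a plugin's documentation:

(1) Suppose your plugin is called Drop and is a filter plugin. Create Drop.docs under docs/zh-cn/configuration/filter-plugins.
(2) Write the plugin documentation following the docs syntax rules.
(3) Run PluginDocCommand to generate the Markdown document for the plugin.
(4) Add the corresponding link to docs/zh-cn/configuration/_sidebar.md.
(5) To check the generated documentation locally, install docsify first, then cd docs, run ./start-doc.sh, and open localhost:3000.
(6) Commit all changes in git. Once they are merged into the master branch, the documentation is visible online.

Plugin documentation locations:

docs/zh-cn/configuration/input-plugins
docs/zh-cn/configuration/filter-plugins
docs/zh-cn/configuration/output-plugins

Example of generating the Markdown:

sbt "run-main org.interestinglab.waterdrop.docutils.PluginDocCommand /Users/yixia/IdeaProjects/waterdrop/docs/zh-cn/configuration/filter-plugins/Drop.docs true"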

garyelephant commented 7 years ago

Compare Waterdrop with Spark, Logstash, and similar tools.

garyelephant commented 7 years ago

A chapter on performance, covering: (1) Spark's performance, (2) the Spark optimizations we take advantage of, (3) Waterdrop's performance.

garyelephant commented 7 years ago

A configuration example: fake -> split -> stdout, mysql

# fake -> split -> stdout, mysql

spark {
  # Waterdrop defined streaming batch duration in seconds
  spark.streaming.batchDuration = 5

  # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
  spark.master = "local[2]"
  spark.app.name = "Waterdrop-1"
  spark.ui.port = 13000

//  spark.executor.instances = 60
//  spark.executor.cores = 2
//  spark.executor.memory = "4g"
//  spark.streaming.blockInterval= "1000ms"
//  spark.streaming.kafka.maxRatePerPartition = 30000
//  spark.streaming.kafka.maxRetries = 2
//  spark.driver.extraJavaOptions = "-Dconfig.file=/data/slot6/waterdrop/application.conf"
}

input {
  fake {
    rate = 1
  }
}
filter {
  split {
    fields = ["name", "age"]
    delimiter = ","
//    target_field = "wrapped"
  }
}

output {
  stdout {}

  mysql {
    url = "jdbc:mysql://localhost:3306/data"
    user = "root"
    password = "123456"
    table = "sample_data_table"
  }
//  textfile {
//    save_mode = "ignore"
//    serializer = "orc"
//    path = "file:///Users/yixia/work/waterdrop-data3"
//  }
}
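
To make the split step of this pipeline concrete, here is a rough Python model of what such a filter does to one event. It is an illustration only and not Waterdrop's implementation: `split_event` is a hypothetical name, reading from raw_message is an assumption based on the column shown in the quick example, and the handling of a mismatch between the number of parts and the number of field names is also assumed.

```python
def split_event(event, fields, delimiter=",", source_field="raw_message"):
    """Return a copy of event with the source field split into named fields."""
    parts = event[source_field].split(delimiter)
    result = dict(event)
    # zip stops at the shorter sequence, so surplus parts or names are ignored
    # (an assumption for this sketch, not documented Waterdrop behavior).
    result.update(zip(fields, parts))
    return result
```

With fields = ["name", "age"] and delimiter = "," as configured above, an event whose raw_message is "Alice,30" would gain a name field of "Alice" and an age field of "30", both still strings.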
garyelephant commented 6 years ago

For the socket example, Waterdrop's version needs to be contrasted sharply with the socket example on the official website.

garyelephant commented 6 years ago

Test site for the grok plugin: https://grokdebug.herokuapp.com/