apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

M1 TODO #38

Closed: garyelephant closed this 6 years ago

garyelephant commented 7 years ago

[Go live at this milestone]


garyelephant commented 7 years ago

TODOs on the details:

garyelephant commented 7 years ago

The return type of BaseInput's getDStream is not generic, which will probably cause problems when we implement input plugins.

import org.apache.spark.streaming.dstream.DStream

abstract class BaseInput(config: Config) extends Plugin {

  /**
   * No matter what kind of Input it is, all you have to do is create a DStream to be used later
   * */
  def getDStream: DStream[(String, String)]

  /**
   * Things to do after filter and before output
   * */
  def beforeOutput: Unit = {}

  /**
   * Things to do after output, such as update offset
   * */
  def afterOutput: Unit = {}

}
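The comment above points out that getDStream is hard-wired to DStream[(String, String)]. One possible direction, shown here only as a sketch under assumptions, is to make the record type a type parameter of the base class so that each input plugin declares its own element type. To keep it self-contained and runnable without Spark, a plain Seq stands in for DStream and a Map[String, String] stands in for Config. BaseInput[T], getStream, KeyValueInput, LineInput and Pipeline.runOnce are hypothetical names, not SeaTunnel's actual API. The runOnce helper also spells out the hook order the doc comments describe: read input, filter, beforeOutput, output, afterOutput.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical generic variant of BaseInput: T is the record type the input produces.
// Seq[T] stands in for DStream[T]; Map[String, String] stands in for Config.
abstract class BaseInput[T](protected val config: Map[String, String]) {
  def getStream: Seq[T]
  def beforeOutput(): Unit = {}
  def afterOutput(): Unit = {}
}

// Key/value input: same shape as the original (String, String) signature.
class KeyValueInput(conf: Map[String, String]) extends BaseInput[(String, String)](conf) {
  def getStream: Seq[(String, String)] = Seq("k1" -> "v1", "k2" -> "v2")
}

// Line-oriented input: each record is a single string, which (String, String) does not fit.
class LineInput(conf: Map[String, String]) extends BaseInput[String](conf) {
  // Records which hooks ran, in order, so the lifecycle can be inspected.
  val events = ArrayBuffer.empty[String]

  def getStream: Seq[String] =
    config.getOrElse("lines", "").split(",").toSeq.filter(_.nonEmpty)

  override def beforeOutput(): Unit = { events += "beforeOutput" }

  // In a real input this is where something like an offset update would go.
  override def afterOutput(): Unit = { events += "afterOutput" }
}

// Hypothetical driver: input -> filter -> beforeOutput -> output -> afterOutput.
object Pipeline {
  def runOnce[T](input: BaseInput[T], filter: T => Boolean, output: Seq[T] => Unit): Unit = {
    val filtered = input.getStream.filter(filter)
    input.beforeOutput()
    output(filtered)
    input.afterOutput()
  }
}
```

The design choice here is to move the type decision into each plugin: the pipeline code stays generic in T, and an input that naturally yields key/value pairs keeps (String, String) while others are free to choose a different type.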
garyelephant commented 7 years ago

Spark Benchmark: https://github.com/intel-hadoop/HiBench

garyelephant commented 6 years ago

November 17, 2017

(1) Switch to reading the config path from command-line arguments: --conf spark.driver.extraJavaOptions=-Dconfig.path=application.conf
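Passing --conf spark.driver.extraJavaOptions=-Dconfig.path=application.conf sets the JVM system property config.path on the driver. A minimal sketch of how the driver side could pick it up, assuming the property name stays config.path and application.conf is the fallback. ConfigLocator, resolve and load are hypothetical names, and the sketch only reads the raw file instead of parsing it with a config library.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper: locates the job config via the config.path system property.
object ConfigLocator {
  // Set on the driver through -Dconfig.path=... in spark.driver.extraJavaOptions.
  val PropertyKey = "config.path"

  // Uses the system property if present, otherwise the given default.
  def resolve(default: String = "application.conf"): Path =
    Paths.get(sys.props.getOrElse(PropertyKey, default))

  // Returns the raw file contents, or an error message if no such file exists.
  def load(default: String = "application.conf"): Either[String, String] = {
    val path = resolve(default)
    if (Files.isRegularFile(path)) Right(new String(Files.readAllBytes(path), "UTF-8"))
    else Left(s"config file not found: $path")
  }
}
```

Keeping the fallback means the same jar still runs when the option is omitted, while any submission can point at a different file without rebuilding.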

(2) spark.submit.deployMode

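import org.apache.spark.sql.SparkSession

val spark = SparkSession
   .builder()
   .appName("SparkApp")
   // Standalone master URL: no space after "spark:"
   .master("spark://192.168.60.80:7077")
   // The driver is already running here, so this cannot move it to the cluster;
   // the deploy mode is normally chosen with spark-submit --deploy-mode.
   .config("spark.submit.deployMode", "cluster")
   .enableHiveSupport()
   .getOrCreate()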
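The deploy mode decides whether the driver runs in the submitting process (client) or on a worker in the cluster (cluster). By the time SparkSession.builder() executes, the driver already exists, so the choice is usually made at submission time via spark-submit --deploy-mode. A minimal sketch that assembles such a command line, assuming the job is launched through spark-submit with its --class, --master, --deploy-mode and --conf options; SubmitArgs, the main class com.example.Main and the jar name app.jar are hypothetical placeholders.

```scala
// Hypothetical helper that assembles a spark-submit command line.
object SubmitArgs {
  // Values accepted by spark-submit's --deploy-mode option.
  val DeployModes: Set[String] = Set("client", "cluster")

  def build(
      master: String,
      deployMode: String,
      mainClass: String,
      appJar: String,
      configPath: String
  ): Seq[String] = {
    require(DeployModes.contains(deployMode), s"unsupported deploy mode: $deployMode")
    Seq(
      "spark-submit",
      "--class", mainClass,
      "--master", master,
      "--deploy-mode", deployMode,
      // Pass the config location to the driver as a JVM system property.
      "--conf", s"spark.driver.extraJavaOptions=-Dconfig.path=$configPath",
      appJar
    )
  }
}
```

For example, SubmitArgs.build("spark://192.168.60.80:7077", "cluster", "com.example.Main", "app.jar", "application.conf") yields the tokens of spark-submit --class com.example.Main --master spark://192.168.60.80:7077 --deploy-mode cluster --conf spark.driver.extraJavaOptions=-Dconfig.path=application.conf app.jar. In cluster mode the driver starts on a worker node, so a local path such as application.conf would have to be readable on that node.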