allwefantasy / spark-binlog

A library for querying Binlog with Apache Spark Structured Streaming, for Spark SQL, DataFrames and [MLSQL](https://www.mlsql.tech).
Apache License 2.0
154 stars 54 forks

Why is there no data in the batches? #6

Closed ZhiYinZhang closed 4 years ago

ZhiYinZhang commented 4 years ago

Here is my code:

    val df = spark.readStream
      .format("org.apache.spark.sql.mlsql.sources.MLSQLBinLogDataSource")
      .option("host", "localhost")
      .option("port", "3306")
      .option("userName", "root")
      .option("password", "123456")
      .option("databaseNamePattern", "test")
      .option("tableNamePattern", "test1")
      .option("binlogIndex", "2")
      .option("binlogFileOffset", "4")
      .option("bingLogNamePrefix", "binlog")
      .load()

    df.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()

Output:

    Batch: 0
    +-----+
    |value|
    +-----+
    +-----+

    Batch: 1
    +-----+
    |value|
    +-----+
    +-----+

    Batch: 2
    +-----+
    |value|
    +-----+
    +-----+

allwefantasy commented 4 years ago

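Please check the following:

  1. Follow how-to-get-the-initial-offset to make sure your configuration is correct.
  2. Check whether inserts or updates are actually happening in your database. Because we read incrementally, rows only show up when new events occur.
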
ZhiYinZhang commented 4 years ago

I did write data into the table, and the table contains rows. The batches are also being triggered, which suggests the events are detected, but the batches still contain no data.

    mysql> show master status;
    +------------------+----------+--------------+------------------+-------------------+
    | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
    +------------------+----------+--------------+------------------+-------------------+
    | binlog.000002    |    10603 |              |                  |                   |
    +------------------+----------+--------------+------------------+-------------------+

    val df = spark.readStream
      .format("org.apache.spark.sql.mlsql.sources.MLSQLBinLogDataSource")
      .option("host", "localhost")
      .option("port", "3306")
      .option("userName", "root")
      .option("password", "123456")
      .option("databaseNamePattern", "test")
      .option("tableNamePattern", "test1")
      .option("binlogIndex", "2")
      .option("binlogFileOffset", "10603")
      .option("bingLogNamePrefix", "binlog")
      .load()

    df.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()


    Batch: 4
    +-----+
    |value|
    +-----+
    +-----+

    Batch: 5
    +-----+
    |value|
    +-----+
    +-----+
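
The options in the comment above line up with the `show master status` output: file `binlog.000002` at position 10603 corresponds to prefix `binlog`, index `2` and offset `10603`. A minimal sketch of that mapping follows. `BinlogStart` and `fromMasterStatus` are hypothetical names introduced here for illustration and are not part of spark-binlog. The sketch assumes the file name has the form `<prefix>.<zero-padded index>`.

```scala
// Hypothetical helper (not part of spark-binlog): splits a binlog file name
// such as "binlog.000002" into its name prefix and numeric index.
final case class BinlogStart(prefix: String, index: Long, offset: Long)

object BinlogStart {
  def fromMasterStatus(file: String, position: Long): BinlogStart = {
    // Split on the LAST dot so prefixes like "mysql-bin" or "a.b" stay intact.
    val dot = file.lastIndexOf('.')
    require(dot > 0 && dot < file.length - 1, s"unexpected binlog file name: $file")
    BinlogStart(file.substring(0, dot), file.substring(dot + 1).toLong, position)
  }
}

object BinlogStartDemo {
  def main(args: Array[String]): Unit = {
    val start = BinlogStart.fromMasterStatus("binlog.000002", 10603L)
    // Option names are the ones used in this thread.
    val options = Map(
      "bingLogNamePrefix" -> start.prefix,
      "binlogIndex"       -> start.index.toString,
      "binlogFileOffset"  -> start.offset.toString
    )
    options.foreach { case (key, value) => println(s"$key=$value") }
  }
}
```

The resulting map could be passed to `spark.readStream...options(options)` instead of typing the three values by hand, which avoids a mismatch between the file index and the offset.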

allwefantasy commented 4 years ago

Spark Structured Streaming prints details for each trigger, such as how many rows were processed. Could you paste that output as well?

ZhiYinZhang commented 4 years ago

numInputRows is 0 for every batch.
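
For anyone wanting to capture the same per-batch figures, one possible way is a `StreamingQueryListener`. This is a sketch only, assuming Spark Structured Streaming is on the classpath. `ProgressLogging` is a hypothetical name, and this is not necessarily how the numbers in this thread were obtained.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{
  QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent
}

// Hypothetical helper: prints the batch id and input row count after each trigger.
object ProgressLogging {
  def install(spark: SparkSession): Unit = {
    spark.streams.addListener(new StreamingQueryListener {
      override def onQueryStarted(event: QueryStartedEvent): Unit = ()

      override def onQueryProgress(event: QueryProgressEvent): Unit = {
        val progress = event.progress
        println(s"batchId=${progress.batchId} numInputRows=${progress.numInputRows}")
      }

      override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
    })
  }
}
```

Calling `ProgressLogging.install(spark)` before `start()` would log one line per completed trigger. The same values are also available from `lastProgress` on the running query.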

bebee4java commented 4 years ago

  1. Make sure binlog is enabled on your MySQL instance:

         mysql> show variables like 'log_bin';
         +---------------+-------+
         | Variable_name | Value |
         +---------------+-------+
         | log_bin       | ON    |
         +---------------+-------+

  2. Make sure your binlog format is ROW:

         mysql> show variables like 'binlog_format';
         +---------------+-------+
         | Variable_name | Value |
         +---------------+-------+
         | binlog_format | ROW   |
         +---------------+-------+

Please check these two settings and see whether that resolves your problem.

ZhiYinZhang commented 4 years ago

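I have both of these settings configured. Also, when I use the following package on its own to listen to the binlog, I can read the data for all kinds of events:

    com.github.shyiko:mysql-binlog-connector-java:0.18.1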
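
A standalone check of that kind might look roughly like the sketch below. It assumes `mysql-binlog-connector-java` is on the classpath and reuses the connection values and binlog position quoted earlier in this thread. `BinlogTail` and `newClient` are hypothetical names, and this is not necessarily the code the commenter ran.

```scala
import com.github.shyiko.mysql.binlog.BinaryLogClient
import com.github.shyiko.mysql.binlog.event.Event

object BinlogTail {
  // Builds a client positioned at the given binlog file and offset.
  // Nothing is sent to the server until connect() is called.
  def newClient(host: String, port: Int, user: String, password: String,
                file: String, position: Long): BinaryLogClient = {
    val client = new BinaryLogClient(host, port, user, password)
    client.setBinlogFilename(file)
    client.setBinlogPosition(position)
    client
  }

  def main(args: Array[String]): Unit = {
    // Values copied from the thread; adjust for your own instance.
    val client = newClient("localhost", 3306, "root", "123456", "binlog.000002", 10603L)
    client.registerEventListener(new BinaryLogClient.EventListener {
      // Print every event as-is (header and payload).
      override def onEvent(event: Event): Unit = println(event)
    })
    client.connect() // blocks and streams events until disconnected
  }
}
```

If events print here while the Spark batches stay empty, the MySQL side is probably configured correctly and the problem is more likely in the options passed to the data source.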