allwefantasy / spark-binlog

A library for querying binlog with Apache Spark Structured Streaming, for Spark SQL, DataFrames and [MLSQL](https://www.mlsql.tech).
Apache License 2.0

How to automatically discover the binlog file index number? #12

Closed: zhengqiangtan closed this issue 4 years ago

zhengqiangtan commented 4 years ago

Hi, excuse me! The binlog file index and offset in the MySQL example in the documentation are hard-coded. How do you track changes to the binlog file index? And if the program hangs for a long time, leaving a large gap in consumption, how can consumption resume from the point of failure?

(Screenshot: MySQL binlog)

allwefantasy commented 4 years ago

Since spark-binlog is a standard Spark data source, it tries to keep the exactly-once delivery guarantee, including in situations such as application crashes and restarts. spark-binlog persists the latest committed offset (offset = logFileIndex + Position) in the checkpoint directory. Every time it starts up again, spark-binlog reads the offset from the checkpoint instead of from the configuration in your code.
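To make the restart behaviour concrete, here is a minimal Python sketch of the idea, not spark-binlog's actual code. It assumes the offset can be modelled as a (logFileIndex, Position) pair compared file-first; the class `BinlogOffset`, the helpers `commit` and `start_offset`, and the `offset.json` layout are all hypothetical. In a real Spark job the offset log is managed by Spark itself, inside the directory passed through the `checkpointLocation` option of `writeStream`.

```python
import json
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class BinlogOffset:
    """Hypothetical offset: (binlog file index, position within that file).

    order=True compares file_index first, then position, so any offset in a
    later binlog file sorts after every offset in an earlier one.
    """
    file_index: int
    position: int

    def to_json(self) -> str:
        return json.dumps({"fileIndex": self.file_index, "pos": self.position})

    @staticmethod
    def from_json(text: str) -> "BinlogOffset":
        data = json.loads(text)
        return BinlogOffset(data["fileIndex"], data["pos"])


def commit(checkpoint_dir: str, offset: BinlogOffset) -> None:
    """Persist the latest committed offset (hypothetical file layout)."""
    path = os.path.join(checkpoint_dir, "offset.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(offset.to_json())
    # Atomic rename: a crash mid-write never leaves a half-written checkpoint.
    os.replace(tmp, path)


def start_offset(checkpoint_dir: str, configured: BinlogOffset) -> BinlogOffset:
    """On startup, prefer the checkpointed offset over the one set in code."""
    path = os.path.join(checkpoint_dir, "offset.json")
    if os.path.exists(path):
        with open(path) as f:
            return BinlogOffset.from_json(f.read())
    return configured  # first run: no checkpoint exists yet


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ckpt:
        configured = BinlogOffset(file_index=10, position=4)
        print(start_offset(ckpt, configured))  # first run: the configured offset
        commit(ckpt, BinlogOffset(12, 90840))
        print(start_offset(ckpt, configured))  # "restart": the checkpointed offset
```

Writing to a temporary file and renaming it means a crash during the write leaves the previous checkpoint intact, which is what lets a restart resume from the last committed point rather than from the hard-coded configuration.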

allwefantasy commented 4 years ago

In the other case, where everything runs normally, each binlog event contains the name of its log file, so spark-binlog can keep track of changes to the file name.
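Here is a minimal Python sketch of how the file index could be derived from the file name each event carries, and how a change of file could be detected. It assumes events arrive as hypothetical `(log file name, position)` tuples and relies on MySQL's naming convention of a base name plus a numeric suffix (e.g. `mysql-bin.000012`); `file_index` and `track_offsets` are illustrative names, not part of spark-binlog.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Binlog files are named <base name>.<numeric sequence>, e.g. mysql-bin.000012.
_BINLOG_NAME = re.compile(r"^(?P<prefix>.+)\.(?P<index>\d+)$")


def file_index(log_file_name: str) -> int:
    """Extract the numeric index from a binlog file name."""
    m = _BINLOG_NAME.match(log_file_name)
    if m is None:
        raise ValueError(f"not a binlog file name: {log_file_name!r}")
    return int(m.group("index"))


def track_offsets(
    events: Iterable[Tuple[str, int]],
) -> Iterator[Tuple[int, int, bool]]:
    """Yield (file index, position, switched) for each (file name, position) event.

    `switched` is True when the file name differs from the previous event's,
    i.e. the stream has moved on to a new binlog file.
    """
    current: Optional[str] = None
    for name, pos in events:
        switched = current is not None and name != current
        current = name
        yield file_index(name), pos, switched


if __name__ == "__main__":
    events = [
        ("mysql-bin.000011", 4000),
        ("mysql-bin.000011", 4200),
        ("mysql-bin.000012", 4),
    ]
    for idx, pos, switched in track_offsets(events):
        print(idx, pos, switched)
```

Because the index comes from the event itself rather than from configuration, no hard-coded file number is needed once the stream is running; only the very first start (with no checkpoint) depends on the configured value.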

zhengqiangtan commented 4 years ago

Thank you for your answer @allwefantasy