go-mysql-org / go-mysql-elasticsearch

Sync MySQL data into elasticsearch
MIT License

Mysqldump dumps all tables #240

Open derN3rd opened 6 years ago

derN3rd commented 6 years ago

Hey there,

we've been using go-mysql-elasticsearch for 5 weeks now without any problems. We have now added a schema from a second database, and since then a fresh sync dumps all tables, not only the ones configured in the river.toml.

This is a problem for us, as it syncs more than 100 GB of unused data in our case.

Is there a way to ignore some tables in a mysqldump? Our config:

my_addr = "***"
my_user = "***"
my_pass = "***"
my_charset = "utf8mb4"

data_dir = "./var"
server_id = 1003
flavor = "mysql"
mysqldump = "mysqldump"

[[source]]
schema = "companyname"
tables = ["user"]

# to ignore these two databases
[[source]]
schema = "grafana"
tables = []

[[source]]
schema = "weblate"
tables = []

[[source]]
schema = "companyname_testing"
tables = ["user"]

# Below is for special rule mapping

[[rule]]
schema = "companyname_testing"
table = "user"
index = "user_testing"
type = "user_testing"

siddontang commented 6 years ago

Hi @derN3rd

You have configured several databases to sync, but mysqldump can only dump a specified list of tables within a single database. So here we have no choice but to dump all of the specified databases in full.

derN3rd commented 6 years ago

Isn't it possible to run mysqldump once for every single db then? The binlog pointer could be saved for each run, and we would start syncing from the binlog after all mysqldump commands have finished.

siddontang commented 6 years ago

Great idea @derN3rd

I don't have time this week, but I will keep thinking about how to support it. The simplest way may be to record the smallest binlog position across all dumps and then start syncing from it. This may sync some duplicate data, but that doesn't matter.
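
Picking the smallest position could look roughly like the Go sketch below. The `Position` type and the `fileSeq`, `less`, and `minPosition` helpers are hypothetical and written only for illustration. The sketch assumes binlog file names end in a numeric sequence suffix such as `mysql-bin.000003`, and it compares that suffix numerically before comparing the offset within the file.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Position is a hypothetical binlog coordinate: file name plus byte offset.
type Position struct {
	Name string
	Pos  uint32
}

// fileSeq returns the numeric suffix of a binlog file name, or 0 if the
// suffix cannot be parsed as a number.
func fileSeq(name string) int {
	i := strings.LastIndex(name, ".")
	n, err := strconv.Atoi(name[i+1:])
	if err != nil {
		return 0
	}
	return n
}

// less reports whether a comes before b in the binlog stream.
func less(a, b Position) bool {
	sa, sb := fileSeq(a.Name), fileSeq(b.Name)
	if sa != sb {
		return sa < sb
	}
	return a.Pos < b.Pos
}

// minPosition returns the earliest position recorded by the dumps.
// The boolean is false when no positions were given.
func minPosition(ps []Position) (Position, bool) {
	if len(ps) == 0 {
		return Position{}, false
	}
	min := ps[0]
	for _, p := range ps[1:] {
		if less(p, min) {
			min = p
		}
	}
	return min, true
}

func main() {
	// Example positions, one per finished dump (made-up values).
	dumps := []Position{
		{Name: "mysql-bin.000003", Pos: 1540},
		{Name: "mysql-bin.000002", Pos: 98213},
		{Name: "mysql-bin.000003", Pos: 120},
	}
	if start, ok := minPosition(dumps); ok {
		fmt.Printf("%s:%d\n", start.Name, start.Pos) // prints "mysql-bin.000002:98213"
	}
}
```

Starting from the earliest position means events already contained in the later dumps are replayed, which is the duplicated data mentioned above.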

It would be much appreciated if you could send me a PR.