Closed mcapitanio closed 8 years ago
Hi @mcapitanio, I think you are right that the current behaviour would be that updates from both tables would go to the same topic.
I'm not too familiar with Postgres schemas or how they're generally used. I'd be interested to learn more about your use case for having the different schemas, if you're able to share.
Your proposal to include the schema as a prefix in the topic name makes sense (probably omitting the prefix for the default public
schema). We'd welcome a pull request to add this, and happy to advise. Currently the on_table_schema
callback in bottledwater.c is where we pull out the Avro schema name and set it as the topic name - that would be a good place to also pull out the Postgres schema name from the Avro schema namespace, and construct the qualified name for use as the topic name.
(Side note: the terminology gets pretty confusing: we've got the Postgres schema, the table schema, and the Avro schema, and to make it even more confusing we're using the Avro schema namespace name to store the Postgres schema name :))
Hi @samstokes , yeah, we should use prefix also in the terminology! :)
We are familiar with the concept of database-schemas with Oracle database, but it seems to me that the use case we adopted with Oracle db instances could be adopted also with Postgres ones. This use case for us is using different oracle-database-schema for different application, so each application has its own database-schema. With a weak data governance could happen that different applications, maybe developed by different teams without a shared phisical data model (maybe because they don't need one, not only because of a weak data governance...), could create in their own postgres-database-schema, living in the same postgres-database, tables with the same name.
So, I'll take a look at the bottledwater.c class and I'll be glad to open a new pull request on this, hopefully in a short time.
Hi @samstokes ,
I have just pulled #102 to integrate the original Postgres table schema name in the target Kafka topic name.
Of course every kind of comment and advise from you is really greatly appreciated and wellcome!
Thank you very much for the pull request! I'll try to take a look and respond with comments in the next couple of days.
Hello guys!
PotgreSQL schema prefix is absolutely required for any large scale project and I greatly appreciate PR #102 !
Confusingly enough, PostgreSQL schema is equivalent of MySQL database. But for the purpose of the discussion it is just table prefix in SQL in both cases. Oracle user is the same concept which is also confusing.
The good news is that all databases use same SQL syntax schema_name.table
and that is all we need to know to support it in Kafka topics. Way to get schema_name
from database WAL logs is database specific though.
Bello,
yes, totally agree, pratically a schema it's just a table prefix.
One further and very little enhancement could be change the BottleWater constant which represents the prefix separator from "-" to "." so to have for Kafka topics the same "naming convention" there is in the database, so:
Thanks a lot @mcapitanio and @deem0n for explaining the use cases for schemas. My usage of Postgres has always been in a cloud environment where each app gets its own Postgres database, but I can totally see how schemas would be essential if you have multiple apps sharing the same database.
Hello,
maybe this is a silly question, but it seems to me that now BottledWater keeps information about the Postgres schema just in the namespace of the key and value of Avro, so for example for the same table named "table_1" belonging both to a schema "schema_1" and to a schema "schema_2" of a same database "database" (and maybe with different schema structure) what would happen? The same topic would be used for both the tables?
Could make sense to incorporate the source Postgres schema name in the name of the target topic of BottledWater? I see that there is already a "--topic-prefix" option but it has a database scope. BottledWater could automatically create a topic named:
It would also be useful to make the topic separator a further option "--topic-separator". Some tools (ie. PrestoDB) reads the "." as a separator to identify the database name and the topic created by BottledWater could be separated by a "." instead of a "-".