EnterpriseDB / mysql_fdw

PostgreSQL foreign data wrapper for MySQL
Other
521 stars 160 forks source link

PGDatabaseEncoding is not the default #271

Open aledonati opened 1 year ago

aledonati commented 1 year ago

Hi,

I would expect the default for character_set to be PGDatabaseEncoding, because I don't see any practical use case not to use this default. What I want is fdw to automatically cope with charset conversions, whatever they are in mysql (source) and postgres (destination) dbs.

But I might be missing something...Can you explain the rationale behind using auto in place of PGDatabaseEncoding as a default?

Thanks in advance Alessandro

surajkharage19 commented 1 year ago

Hi @aledonati,

We kept the default value as auto just to make sure that it picks the character set based on the operating system setting. If the source (PG) and target (MySQL) database encodings are different then we have observed that incorrect data is getting inserted in MySQL. As per my understanding, we need to set the character_set which is compatible with MySQL server. Do you see any issues with the current default value? Do you have any test case for the same?

aledonati commented 1 year ago

Hi @surajkharage19,

I need to pull data into a postgres database, whose encoding is UTF8, from a mysql db, whose character_set_database is latin1. I have no control over mysql, that data source is external and read-only to me. My use case is to create a foreign server, import a foreign schema, then pull data into local tables in postgres.

With character_set not provided as an option when creating the server, some invalid utf8 text (invalid byte sequence...) gets his way into postgres tables in text columns. My understanding (wrong, but I would like to understand why) was that not specifying character_set would have meant to delegate to mysql_fdw the task to translate between the source (mysql) and the dest (PG) encoding transparently, whatever they are.

Obviously with character_set=PGDatabaseEncoding it works, text is properly converted to UTF8.

Pls forgive my ignorance, but when you say We kept the default value as auto just to make sure that it picks the character set based on the operating system setting , which operating system do you mean? The postgres host machine? I don't see the advantage of doing so, because one could have postgres on a host machine, OS linux UTF8, but for some reason having a database with encoding other than UTF8.

So in other words who cares about the os character set? What I want is convert text to/from database encodings.

I have disregarded the use case of updating mysql through the foreing server, but again, I would expect only one standard behavior: convert from current database encoding into the detected mysql character_set_database. A non default character_set option is obviously useful, for example to fix wrong encodings either side, but the default should behave as I said.

Hope my explanation was clear!

Thanks Alessandro

surajkharage19 commented 1 year ago

Hi @aledonati,

Thanks for your reply.

Obviously with character_set=PGDatabaseEncoding it works, text is properly converted to UTF8.

Right and for the same reason (encoding mismatch) we have introduced this character_set option.

when you say We kept the default value as auto just to make sure that it picks the character set based on the operating system setting , which operating system do you mean? The Postgres host machine?

We are providing character_set option while making a connection to MySQL server, so here OS means MySQL host machine. My opinion is that if there is any encoding-related problem then we can create/alter server options and provide the encoding that we want which should resolve the problem. Anyways, we will revisit this option's default value and will see if we can change it.