dylex / postgresql-typed

Haskell PostgreSQL library with compile-time type inference
http://hackage.haskell.org/package/postgresql-typed
Other
83 stars 12 forks source link

Encode non-ASCII characters in query (i.e. identifiers). #23

Closed jekor closed 3 years ago

jekor commented 3 years ago

I haven't tested the impact on performance or correctness of this, but I've been using it for the past year so that I can use unicode identifiers (such as for function names).

dylex commented 3 years ago

This seems reasonable, and since the other side presumably has to do the decoding anyway it doesn't seem too significant. My main question is whether this is always correct. Does the postgresql protocol layer always use UTF-8 for encoding, or does it depend on the encoding settings of the database? I know we're setting client_encoding=UTF8 so maybe that refers to everything?

dylex commented 3 years ago

It's not completely clear to me from the docs, but I think this should be correct. It's maybe worth adding a test with a unicode string literal (non-placeholder), like SELECT '<unicode>' to make sure it comes back correctly with this.

jekor commented 3 years ago

Thanks for looking into it.

For a comprehensive test we'd need to predefine something (like a function) in the DB with a non-ASCII identifier in at least 2 databases (one with a UTF-8 encoding, one with another encoding) and then test calling the function in both.

However, I think that matching the client_encoding is the only thing that we as a client need to worry about. If any other encoding matters, it's likely broken already. There should be no change (other than encoding overhead) for ASCII identifiers, and non-ASCII identifiers are already broken (they'll result in an error about invalid UTF-8 bytes from PostgreSQL).

dylex commented 3 years ago

Working on a test for this pointed out that we also let literal NUL chars through, which breaks things, so gonna try to fix that too. Which is mostly to say I'm working on a test for this and will merge it in then.