JosefMachytkaNetApp opened this issue 9 months ago
Can't look deeper now, but yes, trying to decode UTF-16 data as UTF-8 would produce this problem. Did you specify the --encoding option?
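Here is a small Python illustration of this point. It is not pgloader code. The thread does not say which character triggered the error, so "ü" (U+00FC) is only an assumed example. It was chosen because its UTF-16LE encoding starts with the same byte, 0xFC, that appears in the reported error.

```python
# Assumed example value: "ü" is U+00FC, which UTF-16 little-endian stores as FC 00.
text = "Müller"
utf16 = text.encode("utf-16-le")
print(utf16)  # b'M\x00\xfc\x00l\x00l\x00e\x00r\x00'

try:
    # Misreading UTF-16 bytes as UTF-8: "M" and NUL decode fine, then 0xFC cannot start a character.
    utf16.decode("utf-8")
except UnicodeDecodeError as err:
    print(err.start, hex(utf16[err.start]), err.reason)  # 2 0xfc invalid start byte
```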
@svantevonerichsen6906 Thank you for your comment. I tried it, but it looks like pgloader does not accept this parameter in PostgreSQL-to-PostgreSQL mode. When I add it on the command line, I get the following messages and pgloader stops:
pgloader version 3.6.7~devel
compiled with SBCL 2.2.9.debian
sb-impl::*default-external-format* :UTF-8
tmpdir: #P"/tmp/pgloader/"
2024-02-23T12:39:52.004000Z NOTICE Starting pgloader, log system is ready.
2024-02-23T12:39:52.016001Z INFO Starting monitor
2024-02-23T12:39:52.024001Z LOG pgloader version "3.6.7~devel"
2024-02-23T12:39:52.024001Z INFO Stopping monitor
And if I try to add it to the WITH clause, pgloader complains while parsing the command file:
2024-02-23T12:38:48.316014Z INFO Stopping monitor
What I am doing here?
At
concurrency = 1,
prefetch rows = 100,
^ (Line 11, Column 27, Position 249)
In context PGSQL-OPTIONS:
While parsing PGSQL-OPTIONS. Expected:
the character Tab
or the character Newline
or the character Return
or the character Space
or the string "--"
or the string "/*"
or the string "batch"
or the string "concurrency"
or the string "create"
or the string "data"
or the string "disable"
or the string "downcase"
or the string "drop"
or the string "foreign"
or the string "include"
or the string "max"
or the string "no"
or the string "on"
or the string "prefetch"
or the string "preserve"
or the string "quote"
or the string "reset"
or the string "schema"
or the string "snake_case"
or the string "truncate"
or the string "uniquify"
or the string "workers"
I showed only the last two lines of the WITH clause that pgloader was able to parse. The next line is "encoding", and parsing fails there because it is not among the expected strings.
So, unfortunately, it looks like the "encoding" option cannot help here.
BTW, I use the FreeTDS library with "client charset = UTF-8", so it should convert the data to UTF-8 (this has already been discussed on StackOverflow). So it is surprising to me that the Lisp code seems to fail. Do you know whether ODBC would work better here?
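The following is a conceptual Python sketch of what a successful UTF-16 to UTF-8 conversion (the job "client charset = UTF-8" is expected to do) looks like at the byte level. It is not FreeTDS code. The helper name is hypothetical, and "ü" is again only an assumed example character.

After a correct conversion, "ü" becomes the two bytes C3 BC, and no bare 0xFC byte remains. So if a 0xFC byte still reaches the UTF-8 decoder, one plausible reading is that some value was not transcoded. Note that 0xFC is also "ü" in single-byte encodings such as Windows-1252.

```python
def transcode_utf16le_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: decode UTF-16LE bytes to text, then re-encode that text as UTF-8."""
    return raw.decode("utf-16-le").encode("utf-8")

raw = "Müller".encode("utf-16-le")        # assumed source bytes
converted = transcode_utf16le_to_utf8(raw)
print(converted)                           # b'M\xc3\xbcller'
print(0xFC in converted)                   # False: "ü" is now the pair C3 BC

# The same character in a single-byte code page is exactly the byte from the error.
print("ü".encode("cp1252"))                # b'\xfc'
```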
Hi guys, I have encountered a problem that seems to come from the trivial-utf-8 Lisp library. At least, I cannot find this error message anywhere in the pgloader code or in the PostgreSQL code, only in this Lisp library: https://github.com/fukamachi/trivial-utf-8/blob/master/trivial-utf-8.lisp#L110. However, this code seems to have been abandoned for 13 years.
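The message points to the UTF-8 lead-byte check. Below is a simplified Python sketch of the usual bit-mask rule a decoder applies to the first byte of each character. It is modeled on the general technique, not copied from trivial-utf-8. The function name is hypothetical, and the sketch does not reject every invalid lead byte (for example, 0xC0 and 0xF5 slip through).

The key point is that 0xFC has the bit pattern 11111100, which matches none of the valid lead-byte patterns, so it fails exactly at "start of character".

```python
def utf8_group_size(lead: int) -> int:
    """Return the length of a UTF-8 sequence judged from its first byte (simplified rule)."""
    if (lead & 0b1000_0000) == 0:            # 0xxxxxxx: single-byte ASCII
        return 1
    if (lead & 0b1110_0000) == 0b1100_0000:  # 110xxxxx: 2-byte sequence
        return 2
    if (lead & 0b1111_0000) == 0b1110_0000:  # 1110xxxx: 3-byte sequence
        return 3
    if (lead & 0b1111_1000) == 0b1111_0000:  # 11110xxx: 4-byte sequence
        return 4
    # 10xxxxxx (continuation byte) and 11111xxx (0xF8-0xFF, including 0xFC) cannot start a character.
    raise ValueError(f"Invalid byte at start of character: 0x{lead:X}")

print(utf8_group_size(0xC3))  # 2 ("ü" in UTF-8 starts with C3)
try:
    utf8_group_size(0xFC)
except ValueError as err:
    print(err)                # Invalid byte at start of character: 0xFC
```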
I am using pgloader to copy data in parallel from foreign tables created by the tds_fdw extension into target PostgreSQL tables. In PostgreSQL, everything works as expected: I can select data from the FDW table, and INSERT INTO target_table SELECT * FROM fdw_table also works. But when I start pgloader to copy data from all tables, it fails on one particular table with the error message "A thread failed with error: Invalid byte at start of character: 0xFC" (see details below).
The underlying data in the source database are encoded in UTF-16. Both tds_fdw and PostgreSQL process the data without problems, but it looks like the Lisp library fails. Does anyone have an idea what to do about it?
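One way to narrow this down is to find which rows in the failing table contain bytes that are not valid UTF-8. Below is a minimal Python sketch of such a scan. It assumes the column values are already available as raw bytes paired with some row identifier; how to export them from the source is not covered in this thread. The helper name and the sample rows are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

def find_invalid_utf8(rows: Iterable[Tuple[object, bytes]]) -> Iterator[Tuple[object, int, int]]:
    """Hypothetical helper: yield (row_id, offset, byte) for each value that is not valid UTF-8."""
    for row_id, value in rows:
        try:
            value.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first byte the decoder rejected.
            yield row_id, err.start, value[err.start]

# Made-up sample: row 1 is proper UTF-8, row 2 is unconverted UTF-16LE.
sample = [(1, "Müller".encode("utf-8")), (2, "Müller".encode("utf-16-le"))]
for row_id, offset, byte in find_invalid_utf8(sample):
    print(f"row {row_id}: byte 0x{byte:02X} at offset {offset}")  # row 2: byte 0xFC at offset 2
```

Reporting the first rejected byte per value keeps the output short. The offset and byte value are usually enough to tell whether a row holds UTF-16 or single-byte data that slipped through unconverted.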
Thank you very much.
[X] pgloader --version
[X] did you test a fresh compile from the source tree? Yes, no change.
[X] did you search for other similar issues? Yes; this seems to be a very rare issue coming from Lisp itself.
[X] how can I reproduce the bug? The underlying data in the source database are encoded in UTF-16. Both tds_fdw and PostgreSQL process the data without problems, but the Lisp library fails.
[X] pgloader output you obtain
[X] data that is being loaded, if relevant: I cannot paste the client's data here.
[X] How the data is different from what you expected, if relevant: The Lisp library fails while reading the data, although everything works in PostgreSQL itself.