Server encoding don't match each other and CLIENT_ENCODING not considered

giorgiomorina commented 3 years ago

Dear Gilles, first of all THANK YOU for all your work.

I'm getting the following issue doing a parallel direct COPY command (ORACLE_COPIES>1, JOBS>1): it seems that even though both source and target databases are UTF-8 encoded, ora2pg doesn't use the correct encoding.

Oracle source database

SELECT *
  FROM gv$nls_parameters
 WHERE parameter LIKE 'NLS_CHARACTERSET';
1   NLS_CHARACTERSET    AL32UTF8    0
3   NLS_CHARACTERSET    AL32UTF8    0
4   NLS_CHARACTERSET    AL32UTF8    0
2   NLS_CHARACTERSET    AL32UTF8    0

Postgres target database

SHOW SERVER_ENCODING;
 server_encoding
-----------------
 UTF8
(1 row)

ora2pg's log

Ora2Pg version: 21.1
Export type: COPY
Geometry export type: INTERNAL
ORACLE_HOME = /usr/lib/oracle/21/client64
NLS_LANG = AMERICAN_AMERICA.AL32UTF8
NLS_NCHAR = AL32UTF8
[...]
Setting client_encoding to UTF8...

Moreover it seems it doesn't take into account the parameter CLIENT_ENCODING to include automatically the \encode directive in the COPY statements and consequentially it doesn't make a UTF8 encoded discards file:

$ file <omissis>_TABLENAME_error.log
<omissis>_TABLENAME_error: ISO-8859 text, with very long lines

$ grep CLIENT_ENCODING *conf
INSERT_ora2pg.conf:CLIENT_ENCODING        ISO_8859_5

darold commented 3 years ago

I don't understandc the exact problem you are trying to solve. If data are exported from Oracle in utf8 then CLIENT_ENCODING should be utf8 too, you are sending utf8 data to PostgreSQL.

Can you post the result of command: ora2pg -c ora2pg.conf -t SHOW_ENCODING

giorgiomorina commented 3 years ago

Hi Darold, the problem is that some unrecognized characters prevent us from importing data and leads to collecting large reject-files - containing bad chars too. I found a workaround which involves converting the rejects-file through iconv, but sometimes, particularly with large tables, the COPY command stored in them is malformed - perhaps due to the parallelism; following what happens in there:

COPY

() from STDIN;

COPY

() from STDIN; ./ ./ Anyway, here is the requested command output: Current encoding settings that will be used by Ora2Pg: Oracle NLS_LANG AMERICAN_AMERICA.*AL32*UTF8 Oracle NLS_NCHAR AL32UTF8 Oracle NLS_TIMESTAMP_FORMAT YYYY-MM-DD HH24:MI:SS.FF6 Oracle NLS_DATE_FORMAT YYYY-MM-DD HH24:MI:SS PostgreSQL CLIENT_ENCODING UTF8 Perl output encoding '' Showing current Oracle encoding and possible PostgreSQL client encoding: Oracle NLS_LANG AMERICAN_AMERICA.*AL32*UTF8 Oracle NLS_NCHAR AL32UTF8 Oracle NLS_TIMESTAMP_FORMAT YYYY-MM-DD HH24:MI:SS.FF6 Oracle NLS_DATE_FORMAT YYYY-MM-DD HH24:MI:SS PostgreSQL CLIENT_ENCODING UTF8 Il giorno gio 24 giu 2021 alle ore 23:01 Gilles Darold < ***@***.***> ha scritto: > I don't understandc the exact problem you are trying to solve. If data are > exported from Oracle in utf8 then CLIENT_ENCODING should be utf8 too, you > are sending utf8 data to PostgreSQL. > > Can you post the result of command: ora2pg -c ora2pg.conf -t SHOW_ENCODING > > — > You are receiving this because you authored the thread. > Reply to this email directly, view it on GitHub > , or > unsubscribe > > . >

darold commented 3 years ago

Well, closing I don't know why the Oracle output is not in the promised encoding. Right iconv is the solution in this case.

darold / ora2pg

Server encoding don't match each other and CLIENT_ENCODING not considered #1189