darold / ora2pg

Ora2Pg is a free tool used to migrate an Oracle database to a PostgreSQL compatible schema. It connects your Oracle database, scan it automatically and extracts its structure or data, it then generates SQL scripts that you can load into PostgreSQL.
http://www.ora2pg.com/
GNU General Public License v3.0
978 stars 341 forks source link

Escape characters are lost during COPY of oracle CLOB (with Json strings) to postgresql JSONB column #1765

Closed cchet closed 2 months ago

cchet commented 3 months ago

We face the issue that the escape characters from json strings stored as CLOBs in Oracle DB are lost during COPY and cause 'invalid Json type' on the PostgreSQL site.

FATAL: ERROR: invalid input syntax for type json.
DETAIL: Token "Zoom" is invalid. CONTEXT: JSON data, line 1: ...","designation":"Sitzbezug in Stoff, Dessin "Zoom... COPY tbl_offer_entity, line 9, column offer: "{"uuid":"75af6a8d-fc49-4f0e-80b2-49099eaa7fea","carListId":5439,"carListStatus":"VALID","status":"RE..."

In the Oracle database the quote is escaped but during COPY it seems to somehow get lost. (Shows a portion of the json string)

{"designation":"Sitzbezug in Stoff, Dessin \"Zoom\""}

We tried NOESCAPE [0 | 1] setting but it did not change anything.

These are the encoding settings:

$ ora2pg --source "$ORACLE_DSN" --password "$ORACLE_PWD" -t SHOW_ENCODING -c ./conf/ora2pg.conf Current encoding settings that will be used by Ora2Pg: Oracle NLS_LANG AMERICAN_AMERICA.AL32UTF8 Oracle NLS_NCHAR AL32UTF8 Oracle NLS_TIMESTAMP_FORMAT YYYY-MM-DD HH24:MI:SS.FF Oracle NLS_DATE_FORMAT YYYY-MM-DD HH24:MI:SS PostgreSQL CLIENT_ENCODING UTF8 Perl output encoding '' Showing current Oracle encoding and possible PostgreSQL client encoding: Oracle NLS_LANG AMERICAN_AMERICA.AL32UTF8 Oracle NLS_NCHAR AL32UTF8 Oracle NLS_TIMESTAMP_FORMAT YYYY-MM-DD HH24:MI:SS.FF6 Oracle NLS_DATE_FORMAT YYYY-MM-DD HH24:MI:SS PostgreSQL CLIENT_ENCODING UTF8

I am not quite sure where the problem is and how to solve it, would very appreciate help on this.

ora2pg docker image: 24.3

cchet commented 3 months ago

I also have found out that COPY FROM expects two backslashes maybe this is the reason why the string becomes invalid and the COPY fails because of an invalid JSON. Our data in Oracle has only one backslash for escaping for instance quotes.

I tried INSERT mode and TRANSFORM_VALUE but TRANSFORM_VALUE seems top produce an invalid SQL as commented here https://github.com/darold/ora2pg/issues/1766

darold commented 2 months ago

Commit fbb2ef6 fixes this issue.