FraunhoferIOSB / FROST-Server

A Complete Server implementation of the OGC SensorThings API
https://fraunhoferiosb.github.io/FROST-Server/
GNU Lesser General Public License v3.0
200 stars 74 forks source link

Upgrade to JSONB on migration 2020-05-15-toJsonB-9 fails due to unicode \u0000 characters #375

Open riedel opened 3 years ago

riedel commented 3 years ago
04:43:40.029 [            main] ERROR  liquibase.changelog.ChangeSet - Change Set tablesString.xml::2020-05-15-toJsonB-9::scf failed.  Error: ERROR: unsupported Unicode escape seque
nce
  Detail: \u0000 cannot be converted to text.
  Where: JSON data, line 1: ...tus":"607","Reject_count_LongTOF":"0","Checksum":... [Failed SQL: (0) ALTER TABLE "public"."OBSERVATIONS" ALTER COLUMN "RESULT_JSON" TYPE JSONB USING
("RESULT_JSON"::JSONB)]
riedel commented 3 years ago

BTW it would be easier to debug if it would not fail after nearly 2 hours :) . Seems to be a known problem that JSONB does not support \u0000 characters: https://www.postgresql.org/docs/13/datatype-json.html . So what would be the correct replacement encoding for \u0000 in FROST since the SensorThings API I guess allows any json encoding according to RFC4627 from what I could find in the specs.

hylkevds commented 3 years ago

Hmm, \u0000 is the NULL character. If that is the only content in the JSON field, the correct replacement would be setting the entire field to (SQL) NULL. Just out of interest, how many entries are there in this Observations table?

riedel commented 3 years ago

The problem is that we are having about 280 Million Observations. I think the null char is part of a whole string that was passed in the wrong way when parsing. I think we could I guess delete it. What happens if you push strings with unicode null into the data base at the moment?

hylkevds commented 3 years ago

Right now they just get stored as is, since the column is currently a TEXT field that accepts Unicode null characters. You could do a search on the null character to see which rows have one. Might take a while, but then you can see if it really is an issue or if they can just be cleaned out.

riedel commented 3 years ago

Thanks.

Yes, we are currently trying to do this now.

I thought there might be a way to escape the null somehow in FROST on top of postgres since it is a legal JSON character and only a problem of the specific database. But I guess it is an extreme edge case of FROST incompatibilty with the Sensorthings standard which references OData JSON which references the JSON RFC which allows null chars. Maybe it would be good to add a known to fail test case to document this.

hylkevds commented 3 years ago

We'll see if other users also encounter this issue. After the conversion to JSONB columns, trying to insert an entity with a null byte in a json string will return an error, so there should not be any new null bytes in the database going forward. We could look into adding an option to convert null bytes to another character (probably U+FFFD REPLACEMENT CHARACTER) on input.

But that won't help you with your current database that already has null bytes in the database...