Open riedel opened 3 years ago
BTW it would be easier to debug if it would not fail after nearly 2 hours :) . Seems to be a known problem that JSONB does not support \u0000 characters: https://www.postgresql.org/docs/13/datatype-json.html . So what would be the correct replacement encoding for \u0000 in FROST since the SensorThings API I guess allows any json encoding according to RFC4627 from what I could find in the specs.
Hmm, \u0000 is the NULL character. If that is the only content in the JSON field, the correct replacement would be setting the entire field to (SQL) NULL. Just out of interest, how many entries are there in this Observations table?
The problem is that we are having about 280 Million Observations. I think the null char is part of a whole string that was passed in the wrong way when parsing. I think we could I guess delete it. What happens if you push strings with unicode null into the data base at the moment?
Right now they just get stored as is, since the column is currently a TEXT field that accepts Unicode null characters. You could do a search on the null character to see which rows have one. Might take a while, but then you can see if it really is an issue or if they can just be cleaned out.
Thanks.
Yes, we are currently trying to do this now.
I thought there might be a way to escape the null somehow in FROST on top of postgres since it is a legal JSON character and only a problem of the specific database. But I guess it is an extreme edge case of FROST incompatibilty with the Sensorthings standard which references OData JSON which references the JSON RFC which allows null chars. Maybe it would be good to add a known to fail test case to document this.
We'll see if other users also encounter this issue. After the conversion to JSONB columns, trying to insert an entity with a null byte in a json string will return an error, so there should not be any new null bytes in the database going forward. We could look into adding an option to convert null bytes to another character (probably U+FFFD REPLACEMENT CHARACTER) on input.
But that won't help you with your current database that already has null bytes in the database...