goatfungus / NMSSaveEditor

No Man's Sky - Save Editor

export to JSON produces invalid JSON #961

Open gitdp7885 opened 3 months ago

gitdp7885 commented 3 months ago

In some cases the "Id" uses some form of hexadecimal in its value. For example:

"Id": "^\x80\x82TF\x84)#00000",

If this is indeed the correct value, then the backslash characters need to be doubled, essentially escaping each backslash, like so:

"Id": "^\ \x80\x82TF\ \x84)#00000",

(I had to put a space between the backslashes so it would show up properly.)

meachware commented 3 months ago

Same as my issue reported here: https://github.com/goatfungus/NMSSaveEditor/issues/954

goatfungus commented 3 months ago

You are correct, \x is not a "valid" JSON escape, but there is a very good reason for this.

The save format itself uses multiple character encodings and contains raw unencoded byte data, so if you export the JSON as-is and try to modify it with a normal JSON editor you'll end up corrupting the file. In order to partially get around this corruption, I've had to add my own type of escape (\x) which is followed by 2 hexadecimal characters to indicate a raw byte value. This modified format works better than the in-game format for manual editing purposes.

So in conclusion, the JSON produced by the editor in the export methods is an extension to the JSON format that allows for raw byte data by using \x## escape values.
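To make the incompatibility concrete, here is a minimal Python sketch (Python is an arbitrary choice here, not something the editor uses). It assumes an export fragment shaped like the "Id" line quoted above and shows that a standard parser rejects the \x escape, while the doubled-backslash form parses but yields literal text rather than raw bytes.

```python
import json

# A fragment shaped like the editor's export; \x80 is the editor's raw-byte escape.
exported = r'{"Id": "^\x80\x82TF\x84)#00000"}'

try:
    json.loads(exported)
    parsed_ok = True
except json.JSONDecodeError as err:
    parsed_ok = False
    print("standard parser rejected it:", err.msg)

# Doubling the backslashes, as suggested in the issue, gives valid JSON,
# but the value is then the literal text \x80, not a raw byte.
doubled = exported.replace("\\x", "\\\\x")
value = json.loads(doubled)["Id"]
```

Doubling by hand like this is one-way: once parsed, an escaped raw byte and a literal \x80 that was already in the data can no longer be told apart.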

Korkman commented 2 months ago

I'd like to chime in here. I somewhat agree with @gitdp7885! Escaping the raw hex data within individual strings, as opposed to running over the serialized JSON file, would be more compatible (resulting in serialized JSON \\x00 for NUL and \\\\x00 for the literal sequence \x00). That way JSON parsers could deal with the files, and at the same time, @goatfungus, the danger of corruption would still be absent. The obvious downsides are performance, as every single string would have to be escaped, and increased file size. But it's probably way too late to change the file format, and not worth the effort.

@gitdp7885 If you want to parse the JSON file with, say, PHP, you can try my workaround: pre-parse the file and replace all \x instances with a Unicode symbol (Unicode is absent from the file because the individual bytes were \x escaped, as I understand it):

$hex_escape_symbol = '\\u168a';
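// Capture the whole run of escaped backslashes; a repeated (..)* group would keep only its last pair.
$content = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\\\\x/', '$1' . $hex_escape_symbol, $content);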
$data = json_decode($content, null, 512, JSON_THROW_ON_ERROR);

Halve the number of backslashes in the pattern if your language has native regex literals, like JavaScript.
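For readers not using PHP, the same pre-parse step can be sketched in Python. This is a port under assumptions, not part of the editor: the function name `load_extended_json` is hypothetical, and the placeholder U+168A is simply the symbol chosen above.

```python
import json
import re

HEX_ESCAPE_SYMBOL = "\\u168a"  # JSON escape for U+168A, the placeholder chosen above

# An \x that is not itself escaped: preceded by an even number of backslashes.
UNESCAPED_HEX = re.compile(r"(?<!\\)((?:\\\\)*)\\x")

def load_extended_json(text):
    """Swap each unescaped \\x for the placeholder, then parse as plain JSON."""
    patched = UNESCAPED_HEX.sub(lambda m: m.group(1) + HEX_ESCAPE_SYMBOL, text)
    return json.loads(patched)

data = load_extended_json(r'{"Id": "^\x80\x82TF\x84)#00000"}')
already_escaped = load_extended_json(r'{"a": "\\x41"}')
```

With a raw-string pattern, Python needs half the backslashes of the PHP version, for the same reason as JavaScript.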

While you're at it, keep in mind that float values have a specific precision, and it is non-trivial to protect them from rounding errors. So if you plan on writing the format back, a pre-parser can likewise wrap all floats into strings, marked with another Unicode symbol as a prefix:

$float_escape_symbol = '\\u1673';
$content = preg_replace('/^([\t ]+|[^:]+: )([\-0-9]+\.[\-0-9Ee]+)/m', '$1"'.$float_escape_symbol.'$2"', $content);

All floats are now strings.
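A rough Python equivalent of the float-wrapping step, for illustration only. It assumes a pretty-printed export with one value per line, and it inserts the placeholder character directly instead of its \u escape; `wrap_floats` is a hypothetical name.

```python
import json
import re

FLOAT_MARK = "\u1673"  # placeholder character, same code point as above

# Same pattern as the PHP version: a float at the start of a line
# (array element) or right after a "key: " prefix.
FLOAT_VALUE = re.compile(r"^([\t ]+|[^:]+: )([\-0-9]+\.[\-0-9Ee]+)", re.MULTILINE)

def wrap_floats(text):
    """Turn bare floats into marked strings so their digits survive parsing."""
    return FLOAT_VALUE.sub(lambda m: f'{m.group(1)}"{FLOAT_MARK}{m.group(2)}"', text)

sample = '{\n\t"Scale": 1.0000000001,\n\t"Count": 3\n}'
data = json.loads(wrap_floats(sample))
```

Integers are left alone because the pattern requires a decimal point.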

The way back:

$content = json_encode($data, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);
$content = preg_replace('/^([\t ]+|[^:]+: )"'.preg_quote(json_decode('"'.$float_escape_symbol.'"'), '/').'(.+)"/m', '$1$2', $content);
$content = str_replace(json_decode('"'.$hex_escape_symbol.'"'), '\x', $content);
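The way back can be sketched in Python along the same lines. This assumes the parsed data carries the two placeholder characters as described above; `dump` is a hypothetical helper, and the tab indentation is a guess rather than the editor's confirmed layout.

```python
import json
import re

HEX_MARK = "\u168a"    # stands in for the editor's \x escape
FLOAT_MARK = "\u1673"  # prefixes floats that were wrapped into strings

# A marked string at the start of a line or right after a "key: " prefix.
WRAPPED_FLOAT = re.compile(r'^([\t ]+|[^:]+: )"' + FLOAT_MARK + r'(.+?)"', re.MULTILINE)

def dump(data):
    """Serialize, unwrap the marked floats, then restore the \\x escapes."""
    text = json.dumps(data, indent="\t", ensure_ascii=False)
    text = WRAPPED_FLOAT.sub(lambda m: m.group(1) + m.group(2), text)
    return text.replace(HEX_MARK, "\\x")

data = {"Id": "^" + HEX_MARK + "80TF", "Scale": FLOAT_MARK + "1.0000000001"}
restored = dump(data)
```

Here `ensure_ascii=False` plays the role of JSON_UNESCAPED_UNICODE: the placeholders stay as raw characters so the final string replacements can find them.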