Closed adamritter closed 9 months ago
One thing that I know is that path parsing doesn't handle quoting well (if a map contains a key with " or ), but there might be other errors.
It would probably good to test on a diverse set of JSONs, compare with gron output and also compare with fastgron | fastrgon -u | fastgron output
Just came here to post this issue but I see my comment has beat me to it. If it helps, I can upload my source file somewhere (a Minecraft region file converted to JSON with fastnbt-tools
, 102M when zstd
'd).
Seems to be doubling up quotes in the output for some reason. fastgron
creates the output correctly:
json[1024].block_entities[14].RecipesUsed["minecraft:cooked_cod"] = 142
But then fastgron -u
outputs this:
"RecipesUsed": {
"minecraft:cooked_cod"": 142,
I did think that it might be the colon because that's the first occurrence of a key with the colon in the output - some minimal tests suggest it might be.
This one gets double-quoted through fastgron | fastgron -u
.
{
"RecipesUsed": {
"minecraft:ghost": 1
}
}
This one does not.
{
"RecipesUsed": {
"minecraft_ghost": 1
}
}
This one only gets double-quoted on the key with the colon.
{
"RecipesUsed": {
"minecraft_ghost": 1,
"minecraft:test": 2,
"minecraft_after": 3
}
}
Thanks for helping!
I introduced some bugs that should be now fixed in v0.7.5 . Could you please try that version?
The double-quoting seems to be fixed looking at my minimal test cases.
Giving it a run through the same Minecraft JSON, though, I'm now getting:
jq: parse error: Unfinished JSON term at EOF at line 174299223, column 0
Output actually has 174299222 lines and finishes with unclosed items.
-1,
-1,
-1,
-1,
-1,
-1,
-1,
-1,
-1,
Uploaded it here - might just be because it's huge that it's causing problems. Certainly some of the smaller files I've tested have been ok.
It works for me using v0.7.5 :
./build/fastgron -V fastgron version 0.7.5
ls -l pp* -rw-r--r-- 1 adamritter staff 8997290281 Dec 5 23:40 pp.gron -rw-r--r-- 1 adamritter staff 821280774 Dec 5 23:38 pp.json -rw-r--r-- 1 adamritter staff 3207448348 Dec 5 23:42 pp2.json (base) ➜ ~/Documents/GitHub/fastgron/fastgron git:(main) ✗ less pp2.json (base) ➜ ~/Documents/GitHub/fastgron/fastgron git:(main) ✗ less pp. (base) ➜ ~/Documents/GitHub/fastgron/fastgron git:(main) ✗ cat pp2.json| fastgron > /dev/null (base) ➜ ~/Documents/GitHub/fastgron/fastgron git:(main) ✗ cat pp2.json| jq > /dev/null cat pp2.json 0.05s user 0.62s system 1% cpu 57.335 total jq > /dev/null 127.55s user 1.65s system 99% cpu 2:09.75 total
It's very strange, it looks like ungron doesn't keep Heightmaps.MOTION_BLOCKING_NO_LEAVES, only Heightmaps.MOTION_BLOCKING sometimes.
A minimal example is here (m has to be a substring of m2 for this bug to appear):
json[0] = {} json[0].m = null json[0].m2 = null
fastgron --ungron pp01.gron [ { "m": null } ]
I released v0.7.6 which both improves on --ungron time and fixes the bug which is a path optimization (m2 is not a subpath of m).
I did an md5sum to make sure that now --ungron and then fastgron on a sorted output gives the same result:
md5sum pp.gron cf8c0520c75d909596be7b87ecb4bd62 pp.gron (base) ➜ ~/Documents/GitHub/fastgron/fastgron git:(main) md5sum pp5.gron cf8c0520c75d909596be7b87ecb4bd62 pp5.gron
Could you please benchmark fastgron --ungron again? I hope it uses much less memory as well now with the fix.
Damn, beaten to the punch again, was just coming to post about missing keys although mine is LootTableSeed
- there's 370 in the source JSON and only 339 after it's been through fastgron -u
. BUT 0.7.6 reports 370.
If I pipe the source through jq -S | xxh64sum
, I get the same as piping it through fastgron | fastgron -u | jq -S | xxh64sum
which is perfect reconstruction.
On this M1 Pro, using pp.json.zst
, /usr/bin/time -l
reports 2.4G peak for fastgron
and 19G peak for the fastgron -u
. On my small linux server, it's still around 39G peak for fastgron -u
(where it was for 0.7.4). May just be a difference in how macOS and Linux measure usage?
Sounds great! I close this issue as fixed, but feel free to open new ones, or reply to my email if you have new ideas.
It's the memory fix in fastgron -u in the last version. 39G was suspicious, so I looked through the code, and I was passing an object by value instead of by reference, so it was copied, that took so much RAM.
The fix had a nice effect on speed as well, although I believe ungron could be still made significantly faster.
fastgron
: 8.5s 2.2G residentedit: Interestingly whilst doing this test, I piped the output intofastgron -u
(39.5G resident) andjq
rejected that. Will have to investigate further but it's a bit of a flaw if it can't rehydrate its own output into valid JSON.https://news.ycombinator.com/reply?id=38534721&goto=threads%3Fid%3Dxiphias2%2338534721