One other thought: at the end of the function, you should probably check if the JSON string is over a certain length (say 50 MB) and only turn on the JSON lexer if it's under that length. The JSON lexer is pretty slow and JsonTools leaves it off for pretty-printing big files.
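A minimal sketch of that check (the constant and helper names here are placeholders, not identifiers from the actual plugin):

```csharp
// Sketch only: SetEditorText and ApplyJsonLexer stand in for whatever
// the plugin already uses to write the result and switch the lexer.
static class LexerGuardSketch
{
    // ~50 MB: above this, skip JSON syntax highlighting entirely.
    const int MaxJsonLexerLength = 50 * 1024 * 1024;

    public static void FinishConversion(string json)
    {
        SetEditorText(json); // placeholder: put the result into Scintilla

        // The JSON lexer is slow on huge documents, so only enable it
        // when the output is under the size limit; otherwise leave it
        // off, the way JsonTools does when pretty-printing big files.
        if (json.Length <= MaxJsonLexerLength)
            ApplyJsonLexer(); // placeholder: switch the language to JSON
    }

    static void SetEditorText(string json) { /* plugin-specific */ }
    static void ApplyJsonLexer() { /* plugin-specific */ }
}
```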
Sorry for the late reply. The menu options Convert Data > JSON and also XML are indeed very slow; I never really noticed it because I don't use them as much. The code with the unnecessary memcopy, while ugly, doesn't contribute to the long processing time as far as I can see. It's mainly due to applying the syntax highlighting at the end, so I've added a new setting AutoSyntaxLimit, which by default is 1 MB.
I agree that your code is better than using multiple string.Replace() calls, but I'll have to do some more testing with large files.
The new setting AutoSyntaxLimit is available in the latest release.
I tried converting a 10MB CSV file to JSON. A preview of the data: essentially one column parsed as strings (length maybe 8-15 chars), one column as floats, one column as integers.
Conversion to JSON took over a minute, of which probably about 5 seconds is attributable to Scintilla's JSON lexer (based on a comparison to PythonScript with the csv and json modules).
I am pretty sure that the culprit is this block of code here. This code copies colvalue 8 times, and then a 9th time when you finally append it to sb, which in turn puts a lot of pressure on the garbage collector. I've achieved significant performance improvements in JsonTools by removing such unnecessary copying of memory.
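In broad strokes, escaping a value through eight chained Replace() calls looks like this (an illustrative reconstruction of the pattern, not the plugin's exact code):

```csharp
// Illustrative reconstruction, not the plugin's exact code: every
// Replace() pass that finds a match allocates a full new copy of
// the string, so colvalue can be copied eight times before the
// final copy into the StringBuilder.
string escaped = colvalue.Replace("\\", "\\\\")
                         .Replace("\"", "\\\"")
                         .Replace("\b", "\\b")
                         .Replace("\f", "\\f")
                         .Replace("\n", "\\n")
                         .Replace("\r", "\\r")
                         .Replace("\t", "\\t")
                         .Replace("/", "\\/");
sb.Append('"').Append(escaped).Append('"'); // the 9th copy, into sb
```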
I recommend doing something like this (based on the string conversion method in JsonTools).
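A minimal sketch of that single-pass approach (the method name and exact escape set are illustrative; JsonTools' actual method differs in its details):

```csharp
using System.Text;

static class JsonEscapeSketch
{
    // Escape colvalue as a JSON string directly into sb, examining each
    // character exactly once. No intermediate strings are allocated, so
    // the value is copied only once, into the StringBuilder's buffer.
    public static void AppendJsonString(StringBuilder sb, string colvalue)
    {
        sb.Append('"');
        foreach (char c in colvalue)
        {
            switch (c)
            {
                case '\\': sb.Append("\\\\"); break;
                case '"':  sb.Append("\\\""); break;
                case '\b': sb.Append("\\b");  break;
                case '\f': sb.Append("\\f");  break;
                case '\n': sb.Append("\\n");  break;
                case '\r': sb.Append("\\r");  break;
                case '\t': sb.Append("\\t");  break;
                default:
                    if (c < ' ') // other control chars need \uXXXX form
                        sb.Append("\\u").Append(((int)c).ToString("x4"));
                    else
                        sb.Append(c);
                    break;
            }
        }
        sb.Append('"');
    }
}
```

Called once per cell, something along these lines replaces the whole Replace() chain and keeps allocations down to the StringBuilder's own occasional buffer growth.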
I can't guarantee that this will fix all your problems (there's a lot of other unnecessary memcopy here) but it will probably reduce runtime by at least 20%.