fastify / fast-json-stringify

2x faster than JSON.stringify()
MIT License

perf: fast regex check on string #696

Closed. cesco69 closed this 6 months ago

cesco69 commented 6 months ago

The regex

[\u0000-\u001f\u0022\u005c\ud800-\udfff]|[\ud800-\udbff](?![\udc00-\udfff])|(?:[^\ud800-\udbff]|^)[\udc00-\udfff]

can be simplified to

[\u0000-\u001f\u0022\u005c\ud800-\udfff]

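The first alternative ([\u0000-\u001f\u0022\u005c\ud800-\udfff]) already matches any string that the second ([\ud800-\udbff](?![\udc00-\udfff])) or third ((?:[^\ud800-\udbff]|^)[\udc00-\udfff]) alternative could match, because both of those require a surrogate code unit and the first character class already covers the whole surrogate range. For a boolean test the two extra alternatives are therefore redundant.

I made this simple online test to compare both regexes (https://jsfiddle.net/nd84e7fz/):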

const NEW = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]/;
const OLD = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]|[\ud800-\udbff](?![\udc00-\udfff])|(?:[^\ud800-\udbff]|^)[\udc00-\udfff]/;

const str1 = '\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\b\t\n\u000b\f\r' +
              '\u000e\u000f\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017' +
              '\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f"\\abc012'
console.log(OLD.test(str1) === NEW.test(str1), 'should escape control chars')

const str2 = '\uDF06\uD834'
console.log(OLD.test(str2) === NEW.test(str2), 'should escape surrogate pair')

output

true, "should escape control chars"
true, "should escape surrogate pair"

The two regexes behave the same way!
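
The two cases above can be widened into a small brute-force check. The sketch below is not part of the PR: `findMismatches` and the choice of sample code units are assumptions made for illustration. It builds every string of up to three code units taken from boundary values around the control, quote, backslash and surrogate ranges, then compares `OLD.test` with `NEW.test` on each one.

```javascript
// Regexes copied from the PR description.
const NEW = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]/;
const OLD = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]|[\ud800-\udbff](?![\udc00-\udfff])|(?:[^\ud800-\udbff]|^)[\udc00-\udfff]/;

// Hypothetical helper: returns the inputs on which the two regexes disagree.
function findMismatches(samples) {
  return samples.filter((s) => OLD.test(s) !== NEW.test(s));
}

// Boundary code units (an assumed, non-exhaustive selection):
// control chars, space, quote, backslash, plain ASCII,
// and the edges of the high and low surrogate ranges.
const units = [
  '\u0000', '\u001f', ' ', '"', '\\', 'a',
  '\ud7ff', '\ud800', '\udbff', '\udc00', '\udfff', '\ue000', '\uffff'
];

// Every string of length 0 to 3 over those code units.
const samples = [''];
for (const a of units) {
  samples.push(a);
  for (const b of units) {
    samples.push(a + b);
    for (const c of units) {
      samples.push(a + b + c);
    }
  }
}

const mismatches = findMismatches(samples);
console.log(samples.length, 'samples,', mismatches.length, 'mismatches');
```

No mismatch is expected, since any match of the second or third alternative needs a surrogate that the first character class matches anyway. Both regexes also match a well-formed pair such as '\ud834\udf06', so the comparison is only about the boolean result of `test`, not about what text is matched.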

benchmark of regex execution only

NEW REGEX 96K ops/s ± 8.44%
OLD REGEX 82K ops/s ± 8.23% (14.57 % slower)
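
For anyone who wants to reproduce a regex-only comparison, a minimal timing loop might look like the sketch below. It is not the benchmark used for the numbers above: `opsPerSecond`, the input string and the iteration counts are all assumptions, and the absolute figures will differ by machine and Node.js version. It relies on the global `performance.now()` available in browsers and recent Node.js releases.

```javascript
// Regexes copied from the PR description.
const NEW = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]/;
const OLD = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]|[\ud800-\udbff](?![\udc00-\udfff])|(?:[^\ud800-\udbff]|^)[\udc00-\udfff]/;

// Hypothetical helper: calls regex.test(input) `iterations` times
// and reports throughput plus the number of positive matches.
function opsPerSecond(regex, input, iterations) {
  let hits = 0;
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    if (regex.test(input)) hits++;
  }
  const elapsedMs = performance.now() - start;
  return { opsPerSec: iterations / (elapsedMs / 1000), hits };
}

// Assumed input: a long string with nothing to escape, so each
// call has to scan the whole string before returning false.
const longString = 'a'.repeat(10000);

// Short warm-up so both regexes are compiled before timing.
opsPerSecond(NEW, longString, 500);
opsPerSecond(OLD, longString, 500);

const newResult = opsPerSecond(NEW, longString, 5000);
const oldResult = opsPerSecond(OLD, longString, 5000);

console.log('NEW REGEX', Math.round(newResult.opsPerSec), 'ops/s');
console.log('OLD REGEX', Math.round(oldResult.opsPerSec), 'ops/s');
```

A single timed loop like this is noisy; a harness that takes repeated samples and reports a margin of error, as the figures above do, gives more trustworthy numbers.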

benchmark fast-json-stringify (npm run bench) with focus on long string

master

long string without double quotes........................ x 14,857 ops/sec ±0.33% (187 runs sampled)
long string.............................................. x 15,856 ops/sec ±0.34% (192 runs sampled)

PR

long string without double quotes........................ x 15,732 ops/sec ±0.47% (191 runs sampled)
long string.............................................. x 16,211 ops/sec ±0.41% (191 runs sampled)

DIFF

long string without double quotes...... 5.88% (PR faster)
long string............................ 2.24% (PR faster)
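
The DIFF figures appear to be the relative throughput gain of the PR over master, that is (PR - master) / master. The sketch below recomputes them from the ops/sec values listed above; `percentFaster` is a hypothetical helper name, and the formula is an assumption about how the percentages were derived. Rounding to two decimals can differ from the reported value in the last digit (for example if the original figure was truncated rather than rounded).

```javascript
// ops/sec values copied from the benchmark output above.
const results = {
  'long string without double quotes': { master: 14857, pr: 15732 },
  'long string': { master: 15856, pr: 16211 }
};

// Hypothetical helper: relative gain of `candidate` over `baseline`, in percent.
function percentFaster(baseline, candidate) {
  return ((candidate - baseline) / baseline) * 100;
}

for (const [name, { master, pr }] of Object.entries(results)) {
  console.log(name, percentFaster(master, pr).toFixed(2) + '% (PR faster)');
}
```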

cesco69 commented 6 months ago

Side note: this change was also merged into the project the regex originally comes from: https://github.com/BridgeAR/fast-json-escape/pull/6

@mcollina this PR is a good performance improvement (the others I have made are just micro-optimizations). Please, I need some attention here.

mcollina commented 6 months ago

What attention do you need?

cesco69 commented 6 months ago

> What attention do you need?

@mcollina Because it seems to be a good improvement for long string serialization (from 2.2% to 5.8%), I would like to get this PR into master as quickly as possible.