Closed niik closed 10 years ago
:sparkles:
@prabirshrestha any thoughts on this?
@Haacked been a bit busy. I did look at it.
@niik do you have the code where you produced the benchmarks? gist would be great too.
@prabirshrestha Oh sure, I should have included that as well, sorry. It's at https://gist.github.com/niik/2cf3b624d0bc73239ca5
@niik This PR is awesome. Clever optimization. Can you update the PR based on the comments before I merge and release a new version? Thanks
@prabirshrestha Done, ready for re-review
There we go, CamelCase'd
@niik by the way have you signed the Outercurve CLA? http://www.outercurve.org/Resources/DevelopmentPractices#agreements
Usually you wouldn't need to sign the CLA for small contributions. Maybe @haacked will know more about this.
//cc @ntotten
@prabirshrestha usually Outercurve doesn't require this for small contributions such as bug fixes and perf increases. This contribution doesn't add any new features.
@Haacked great. merging it in.
@niik Thanks for the PR. This should now be available in v0.34.0 https://www.nuget.org/packages/SimpleJson/0.34.0
@prabirshrestha No, thank you :)
While profiling an extremely high-write scenario in one of my apps I found that a decent amount of time was being spent serializing (read: escaping) strings. My objects consisted almost exclusively of strings so I went in to see if there were any perf gains to be made.
I started out by optimizing the happy path of the string not needing to be escaped at all (i.e. no unsafe characters) and that change (cece3e4) is probably pretty uncontroversial. Then I went to town on the loop, still optimizing for there being fewer unsafe characters than safe ones, by buffering up StringBuilder writes into ranges of safe characters. This led to a small perf hit for extremely short strings and a pretty decent speed increase for slightly larger strings.
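To illustrate the two ideas above, here is a minimal sketch, not the actual diff. The class and method names (JsonEscapeSketch, EscapeFor, Escape) are made up for illustration, the set of escaped characters is an assumption, and the surrounding quotes that a serializer would add are left out. It returns the input untouched when nothing needs escaping, and otherwise appends each run of safe characters with a single StringBuilder.Append call.

```csharp
using System;
using System.Text;

// Illustrative only: hypothetical names, not SimpleJson's real code.
static class JsonEscapeSketch
{
    // Returns the escape sequence for an unsafe character, or null if the
    // character is safe. The set of characters here is an assumption.
    static string EscapeFor(char c)
    {
        switch (c)
        {
            case '"': return "\\\"";
            case '\\': return "\\\\";
            case '\b': return "\\b";
            case '\f': return "\\f";
            case '\n': return "\\n";
            case '\r': return "\\r";
            case '\t': return "\\t";
            default: return null;
        }
    }

    public static string Escape(string s)
    {
        StringBuilder sb = null; // allocated lazily, only if escaping is needed
        int safeStart = 0;       // start of the current run of safe characters

        for (int i = 0; i < s.Length; i++)
        {
            string esc = EscapeFor(s[i]);
            if (esc == null) continue; // safe character: just extend the run

            if (sb == null) sb = new StringBuilder(s.Length + 8);

            // Flush the whole safe run in one call, then the escape sequence.
            sb.Append(s, safeStart, i - safeStart);
            sb.Append(esc);
            safeStart = i + 1;
        }

        // Happy path: no unsafe characters, so no copy and no allocation.
        if (sb == null) return s;

        // Flush the trailing safe run.
        sb.Append(s, safeStart, s.Length - safeStart);
        return sb.ToString();
    }
}
```

The per-call overhead of tracking the run is what would explain a small cost on very short strings, while longer strings benefit from fewer, larger appends.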
Benchmark
This compares two scenarios, one where the string doesn't contain any unsafe characters (which I would guess is the case for most strings going through SimpleJson) and one where the string contains a percentage of unsafe characters. Each runs 1,000,000 iterations of serialize and runs a GC collect between tests to make sure garbage left over from one run doesn't affect the outcome of the next.
The test string with unsafe characters has a uniform distribution of unsafe characters, which perhaps isn't that probable in a real-world scenario, but it shouldn't significantly influence the outcome of the benchmark.
Worth noting is that while the percentages here are quite nice, the wall time decrease isn't massively impressive. Long story short, the stuff that's in there now is quick enough for the vast majority of scenarios. I just wanted to say that so you can consider whether it's worth accepting. If you decide it's not worth it, please do still consider cece3e4 as a separate change because that's a happy path which will have real impact on serialization.
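For readers who don't want to open the gist, the shape of such a benchmark can be sketched roughly as follows. This is not the gist's code: BenchSketch, BuildInput and Measure are hypothetical names, the serializer is passed in as a delegate so the sketch doesn't depend on any particular SimpleJson API, and placing a quote at every Nth position is just one assumed way of getting a uniform distribution of unsafe characters.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Illustrative only: hypothetical names, not the code from the gist.
static class BenchSketch
{
    // Builds a string of the given length in which every `unsafeEvery`-th
    // character is a double quote (unsafe) and the rest are 'a' (safe).
    // Pass 0 to get a string with no unsafe characters at all.
    public static string BuildInput(int length, int unsafeEvery)
    {
        var sb = new StringBuilder(length);
        for (int i = 0; i < length; i++)
        {
            bool isUnsafe = unsafeEvery > 0 && (i + 1) % unsafeEvery == 0;
            sb.Append(isUnsafe ? '"' : 'a');
        }
        return sb.ToString();
    }

    // Times `iterations` calls of `serialize` on `input`, collecting garbage
    // first so leftovers from a previous run don't skew this one.
    public static TimeSpan Measure(Func<string, string> serialize, string input, int iterations)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            serialize(input);
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

A run would then call Measure once per scenario, for example with BuildInput(100, 0) for the safe case and BuildInput(100, 10) for roughly 10% unsafe characters, and compare the elapsed times before and after the change.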