Micro-optimize string serialization

niik commented 10 years ago

While profiling an extremely high-write scenario in one of my apps I found that a decent amount of time was being spent serializing (read: escaping) strings. My objects consisted almost exclusively of strings so I went in to see if there was any perf gains to be made.

I started out by optimizing the happy path of the string not needing to be escaped at all (ie no unsafe characters) and that change (cece3e4) is probably pretty uncontroversial. Then I went to town on the loop, still optimizing for there being fewer unsafe characters than safe, buffering up stringbuilder writes to ranges of safe characters. This lead to a small perf hit for extremely short string and a pretty decent speed increase for slightly largers strings.

Benchmark

This compares two scenarios, one where the string doesn't contain any unsafe characters (which I would guess is the case for most strings going through SimpleJson) and one where the string contains a percentage of unsafe characters. Each runs 1000000 iterations of serialize and runs GC collect in between each test to make sure garbage left over from the last run doesn't affect the outcome of the second.

string-perf

The test string with unsafe characters have a uniform distribution of unsafe characters which perhaps isn't that probable in a real-world scenario but it shouldn't significantly influence the outcome of the benchmark.

Worth noting is that while the percentages here are quite nice the wall time decrease isn't massively impressive. Long story short the stuff that's in there now is quick enough for the vast majority of scenarios. I just wanted to say that so you can consider whether or not its worth accepting or not. If you decide it's not worth it please do still consider cece3e4 as a separate change because that's a happy path which will have real impact on serialization.

haacked commented 10 years ago

:sparkles:

haacked commented 10 years ago

@prabirshrestha any thoughts on this?

prabirshrestha commented 10 years ago

@Haacked been a bit busy. I did look at it.

@niik do you have the code where you produced the benchmarks? gist would be great too.

niik commented 10 years ago

@prabirshrestha Oh sure, I should have included that as well, sorry. It's at https://gist.github.com/niik/2cf3b624d0bc73239ca5

prabirshrestha commented 10 years ago

@niik This PR is awesome. Clever optimization. Can you update the PR based on the comments before I merge and release a new version. Thanks

niik commented 10 years ago

@prabirshrestha Done, ready for re-review

niik commented 10 years ago

There we go, CamelCase'd

prabirshrestha commented 10 years ago

@niik by the way have you signed the Outercurve CLA? http://www.outercurve.org/Resources/DevelopmentPractices#agreements

Usually you wouldn't need to sign the CLA for small contributions. Might be @haacked will know more about this.

//cc @ntotten

haacked commented 10 years ago

@prabirshrestha usually Outercurve doesn't require this for small contributions such as bug fixes and perf increases. This contribution doesn't add any new features.

prabirshrestha commented 10 years ago

@Haacked great. merging it in.

prabirshrestha commented 10 years ago

@niik Thanks for the PR. This should now be available in v0.34.0 https://www.nuget.org/packages/SimpleJson/0.34.0

niik commented 10 years ago

@prabirshrestha No, thank you :)

facebook-csharp-sdk / simple-json

Micro-optimize string serialization #45

Benchmark