CoreOffice / XMLCoder

Easy XML parsing using Codable protocols in Swift
https://coreoffice.github.io/XMLCoder/
MIT License
793 stars 104 forks source link

Simplify string isAllWhitespace implementation #235

Closed LucianoPAlmeida closed 2 years ago

LucianoPAlmeida commented 2 years ago

This patch is really very simple one, but I believe can be both a simplification on implementation and a perf improvement in the sense that as it is if we take for example as reference the implementation of trimmingCharacters we note that it will do a lot extra work as allocating buffers or a whole new string maybe, when this only need a simple check that can exit early and also avoid those extra allocations.

I'm not sure if is significant, because I cannot tell if this is used in hot paths within the project and also I didn't measure, so perf improvement is just a guess that makes sense to me, and since it was a simple change I thought in open the PR. So let me know if this makes sense :)

jshier commented 2 years ago

(Hope you don't mind a drive by nerd snipe.)

I took the liberty of doing some benchmarking here. Each case was 10,000 loops checking isAllWhitespace against a String of 10,000 characters. First case is all space, second is all as, and third is 5,000 as with 2,500 spaces on either end.

extension String {
    func isAllWhitespace() -> Bool {
        trimmingCharacters(in: .whitespacesAndNewlines) == ""
    }

//    func isAllWhitespace() -> Bool {
//        allSatisfy { $0.isNewline || $0.isWhitespace }
//    }
}

let allWhitespace = String(repeating: " ", count: 10_000)
//let noWhitespace = String(repeating: "a", count: 10_000)
//let mixed = String(repeating: " ", count: 2500) + String(repeating: "a", count: 5000) + String(repeating: " ", count: 2500)

var results: [Bool] = []
results.reserveCapacity(10_000)

for _ in 0..<10_000 {
    results.append(allWhitespace.isAllWhitespace())
}

precondition(results.allSatisfy { $0 == true })

Here are the results.

jshier@Jons-iMac isWhitespace % hyperfine './allWhitespaceTrimming'            
Benchmark 1: ./allWhitespaceTrimming
  Time (mean ± σ):     920.7 ms ±   5.6 ms    [User: 911.1 ms, System: 8.5 ms]
  Range (min … max):   911.2 ms … 927.5 ms    10 runs

jshier@Jons-iMac isWhitespace % hyperfine './allWhitespaceAllSatisfy' 
Benchmark 1: ./allWhitespaceAllSatisfy
  Time (mean ± σ):      2.143 s ±  0.004 s    [User: 2.122 s, System: 0.019 s]
  Range (min … max):    2.139 s …  2.153 s    10 runs

jshier@Jons-iMac isWhitespace % hyperfine './noWhitespaceTrimming'  
Benchmark 1: ./noWhitespaceTrimming
  Time (mean ± σ):       6.8 ms ±   0.2 ms    [User: 5.3 ms, System: 1.0 ms]
  Range (min … max):     6.5 ms …   7.4 ms    318 runs

jshier@Jons-iMac isWhitespace % hyperfine './noWhitespaceAllSatisfy'     
Benchmark 1: ./noWhitespaceAllSatisfy
  Time (mean ± σ):       3.5 ms ±   0.1 ms    [User: 2.1 ms, System: 0.9 ms]
  Range (min … max):     3.2 ms …   4.1 ms    489 runs

  Warning: Command took less than 5 ms to complete. Results might be inaccurate.

jshier@Jons-iMac isWhitespace % hyperfine './mixedTrimming'            
Benchmark 1: ./mixedTrimming
  Time (mean ± σ):     492.6 ms ±   3.9 ms    [User: 474.9 ms, System: 16.7 ms]
  Range (min … max):   485.7 ms … 499.4 ms    10 runs

jshier@Jons-iMac isWhitespace % hyperfine './mixedAllSatisfy' 
Benchmark 1: ./mixedAllSatisfy
  Time (mean ± σ):     574.6 ms ±   1.9 ms    [User: 567.6 ms, System: 5.9 ms]
  Range (min … max):   571.1 ms … 577.6 ms    10 runs

This isn't exhaustive, but as you can see allSatisfy only beats trimmingCharacters when the string is entirely whitespace. I can't quite explain why allSatisfy is so much slower in the longer case (perhaps checking the properties is slower than set inclusion?) but for larger strings, the more whitespace they contain the bigger the trimmingCharacters advantage. Crossover in the test above seems to be about 1,000 spaces padding 8,000 as.

You can also take a look at the rough source of trimmingCharacters and see that it operates on the underlying String buffer and subranges until it needs to materialize the result, it doesn't directly mutate or otherwise create copies until then.

In the end, depending on typical use, this may be a useful change for small strings of that usually not whitespace, but I'm not sure the difference on the small end is work the larger difference on the high end. That's up to the project.

MaxDesiatov commented 2 years ago

I wonder if benchmarks could show the same performance difference on Linux and Windows, since closed-source Foundation on Mac may behave in unexpected ways.

Also, if we could infer some typical length of strings where one method is beneficial over the other, maybe we could keep both applying them conditionally?

LucianoPAlmeida commented 2 years ago

Thank you @jshier for running benchmarks, as always perf hints can be really surprising when we measure it.

I can't quite explain why allSatisfy is so much slower in the longer case (perhaps checking the properties is slower than set inclusion?) but for larger strings, the more whitespace they contain the bigger the trimmingCharacters advantage.

Changed to operate directly on unicode view using char set, seems to close the huge gap...

lucianoalmeida@Lucianos-Mac-mini testing % hyperfine ./allWhiteSpaceAllsatisfy --prepare true 
Benchmark 1: ./allWhiteSpaceAllsatisfy
  Time (mean ± σ):     754.6 ms ±   0.2 ms    [User: 752.6 ms, System: 1.4 ms]
  Range (min … max):   754.4 ms … 754.9 ms    10 runs

lucianoalmeida@Lucianos-Mac-mini testing % hyperfine ./allWhiteSpaceTrimmed --prepare true 
Benchmark 1: ./allWhiteSpaceTrimmed
  Time (mean ± σ):     806.3 ms ±   1.1 ms    [User: 804.1 ms, System: 1.6 ms]
  Range (min … max):   805.3 ms … 808.2 ms    10 runs

lucianoalmeida@Lucianos-Mac-mini testing % hyperfine ./noWhitespacesAllsatisfy --prepare true 
Benchmark 1: ./noWhitespacesAllsatisfy
  Time (mean ± σ):      59.7 ms ± 169.7 ms    [User: 5.7 ms, System: 1.2 ms]
  Range (min … max):     3.5 ms … 542.7 ms    10 runs

  Warning: Command took less than 5 ms to complete. Results might be inaccurate.

lucianoalmeida@Lucianos-Mac-mini testing % hyperfine ./noWhiteSpaceTrimmed --prepare true 
Benchmark 1: ./noWhiteSpaceTrimmed
  Time (mean ± σ):      23.7 ms ±  60.6 ms    [User: 6.7 ms, System: 0.8 ms]
  Range (min … max):     5.1 ms … 225.2 ms    13 runs

lucianoalmeida@Lucianos-Mac-mini testing % hyperfine ./mixedAllsatisfy --prepare true 
Benchmark 1: ./mixedAllsatisfy
  Time (mean ± σ):     215.6 ms ±  75.8 ms    [User: 194.4 ms, System: 0.9 ms]
  Range (min … max):   191.3 ms … 431.3 ms    10 runs

lucianoalmeida@Lucianos-Mac-mini testing % hyperfine ./mixedWhiteSpaceTrimmed --prepare true 
Benchmark 1: ./mixedWhiteSpaceTrimmed
  Time (mean ± σ):     441.5 ms ±  78.3 ms    [User: 414.3 ms, System: 5.5 ms]
  Range (min … max):   416.3 ms … 664.4 ms    10 runs

Then I think is just see if is worth for the project, as mixed also seems to be an improvement

MaxDesiatov commented 2 years ago

Did all of these benchmarks run in release mode? I still would love to see how it performs on Linux and Windows, but overall this is definitely something I'd consider merging 👍

LucianoPAlmeida commented 2 years ago

I wonder if benchmarks could show the same performance difference on Linux and Windows, since closed-source Foundation on Mac may behave in unexpected ways.

It should be possible since this tool work in Linux and Windows as well, but unfortunately I don't have the environment.

Did all of these benchmarks run in release mode? I still would love to see how it performs on Linux and Windows, but overall this is definitely something I'd consider merging 👍

Yeah, I just ran in with -O but could consider experimenting with -Osize if you guys think is worth it :)

MaxDesiatov commented 2 years ago

It should be possible since this tool work in Linux and Windows as well, but unfortunately I don't have the environment.

I wanted to set up some reproducible benchmarks that measure relative changes in performance between main and PR branches across all platforms. I'm looking forward to running that comparison for this PR when that's ready, but it may take some time.

Yeah, I just ran in with -O but could consider experimenting with -Osize if you guys think is worth it :)

Great, I assume -O is equivalent to swift build -c release?

LucianoPAlmeida commented 2 years ago

Great, I assume -O is equivalent to swift build -c release?

I compiled a single standalone file, so using swiftc directly, but as far as I understand swift build using release should invoke the compiler with -O

jshier commented 2 years ago

I've made some benchmarks using swift-collection-benchmark. Here's the first set of results for 100% whitespace strings up to 16M elements.

chart

As you can see, trimming has a lower floor in this case, possibly due to its raw iteration over the String's buffer instead of going through an actual Swift Iterator.

I'll work on generating cases with different combinations of whitespace.

jshier commented 2 years ago

Don't really have any other interesting results, but I've added another run that's 50% leading whitespace. Similar curve, but the allSatisfy / trimmingCharacters crossover happens later, possible due to more work materializing the output string for comparison.

chart

LucianoPAlmeida commented 2 years ago

Thank you for the benchmarks @jshier, my initial guess was exactly that avoid materialization of string for compare would be a perf win, but I guess the benchmarks show that it is not true. Have you tested cases with white space in the back, I think this would be a win for allSatisfy because it can bail early while trimming has to look both sides. And again, thanks for the benchmarks :)

jshier commented 2 years ago

Back half is the same as just having no whitespace at all: both implementations early out, so there's a fixed difference between them, just like the early part of these graphs where the setup cost of trimmingCharacters hasn't yet paid off in faster iteration.

MaxDesiatov commented 2 years ago

Thanks for the detailed charts @jshier! What tools did you use to generate them and to collect benchmark data? I was going to use https://github.com/google/swift-benchmark for CI benchmarking, but I see you probably use something fancier?

LucianoPAlmeida commented 2 years ago

Back half is the same as just having no whitespace at all: both implementations early out, so there's a fixed difference between them, just like the early part of these graphs where the setup cost of trimmingCharacters hasn't yet paid off in faster iteration.

That is unexpected, because for a string with no whitespaces in front I would expect allSafisfy to be noop, because it can bail on first char, while trimming has to loop and trim char in the back. I run the bench locally and it does show the expected results

chart

Here are the generators:

// 50%
let inputGenerator: (Int) -> String = { size in
    return String(repeating: "a", count: size / 2) + String(repeating: " ", count: size / 2)
}
// 25% 50%(non-whitespaces) 25%
let inputGenerator: (Int) -> String = { size in
    return String(repeating: " ", count: size / 4) + String(repeating: "a", count: size / 2) + String(repeating: " ", count: size / 4)
}
LucianoPAlmeida commented 2 years ago

I was going to use https://github.com/google/swift-benchmark for CI benchmarking, but I see you probably use something fancier?

@MaxDesiatov That would be https://github.com/apple/swift-collections-benchmark, it does produce the results in a json format as well, so it probably can be used on a CI setting too :)

jshier commented 2 years ago

Yes, I used swift-collection-benchmark as @LucianoPAlmeida mentioned.

@LucianoPAlmeida you're right, the backside benchmarks are more interesting as you show. In that case trimmingCharacters doesn't get to early out like allSatisfy does. I'll put together some final results but it seems likely the smaller string performance wins for allSatisfy may be the most important here, but I haven't checked where isAllWhitespace() is actually called.

jshier commented 2 years ago

Here are some final results. Unfortunately the label is a bit in the way but you can get the overall idea.

chart

jshier commented 2 years ago

Here's my bencher if you want to play with it.

isWhitespaceBencher.zip

LucianoPAlmeida commented 2 years ago

Thank you @jshier for the benchmarks, really good overview the last one! In my opinion for my understanding of the results, overall it seems like a nice improvement because where it loses the difference is almost negligible but when it can bail early the difference is quite good. Let me know your thoughts about it and if that's feels like a good change for the project :)

MaxDesiatov commented 2 years ago

LGTM, thank you both for the detailed investigation!