Open romshark opened 3 weeks ago
Related Issues and Documentation
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)
I don't see any API change here, so taking this out of the proposal process.
You've gotten performance improvements but I don't see any description of the changes you made to get those performance improvements (unless the changes are all in that screenshot?).
It would be nice to post a CL, together with some Benchmarks that demonstrate the improvements (if those benchmarks are not already present in the utf8 package).
@randall77 See the linked repo, which contains the patched code and the benchmark results. The diff is hard to see (it's not even in the history of the repo), but it really is just this:
@@ -473,16 +473,22 @@ func Valid(p []byte) bool {
return false // Short or invalid.
}
accept := acceptRanges[x>>4]
+ // The change is here. Instead of reusing `size` constants 2, 3 and 4
+ // are used which leads to performance improvement of up to 41%.
if c := p[i+1]; c < accept.lo || accept.hi < c {
return false
} else if size == 2 {
+ i += 2
} else if c := p[i+2]; c < locb || hicb < c {
return false
} else if size == 3 {
+ i += 3
} else if c := p[i+3]; c < locb || hicb < c {
return false
+ } else {
+ i += 4
}
- i += size
+
}
return true
}
@@ -519,16 +525,21 @@ func ValidString(s string) bool {
return false // Short or invalid.
}
accept := acceptRanges[x>>4]
+ // The change is here. Instead of reusing `size` constants 2, 3 and 4
+ // are used which leads to performance improvement of up to 41%.
if c := s[i+1]; c < accept.lo || accept.hi < c {
return false
} else if size == 2 {
+ i += 2
} else if c := s[i+2]; c < locb || hicb < c {
return false
} else if size == 3 {
+ i += 3
} else if c := s[i+3]; c < locb || hicb < c {
return false
+ } else {
+ i += 4
}
- i += size
}
return true
}
I guess the speedup comes from https://cs.opensource.google/go/go/+/refs/tags/go1.23.2:src/unicode/utf8/utf8.go;l=508
size := int(x & 7)
and the compiler can't know that 5,6,7 are never supposed to happen (vs hardcoding += 4) (thanks to how first[] is constructed)
nor 1
because of
if si < RuneSelf {
I think you should make a CL but also add comments to explain why it works / why +4 is the only possible case in the else
also I wonder if you could benchmark that vs a switch (which would be more readable and equivalent I think)
It might be related to #47120 but I haven't had the opportunity to dig into the details of Russ' proposal. At first glance it looks like Russ tackled exactly that problem and my proposal may just be a duplicate.
DFA implementation from #47120 looks much simpler than existing one. It would be interesting to see how this change benchmark results compare to DFA results.
I'm not sure I understand how that change makes things any faster, but if there are benchmarks that demonstrate it I'm ok with it.
An aside, feel free to ignore...
size
is already needed in a register for the conditional, so it isn't clear to me why += const is any faster than += register. Both are identical speedwise on all chips I've seen. Maybe because we have to merge from each case back to a single += instruction before jumping to the top of the loop, instead of jumping to the top of the loop directly? It might be worth trying += size
in each if
block instead of after, just to see what happens. Not that the final code should do that, but just to tease out why there is a speed improvement.
Proposal Details
I've been experimenting with different kinds of optimization techniques in order to speed up utf8.Valid and utf8.ValidString and I'd like to share my findings that appear to be low-hanging fruit, at least on ARM64:
The change is minimal but yields performance improvements of up to 41% on Apple M1 Max for all-UTF8 inputs while not showing any significant negative side-effects. Though on AMD Ryzen 7 5700x the geomean is significantly lower at just -1.43%, with some benchmarks even showing significant performance degradation ⚠️.
The test setup with benchmarks can be found here: https://github.com/romshark/utf8valid
Relation to Other Proposals
It might be related to https://github.com/golang/go/issues/47120 but I haven't had the opportunity to dig into the details of Russ' proposal. At first glance it looks like Russ tackled exactly that problem and my proposal may just be a duplicate.
Compared to https://github.com/golang/go/issues/63347 this proposal doesn't require changes of significant complexity, it's a 9-line code change.