josharian opened this issue 4 years ago
cc @mundaym who has been thinking about SSA rules to combine branches
Tricky one. This transformation could slow down or bloat some code, generally when uint16(first two bytes) == uint16("ab") is unlikely to hold: today a mismatch exits after a single 2-byte load, while the combined form always performs both loads. So while I think the backend could do it, I'm not sure it should unless it has likeliness information.
Note that for longer strings we could overlap the loads, given they are unaligned anyway (not sure if we do this already; I don't think we do, from what I see in walk):
x == "abcdefg" -> x[0:4] == "abcd" && x[3:7] == "defg"
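A minimal Go-level sketch of the overlapping-load idea, assuming a little-endian target. The helper names load32 and eqABCDEFG are hypothetical and written only for illustration; the compiler would emit the equivalent loads directly during lowering rather than call helpers. The length check runs first, so both 4-byte windows are in bounds.

```go
package main

import "fmt"

// load32 assembles 4 bytes of s starting at i as a little-endian uint32.
// On little-endian targets the Go compiler can usually merge this
// byte-wise pattern into a single 4-byte load.
func load32(s string, i int) uint32 {
	return uint32(s[i]) | uint32(s[i+1])<<8 | uint32(s[i+2])<<16 | uint32(s[i+3])<<24
}

const (
	abcd = 0x64636261 // "abcd" read as a little-endian uint32
	defg = 0x67666564 // "defg" read as a little-endian uint32
)

// eqABCDEFG reports whether x == "abcdefg" using two overlapping 4-byte
// comparisons: x[0:4] and x[3:7] share byte 3 ('d').
func eqABCDEFG(x string) bool {
	return len(x) == 7 && load32(x, 0) == abcd && load32(x, 3) == defg
}

func main() {
	for _, s := range []string{"abcdefg", "abcdefh", "abXdefg", "abc"} {
		fmt.Println(s, eqABCDEFG(s))
	}
}
```

Two 4-byte comparisons replace the 4 + 2 + 1 byte sequence a 7-byte string would otherwise need.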
I think that transformation could be applied greedily, since it doesn't add any extra instructions to the 'not taken' path. It could also be applied in any situation where we want to check the last 3 characters of a string and also know that the string is longer than 3 bytes (i.e. we can load starting one byte before those 3 characters, at index len(x)-4, and stay in bounds). I suspect this crops up occasionally in string switch statement lowering. The overlapping characters could be masked out, even when their values are known, to avoid issues caused by loading the same byte twice in racy code. That said, this would all be quite hard to do in SSA, but maybe string comparison lowering could do it.
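A sketch of the suffix case, assuming a little-endian target and that the caller already knows len(s) >= 4. hasSuffixEFG and load32 are hypothetical names. A single 4-byte load starts one byte before the suffix, and that extra byte, whose value is unknown, is masked out before the comparison.

```go
package main

import "fmt"

// load32 assembles 4 bytes of s starting at i as a little-endian uint32.
func load32(s string, i int) uint32 {
	return uint32(s[i]) | uint32(s[i+1])<<8 | uint32(s[i+2])<<16 | uint32(s[i+3])<<24
}

// hasSuffixEFG reports whether s ends in "efg".
// Precondition (assumed): len(s) >= 4, so the load at len(s)-4 is in bounds.
func hasSuffixEFG(s string) bool {
	const mask = 0xFFFFFF00 // clear byte len(s)-4, the lowest byte in little-endian order
	const want = 0x67666500 // "efg" in bytes 1..3 of a little-endian uint32
	return load32(s, len(s)-4)&mask == want
}

func main() {
	for _, s := range []string{"abcdefg", "zefg", "abcdeff", "efgz"} {
		fmt.Println(s, hasSuffixEFG(s))
	}
}
```

The same mask trick could be applied to the shared byte in the overlapping-load case, even though its value is known there.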
Overlapping loads are a good idea, thanks! Agreed, we should do it during walk.
I just remembered (vaguely) that overlapping loads caused performance problems a few years ago because they interfered with load-store forwarding.
Yeah, this might be a problem. Though since Go programs don't tend to copy strings by value that much, we might not see load-hit-store hazards very often. My guess is that strings are rarely still in the store buffer when we compare them, but that is just a guess.
This all also applies to comparison of small arrays of ints; see walkcompare.
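As an illustration of the array case, here is a hypothetical sketch (not walkcompare's actual output) that compares two [3]uint16 arrays, 6 bytes each, with two overlapping 32-bit comparisons, mirroring the overlapping string loads. It assumes a little-endian element layout; pack32 and eq3 are illustrative names.

```go
package main

import "fmt"

// pack32 packs two adjacent uint16 elements into one uint32,
// lo in the low half, matching a little-endian memory layout.
func pack32(lo, hi uint16) uint32 {
	return uint32(lo) | uint32(hi)<<16
}

// eq3 compares two [3]uint16 arrays with two overlapping 32-bit
// comparisons: elements 0-1 and elements 1-2 (element 1 is shared).
func eq3(a, b [3]uint16) bool {
	return pack32(a[0], a[1]) == pack32(b[0], b[1]) &&
		pack32(a[1], a[2]) == pack32(b[1], b[2])
}

func main() {
	fmt.Println(eq3([3]uint16{1, 2, 3}, [3]uint16{1, 2, 3})) // true
	fmt.Println(eq3([3]uint16{1, 2, 3}, [3]uint16{1, 2, 4})) // false
}
```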
To compare against "abc", we currently emit code to test whether uint16(first two bytes)==uint16("ab") and then whether (third byte)=="c".
I suspect that it'd be more efficient to load the three bytes into the first three bytes of a uint32 (two loads combined with |) and then compare that to uint32("abc\0"). I'm not sure whether we could do this with comparison-combining rewrite rules (note that the loads have side effects in general, but we're sure in this case they won't fault) or whether it'd be better to fix this in walk.go, where this optimization is generated.
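A Go-level sketch contrasting the current lowering of x == "abc" with the proposed one, assuming a little-endian target. The function names are hypothetical; the real code is generated in walk, not written as helpers. Whether the combined form is actually faster is the suspicion stated here, not a measured result.

```go
package main

import "fmt"

// eqABCCurrent follows the current lowering: a 2-byte comparison of
// x[0:2] against "ab", then a 1-byte comparison of x[2] against 'c'.
func eqABCCurrent(x string) bool {
	if len(x) != 3 {
		return false
	}
	ab := uint16(x[0]) | uint16(x[1])<<8 // little-endian 2-byte load
	return ab == 0x6261 && x[2] == 'c'   // 0x6261 is "ab"
}

// eqABCCombined follows the proposal: the 2-byte and 1-byte loads are
// merged with | into the low three bytes of a uint32 (top byte zero),
// so one comparison against "abc\0" decides the result.
func eqABCCombined(x string) bool {
	if len(x) != 3 {
		return false
	}
	v := uint32(uint16(x[0])|uint16(x[1])<<8) | uint32(x[2])<<16
	return v == 0x00636261 // "abc\0" as a little-endian uint32
}

func main() {
	for _, s := range []string{"abc", "abd", "xbc", "ab", "abcd"} {
		fmt.Println(s, eqABCCurrent(s), eqABCCombined(s))
	}
}
```

The combined form always executes both loads, which is the cost discussed in the thread when the first two bytes rarely match.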
Low priority.