Open stephentoub opened 1 year ago
Author: | stephentoub |
---|---|
Assignees: | - |
Labels: | `area-System.Runtime`, `untriaged`, `needs-area-label` |
Milestone: | - |
I'm thinking of something like this: https://github.com/dotnet/runtime/compare/main...stephentoub:runtime:consolidateunrolling
@jkotas, @tannergooding, @MihaZupan, @EgorBo, if perf tests end up looking ok, any concerns with doubling-down on generics like this even further?
If we go ahead with this, we could do something similar with many of the Vector128/Vector256 IndexOfXx code paths as well.
I like the fact the proposed change removes more code than it adds, but I probably need more coffee to start understanding what it does 😄
Basically we have a bunch of places implementing various IndexOf variations, some of which unroll manually, some of which don't, and when they do unroll there are often subtle differences to how it's done for little apparent reason. This is an attempt to make it consistent, with shared boilerplate they all use and just plug in the single equality check that differs between them.
(Of course, if there were a way for the JIT to just do similar unrolling automatically, we could instead delete all of this.)
Thanks, I understand the intention, I meant the actual implementation
> if perf tests end up looking ok, any concerns with doubling-down on generics like this even further?
What does perf tests looking ok mean? I expect that this is going to regress Mono performance (with interpreter in particular), reference type instantiations performance (limitations around shared generic code optimizations), and static footprint (more generic types instantiated for each T). The question is then going to be whether these regressions are worth taking to eliminate the code duplication.
Our IndexOf{Any} (and some LastIndexOf{Any}) implementations all have a scalar path that's used when vectorization can't be, either because the current platform doesn't support it, the target type doesn't support it, or the length being searched is too small. Many of these manually unroll the loop, e.g.
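As a rough illustration of the shape being described (not the actual runtime code, which works on refs rather than bounds-checked spans), a 4x manually unrolled scalar search might look like the sketch below. `ScalarSearch` and `IndexOfUnrolled` are hypothetical names chosen for this example.

```csharp
using System;

public static class ScalarSearch
{
    // Hypothetical helper: a 4x manually unrolled scalar IndexOf over ints.
    // Simplified sketch; it uses ordinary span indexing for clarity.
    public static int IndexOfUnrolled(ReadOnlySpan<int> span, int value)
    {
        int i = 0;

        // Main loop: four equality checks per iteration.
        for (; i <= span.Length - 4; i += 4)
        {
            if (span[i] == value) return i;
            if (span[i + 1] == value) return i + 1;
            if (span[i + 2] == value) return i + 2;
            if (span[i + 3] == value) return i + 3;
        }

        // Tail loop: fewer than four elements remain.
        for (; i < span.Length; i++)
        {
            if (span[i] == value) return i;
        }

        return -1;
    }
}
```

Each IndexOf variation repeats this skeleton and changes only the comparison inside the `if` statements.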
And as a result, we have a bunch of copies of code of that general structure.
We should be able to deduplicate many of them by using generic specialization, e.g. we create an interface like:
then write a shared routine for doing the operation based on this kernel:
and then in the places we need it, create a simple struct that provides its core operation as the implementation of Equal, e.g.
or something along those lines.
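To make the three pieces concrete, here is one possible sketch of the pattern: a kernel interface with an `Equal` method, a shared unrolled routine, and a small struct plugging in the comparison. Only the name `Equal` comes from the description above; `IEqualityKernel`, `UnrolledSearch`, and `AnyOfTwo` are hypothetical, and the real proposal may be structured differently.

```csharp
using System;

// Hypothetical kernel interface: the only piece that differs between searches.
public interface IEqualityKernel<T>
{
    bool Equal(T candidate);
}

public static class UnrolledSearch
{
    // Shared boilerplate: unrolls 4x and defers each comparison to the kernel.
    // The struct constraint lets the JIT specialize the method per kernel type.
    public static int IndexOf<T, TKernel>(ReadOnlySpan<T> span, TKernel kernel)
        where TKernel : struct, IEqualityKernel<T>
    {
        int i = 0;

        for (; i <= span.Length - 4; i += 4)
        {
            if (kernel.Equal(span[i])) return i;
            if (kernel.Equal(span[i + 1])) return i + 1;
            if (kernel.Equal(span[i + 2])) return i + 2;
            if (kernel.Equal(span[i + 3])) return i + 3;
        }

        for (; i < span.Length; i++)
        {
            if (kernel.Equal(span[i])) return i;
        }

        return -1;
    }
}

// Example kernel that carries state: matches either of two values,
// which is the shape an IndexOfAny(value0, value1) would need.
// Sketch only: it assumes candidates are non-null.
public readonly struct AnyOfTwo<T> : IEqualityKernel<T> where T : IEquatable<T>
{
    private readonly T _value0;
    private readonly T _value1;

    public AnyOfTwo(T value0, T value1)
    {
        _value0 = value0;
        _value1 = value1;
    }

    public bool Equal(T candidate) =>
        candidate.Equals(_value0) || candidate.Equals(_value1);
}
```

A caller would then write something like `UnrolledSearch.IndexOf<char, AnyOfTwo<char>>("hello world".AsSpan(), new AnyOfTwo<char>('o', 'w'))`, and a different search only needs a different kernel struct.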
It would help us to deduplicate a bunch of code and also ensure we're consistently unrolling in the same manner across all of our various implementations.
We'd need to measure to understand the cost of this. There might be better ways to structure it as well, e.g. some operations could get away with static abstract interface methods, but others do need to carry around the state (e.g. for multiple values to compare against).
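For comparison, a stateless variant using static abstract interface members (C# 11 / .NET 7 or later) could look roughly like the following. This is a sketch under assumptions: all type names are hypothetical, and here the single value to compare against is passed as an argument, so the kernel type itself carries no state. A search over multiple values would still need an instance-based kernel.

```csharp
using System;

// Hypothetical stateless kernel: the comparison is a static member,
// so no kernel instance is created or passed around.
public interface IStaticEqualityKernel<T>
{
    static abstract bool Equal(T candidate, T value);
}

// Kernel for an ordinary IndexOf-style search.
public readonly struct EqualTo<T> : IStaticEqualityKernel<T> where T : IEquatable<T>
{
    public static bool Equal(T candidate, T value) => candidate.Equals(value);
}

// Kernel for an IndexOfAnyExcept-style search.
public readonly struct NotEqualTo<T> : IStaticEqualityKernel<T> where T : IEquatable<T>
{
    public static bool Equal(T candidate, T value) => !candidate.Equals(value);
}

public static class StaticUnrolledSearch
{
    // Same 4x unrolled skeleton, with the comparison resolved through TKernel.
    public static int IndexOf<T, TKernel>(ReadOnlySpan<T> span, T value)
        where TKernel : struct, IStaticEqualityKernel<T>
    {
        int i = 0;

        for (; i <= span.Length - 4; i += 4)
        {
            if (TKernel.Equal(span[i], value)) return i;
            if (TKernel.Equal(span[i + 1], value)) return i + 1;
            if (TKernel.Equal(span[i + 2], value)) return i + 2;
            if (TKernel.Equal(span[i + 3], value)) return i + 3;
        }

        for (; i < span.Length; i++)
        {
            if (TKernel.Equal(span[i], value)) return i;
        }

        return -1;
    }
}
```

Because the kernel is only a type argument here, callers select the behavior with something like `StaticUnrolledSearch.IndexOf<int, NotEqualTo<int>>(values, 0)`.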
(Alternatively, maybe we could better teach the JIT to do this level of unrolling and eliminate it entirely from the code?)