Closed exiledagain closed 7 months ago
What was being measured for the 1.4x-2.6x speed-up? (i.e. what specifically is faster now. fps, game load times, etc.)
Worst case condition processing:
for (vector<Rule*>::const_iterator it = RuleList.begin(); it != RuleList.end(); it++)
{
(*it)->Evaluate(uInfo);
}
method | filter | min | max | avg (ms) |
---|---|---|---|---|
rpn | eqn | 0.5494 | 2.965 | 0.869985 |
tree | eqn | 0.1741 | 1.6262 | 0.451288 |
rpn | kryszard | 0.1551 | 0.7429 | 0.218016 |
tree | kryszard | 0.0599 | 0.7245 | 0.153167 |
rpn | hiim | 0.3951 | 1.7606 | 0.55611 |
tree | hiim | 0.116 | 0.9139 | 0.260857 |
rpn | kassahi | 0.7442 | 2.4542 | 1.129761 |
tree | kassahi | 0.1625 | 1.2869 | 0.423817 |
Very nice. I like what I'm seeing. I guess I assumed that more rules (and more conditions) == worse performance
but it's nice to see some actual values to back that up.
I'd be curious to see what Pilllaa's hype filter looks like before and after as well https://github.com/PiLLLaa/Pd2/blob/main/S8_Hype.filter. And have you told Kass how slow his filter is yet? 😆
I did a rough "fps" check locally using eqN's filter when hovering over an item. Not the most accurate data by any means (fps spikes around like crazy) but still nice to see that it should transfer over to actual game performance as well.
method | filter | min | max | avg (ms) |
---|---|---|---|---|
rpn | PiLLLaahype | 0.5073 | 2.2207 | 0.930571 |
tree | PiLLLaahype | 0.2441 | 1.4333 | 0.589493 |
Many small rules instead of the more complex ones I've seen from other filters. This was the worst min gain.
Oh interesting. I wouldnt have expected eqn to be so close to pilllaa's. Thanks for doing that. Can you also share the code you're using to profile this stuff? I think Kanzeon's SubstituteNameVariables
PR may give a small performance bump but it'd be nice to measure it the same way you did, so the results can be compared.
Also, as far as patch notes credit, should we add your github name or your discord name? (I tried to DM you but it looks like you have DMs disabled)
The benchmark code is rather crude:
{
static long long min = 100000000;
static long long max = 0;
static long long sum = 0;
static long long count = 0;
count += 1;
auto S = std::chrono::high_resolution_clock::now();
for (vector<Rule*>::const_iterator it = RuleList.begin(); it != RuleList.end(); it++)
{
(*it)->Evaluate(uInfo);
}
auto E = std::chrono::high_resolution_clock::now();
long long dur = (E - S).count();
min = min > dur ? dur : min;
max = max < dur ? dur : max;
sum += dur;
FILE* f = fopen("debug_eval_rpn.txt", "wb");
if (f) {
fprintf(f, "%lld %lld %lld\n", min, max, sum / count);
fclose(f);
}
}
Literally just ran a juiced Throne map and moused over a few tabs in BetweenWall's stash.
My GitHub username is preferred. Thanks.
When I benchmarked the replacement part it was only peaking at 0.25ms with PiLLLaa's filter.
Understood. So it sounds like the bulk of the time is spent inside of the individual filter condition functions (which makes sense), and any performance gain during name replacement is going to be pretty minimal.
Thanks again for digging into this and thanks for the patch!
This is roughly a 1.4-2.6x speed-up. Filters tested:
Added a circuit type to the base condition as
dynamic_cast
caused higher peak evaluation times.Note: the old RPN evaluation had a bug where the first and second arguments were reversed. Reversing these parameters increased performance.