Replace RPN evaluation with tree

Project-Diablo-2 / BH

A modified version of slashdiablo's BH for Project Diablo 2

GNU Affero General Public License v3.0

24 stars 25 forks source link

Replace RPN evaluation with tree #45

Closed exiledagain closed 7 months ago

exiledagain commented 7 months ago

This is roughly a 1.4-2.6x speed-up. Filters tested:

eqN
Hiim
Kassahi
Kryszard

Added a circuit type to the base condition as dynamic_cast caused higher peak evaluation times.

Note: the old RPN evaluation had a bug where the first and second arguments were reversed. Reversing these parameters increased performance.

fergushev commented 7 months ago

What was being measured for the 1.4x-2.6x speed-up? (i.e. what specifically is faster now. fps, game load times, etc.)

exiledagain commented 7 months ago

Worst case condition processing:

for (vector<Rule*>::const_iterator it = RuleList.begin(); it != RuleList.end(); it++)
{
    (*it)->Evaluate(uInfo);
}

method	filter	min	max	avg (ms)
rpn	eqn	0.5494	2.965	0.869985
tree	eqn	0.1741	1.6262	0.451288
rpn	kryszard	0.1551	0.7429	0.218016
tree	kryszard	0.0599	0.7245	0.153167
rpn	hiim	0.3951	1.7606	0.55611
tree	hiim	0.116	0.9139	0.260857
rpn	kassahi	0.7442	2.4542	1.129761
tree	kassahi	0.1625	1.2869	0.423817

fergushev commented 7 months ago

Very nice. I like what I'm seeing. I guess I assumed that more rules (and more conditions) == worse performance but it's nice to see some actual values to back that up.

I'd be curious to see what Pilllaa's hype filter looks like before and after as well https://github.com/PiLLLaa/Pd2/blob/main/S8_Hype.filter. And have you told Kass how slow his filter is yet? 😆

I did a rough "fps" check locally using eqN's filter when hovering over an item. Not the most accurate data by any means (fps spikes around like crazy) but still nice to see that it should transfer over to actual game performance as well.

Without your change: ~1700fps
With your change: ~2000fps

exiledagain commented 7 months ago

method	filter	min	max	avg (ms)
rpn	PiLLLaahype	0.5073	2.2207	0.930571
tree	PiLLLaahype	0.2441	1.4333	0.589493

Many small rules instead of the more complex ones I've seen from other filters. This was the worst min gain.

fergushev commented 7 months ago

Oh interesting. I wouldnt have expected eqn to be so close to pilllaa's. Thanks for doing that. Can you also share the code you're using to profile this stuff? I think Kanzeon's SubstituteNameVariables PR may give a small performance bump but it'd be nice to measure it the same way you did, so the results can be compared. Also, as far as patch notes credit, should we add your github name or your discord name? (I tried to DM you but it looks like you have DMs disabled)

exiledagain commented 7 months ago

The benchmark code is rather crude:

{
    static long long min = 100000000;
    static long long max = 0;
    static long long sum = 0;
    static long long count = 0;
    count += 1;
    auto S = std::chrono::high_resolution_clock::now();
    for (vector<Rule*>::const_iterator it = RuleList.begin(); it != RuleList.end(); it++)
    {
        (*it)->Evaluate(uInfo);
    }
    auto E = std::chrono::high_resolution_clock::now();
    long long dur = (E - S).count();
    min = min > dur ? dur : min;
    max = max < dur ? dur : max;
    sum += dur;
    FILE* f = fopen("debug_eval_rpn.txt", "wb");
    if (f) {
        fprintf(f, "%lld %lld %lld\n", min, max, sum / count);
        fclose(f);
    }
}

Literally just ran a juiced Throne map and moused over a few tabs in BetweenWall's stash.

My GitHub username is preferred. Thanks.

exiledagain commented 7 months ago

When I benchmarked the replacement part it was only peaking at 0.25ms with PiLLLaa's filter.

fergushev commented 7 months ago

Understood. So it sounds like the bulk of the time is spent inside of the individual filter condition functions (which makes sense), and any performance gain during name replacement is going to be pretty minimal.

Thanks again for digging into this and thanks for the patch!