Project-Diablo-2 / BH

A modified version of slashdiablo's BH for Project Diablo 2
GNU Affero General Public License v3.0
24 stars 20 forks source link

Replace RPN evaluation with tree #45

Closed exiledagain closed 3 months ago

exiledagain commented 3 months ago

This is roughly a 1.4-2.6x speed-up. Filters tested:

Added a circuit type to the base condition as dynamic_cast caused higher peak evaluation times.

Note: the old RPN evaluation had a bug where the first and second arguments were reversed. Reversing these parameters increased performance.

fergushev commented 3 months ago

What was being measured for the 1.4x-2.6x speed-up? (i.e. what specifically is faster now. fps, game load times, etc.)

exiledagain commented 3 months ago

Worst case condition processing:

for (vector<Rule*>::const_iterator it = RuleList.begin(); it != RuleList.end(); it++)
{
    (*it)->Evaluate(uInfo);
}
method filter min max avg (ms)
rpn eqn 0.5494 2.965 0.869985
tree eqn 0.1741 1.6262 0.451288
rpn kryszard 0.1551 0.7429 0.218016
tree kryszard 0.0599 0.7245 0.153167
rpn hiim 0.3951 1.7606 0.55611
tree hiim 0.116 0.9139 0.260857
rpn kassahi 0.7442 2.4542 1.129761
tree kassahi 0.1625 1.2869 0.423817
fergushev commented 3 months ago

Very nice. I like what I'm seeing. I guess I assumed that more rules (and more conditions) == worse performance but it's nice to see some actual values to back that up.

I'd be curious to see what Pilllaa's hype filter looks like before and after as well https://github.com/PiLLLaa/Pd2/blob/main/S8_Hype.filter. And have you told Kass how slow his filter is yet? 😆

I did a rough "fps" check locally using eqN's filter when hovering over an item. Not the most accurate data by any means (fps spikes around like crazy) but still nice to see that it should transfer over to actual game performance as well.

exiledagain commented 3 months ago
method filter min max avg (ms)
rpn PiLLLaahype 0.5073 2.2207 0.930571
tree PiLLLaahype 0.2441 1.4333 0.589493

Many small rules instead of the more complex ones I've seen from other filters. This was the worst min gain.

fergushev commented 3 months ago

Oh interesting. I wouldnt have expected eqn to be so close to pilllaa's. Thanks for doing that. Can you also share the code you're using to profile this stuff? I think Kanzeon's SubstituteNameVariables PR may give a small performance bump but it'd be nice to measure it the same way you did, so the results can be compared. Also, as far as patch notes credit, should we add your github name or your discord name? (I tried to DM you but it looks like you have DMs disabled)

exiledagain commented 3 months ago

The benchmark code is rather crude:

{
    static long long min = 100000000;
    static long long max = 0;
    static long long sum = 0;
    static long long count = 0;
    count += 1;
    auto S = std::chrono::high_resolution_clock::now();
    for (vector<Rule*>::const_iterator it = RuleList.begin(); it != RuleList.end(); it++)
    {
        (*it)->Evaluate(uInfo);
    }
    auto E = std::chrono::high_resolution_clock::now();
    long long dur = (E - S).count();
    min = min > dur ? dur : min;
    max = max < dur ? dur : max;
    sum += dur;
    FILE* f = fopen("debug_eval_rpn.txt", "wb");
    if (f) {
        fprintf(f, "%lld %lld %lld\n", min, max, sum / count);
        fclose(f);
    }
}

Literally just ran a juiced Throne map and moused over a few tabs in BetweenWall's stash.

My GitHub username is preferred. Thanks.

exiledagain commented 3 months ago

When I benchmarked the replacement part it was only peaking at 0.25ms with PiLLLaa's filter.

fergushev commented 3 months ago

Understood. So it sounds like the bulk of the time is spent inside of the individual filter condition functions (which makes sense), and any performance gain during name replacement is going to be pretty minimal.

Thanks again for digging into this and thanks for the patch!