bucardo / dbdpg

Perl Postgres driver DBD::Pg aka dbdpg

10x speedup for quote_name implementation #124

Closed nrdvana closed 6 months ago

nrdvana commented 6 months ago

The current implementation of quote_name calls is_keyword, which runs a linear scan of every Postgres keyword looking for a match. Since most names passed to quote_name are not keywords, it is likely to scan the full list every time.
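For context, a linear keyword scan has roughly this shape. This is a minimal sketch using a hypothetical 5-word sample list, not the actual code from quote.c; the real list has roughly 430 entries:

```c
#include <string.h>

/* Sketch of a linear keyword scan like the one described above.
 * The word list here is a tiny hypothetical sample. */
static const char *const keywords[] = {
    "all", "and", "select", "table", "where"
};

static int is_keyword_linear(const char *w)
{
    size_t i;
    /* A non-keyword pays for a strcmp() against every entry. */
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(w, keywords[i]) == 0)
            return 1;
    return 0;
}
```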

A much faster strategy is to repeatedly make a single-character comparison that divides the remaining candidates in half, until only one possible match remains, then run a single strcmp() against it.

This executes 10x-15x as fast, and also compiles into smaller machine code (despite looking like a lot more source code) because it makes only one function call, and a function call takes more instructions than a single-character comparison plus a jump.
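The code in the PR is generated by a script, but its shape is roughly the following. This is a hand-written sketch over the same hypothetical 5-word list (the generated function covers the full keyword list, and picks the pivot character position per subtree); it assumes a non-empty NUL-terminated string:

```c
#include <string.h>

/* Sketch of the generated decision tree for a hypothetical 5-word
 * keyword list: all, and, select, table, where. Each single-character
 * test halves the remaining candidates; exactly one strcmp() runs
 * at the end. */
static int is_keyword_tree(const char *w)
{
    if (w[0] <= 's') {
        if (w[0] <= 'a') {
            if (w[1] <= 'l')
                return strcmp(w, "all") == 0;
            return strcmp(w, "and") == 0;
        }
        return strcmp(w, "select") == 0;
    }
    if (w[0] <= 't')
        return strcmp(w, "table") == 0;
    return strcmp(w, "where") == 0;
}
```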

Benchmarks

It's difficult to compare two versions of an XS module, so I copied quote.c to quote2.c and added some XS entrypoints so that I could call both versions of the function without a database connection. I was then able to run the following Benchmark code:

use v5.36;
use DBD::Pg v3.18.0;
use Benchmark 'cmpthese';

cmpthese(10000000, {
    quote_name => sub {
        DBD::Pg::db::quote_name1("test");
    },
    quote_name2 => sub {
        DBD::Pg::db::quote_name2("test");
    },
});

giving

                  Rate  quote_name quote_name2
quote_name   1113586/s          --        -94%
quote_name2 18181818/s       1533%          --

and a slightly more realistic example

use v5.36;
use DBD::Pg v3.18.0;
use Benchmark 'cmpthese';

cmpthese(5000000, {
    quote_name => sub {
        DBD::Pg::db::quote_name1("some_table_name");
        DBD::Pg::db::quote_name1("colname_one");
        DBD::Pg::db::quote_name1("colname_two");
    },
    quote_name2 => sub {
        DBD::Pg::db::quote_name2("some_table_name");
        DBD::Pg::db::quote_name2("colname_one");
        DBD::Pg::db::quote_name2("colname_two");
    },
});

giving

                 Rate  quote_name quote_name2
quote_name   368189/s          --        -92%
quote_name2 4854369/s       1218%          --

Binary Size

While the generated code might appear to be much larger, it actually compiles smaller:

$ objdump -t quote.o | grep is_keyword
0000000000000ad0 g     F .text  000000000000396c is_keyword
$ objdump -t quote2.o | grep is_keyword
0000000000000000 g     F .text  0000000000002119 is_keyword2

XS Entrypoints used for benchmarking:

SV*
quote_name1(str)
    SV *str
    INIT:
        char *quoted;
        const char *to_quote;
        STRLEN retlen=0;
        STRLEN len=0;
    CODE:
        to_quote = SvPV(str, len);
        quoted = quote_name(aTHX_ to_quote, len, &retlen, 1);
        RETVAL = newSVpvn(quoted, retlen);
        Safefree (quoted);
    OUTPUT:
        RETVAL

SV*
quote_name2(str)
    SV *str
    INIT:
        char *quoted;
        const char *to_quote;
        STRLEN retlen=0;
        STRLEN len=0;
    CODE:
        to_quote = SvPV(str, len);
        quoted = quote_name2(aTHX_ to_quote, len, &retlen, 1);
        RETVAL = newSVpvn(quoted, retlen);
        Safefree (quoted);
    OUTPUT:
        RETVAL

esabol commented 6 months ago

Amazing!

jonjensen commented 6 months ago

What an awesome improvement! Amazing how much overhead function calls have. I look forward to trying this out.

nrdvana commented 6 months ago

@jonjensen Well, it's a dramatic percent increase, but in the grand scheme of things, in a controller method that calls DBIC and might make several hundred calls to quote_name in a request, you were only losing about 1ms to the implementation of quote_name. But why waste 1ms if you don't have to? The binary search code was actually from a different project of mine, and I was able to integrate it in about 2.5 hours.

To be clear, this was more of an opportunistic optimization rather than an identified bottleneck.

karenetheridge commented 6 months ago

Did you generate this tree by hand? I worry that it's difficult to verify that every branch is correct and that every word is represented -- so I wonder if we could use a pure-perl script to generate the XS code at build time, using the list of words as an input?

jonjensen commented 6 months ago

Sure, I get that. It's just fun to see that the binary search not only was faster, but took less code space. That's surprising.

Using a hash might be nicer, I suppose, if DBD::Pg's XS code can use a Perl hash. That would avoid the code generation step.

jonjensen commented 6 months ago

Did you generate this tree by hand? I worry that it's difficult to verify that every branch is correct and that every word is represented -- so I wonder if we could use a pure-perl script to generate the XS code at build time, using the list of words as an input?

Yes, it's autogenerated. That Perl code is in the commit.

nrdvana commented 6 months ago

@karenetheridge it's at the bottom of the same file

Also, had that same worry and added the unit test

nrdvana commented 6 months ago

@jonjensen It's theoretically faster than a hash. A hash lookup starts with a hash function requiring O(strlen) operations, whereas the selective character comparisons are O(log(wordcount)). The bucket lookup itself is O(1) and can be ignored. Then this approach finishes with a guaranteed single strcmp, while a hash table needs a strcmp for each bucket collision, which is likely also just one, so they tie there.
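The O(strlen) point can be seen in any typical string hash. As an illustration (djb2, a generic example; not necessarily the hash function Perl's hashtables use), every byte of the key is touched before any bucket is even probed:

```c
/* djb2 string hash: a generic example of why hashing a key is
 * O(strlen) -- one multiply-add per character, before the O(1)
 * bucket lookup can begin. */
static unsigned long djb2(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}
```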

It's not hard to use perl's hashtable functions, but then the table needs to be built at runtime.

jonjensen commented 6 months ago

Oh, that's true, and building the hash table at runtime is an annoying and avoidable startup cost.

nrdvana commented 6 months ago

@jonjensen Actually, with 430 keywords, log(wordcount) is up to about 9, so it's getting close to even with a hash function. But you might be interested in my perl script to generate perfect hashtables for the thousand+ OpenGL constants recognized by CmdlineGL. I wrote that back before I knew about the single-character-binary-search idea.

karenetheridge commented 6 months ago

Yes, it's autogenerated. That Perl code is in the commit.

Sorry I missed that -- thanks!

esabol commented 6 months ago

CI tests are failing though. Any idea why?

nrdvana commented 6 months ago

@esabol I apparently didn't run the author tests. Looks like one is a false-positive that the 'like' pg keyword is confused for a Test::More verb, and the second is that... I used "notakeyword" instead of 'notakeyword'? Am I understanding that right?

The third is a "typo" from the TODO file, unrelated to my changes.

esabol commented 6 months ago

@nrdvana wrote:

Looks like one is a false-positive that the 'like' pg keyword is confused for a Test::More verb [...]

Hmmm, how annoying. Can you work around it somehow?

I used "notakeyword" instead of 'notakeyword'? Am I understanding that right?

Yeah, I think so. A simple change, I hope.

https://metacpan.org/pod/Perl::Critic::Policy::ValuesAndExpressions::ProhibitInterpolationOfLiterals

The third is a "typo" from the TODO file, unrelated to my changes.

You're right that it's not related to your changes. But, for the sake of expediency in merging the PR, could you add the flagged word from the TODO file to the list of acceptable words in t/99-spellcheck.t?

nrdvana commented 6 months ago

Ok, ready for another run. It's not a very robust fix for 99_lint.t, but it wasn't that robust to begin with ;-)

karenetheridge commented 6 months ago

Looks like one is a false-positive that the 'like' pg keyword is confused for a Test::More verb [...]

Hmmm, how annoying. Can you work around it somehow?

You can import just the symbols you need from Test::More, or import nothing and use fully-qualified names everywhere:

i.e.

use Test::More import => [qw(ok is)];

# or..
use Test::More ();
...
Test::More::is(...);

esabol commented 6 months ago

@nrdvana wrote:

Ok, ready for another run.

Looks great! Thank you!