faster-cpython / ideas

1.68k stars 48 forks source link

O(1) string comparisons #486

Open Fidget-Spinner opened 1 year ago

Fidget-Spinner commented 1 year ago

Random thought I had:

Many strings in CPython are hashed, and their hashes are stored in the string object iself so they don't have to re-computed.

We can make use of those hashes to compare strings.

So s1 == s2 will fail in O(1) time if their hashes are not equal. Else even if their hashes are equal we have to do a full comparison in case of hash collision.

This shouldn't require much more code, but it also shouldn't move pyperformance. I just thought it would be interesting. Thoughts anyone?

corona10 commented 1 year ago

Similar discussion?: https://bugs.python.org/issue30093

Fidget-Spinner commented 1 year ago

Yep! Thanks for digging that up. I'll take a look at the history of the idea.

gvanrossum commented 1 year ago

I always thought we were already doing this. Oh well, TIL.

On Thu, Oct 27, 2022 at 7:46 AM Ken Jin @.***> wrote:

Yep! Thanks for digging that up. I'll take a look at the history of the idea.

— Reply to this email directly, view it on GitHub https://github.com/faster-cpython/ideas/issues/486#issuecomment-1293637218, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAWCWMTBV6NCHUVGAOOMEDLWFKIU5ANCNFSM6AAAAAARQCLVXM . You are receiving this because you are subscribed to this thread.Message ID: @.***>

-- --Guido van Rossum (python.org/~guido)

sweeneyde commented 1 year ago

It seems the objection in the past was that it adds branches to dict lookups, where the hash equality has already been guaranteed.

But dicts use the function unicode_eq, whereas something like the COMPARE_OP_STR_JUMP opcode uses unicode_compare_eq.

So maybe it would be appropriate to add hash checks to unicode_compare_eq but not unicode_eq.

methane commented 1 year ago

I think this is not worth enough. It adds some branches, although this is useful only in very limited cases.

oraluben commented 1 year ago

There's already similar pattern in dict lookup: https://github.com/python/cpython/blob/fbcafa6eeeb106c8d9e9bb4c3b44e34e070f79c9/Objects/dictobject.c#L928