Optimization of is_hex() and is_hexstr()

palkeo commented 4 years ago

I don't understand why all this code was present : if it's valid hex it will be decoded successfully, there is no way otherwise.

This also makes it faster because it's only about matching a regexp now. And simpler to understand.

All tests are passing.

carver commented 3 years ago

is_hex and is_hexstr are almost identical, so this change should probably apply to both.

I'm still working on confirming that these are fully equivalent. (to make sure some untested "feature" of the old version doesn't surprise us)

carver commented 3 years ago

Ok, seems good enough to me. I ran this against some hypothesis tests on master, both against some with random strings, and some seeded to be "near" valid regex strings. All passed (ran about 30k examples on each).

Simplest example:

import re

from hypothesis import (
    given,
    settings,
    strategies as st,
)

from eth_utils import is_hexstr

_HEX_REGEXP = re.compile("(0x)?[0-9a-f]*", re.IGNORECASE | re.ASCII)

def regexp_is_hexstr(input_str):
    if not isinstance(input_str, (str,)) or not input_str:
        return False
    return _HEX_REGEXP.fullmatch(input_str) is not None

@given(st.text())
@settings(max_examples=10000) 
def test_is_hexstr_regexp_equivalence(input_text):
    original_result = is_hexstr(input_text)
    regexp_result = regexp_is_hexstr(input_text)
    assert original_result == regexp_result

carver commented 3 years ago

@palkeo are you interested in making this change to is_hex too?

palkeo commented 3 years ago

Is just added the change to is_hex as well, I don't know why I didn't do it sooner.

Thanks for the hypothesis tests, that kind of change sounds like the perfect use case :)

carver commented 3 years ago

Thanks!

ethereum / eth-utils

Optimization of is_hex() and is_hexstr() #202