RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Convert old string substitutions to f-strings in term.py #2864

Closed ashleysommer closed 2 months ago

ashleysommer commented 2 months ago

I thought this work had already been done around 18 months ago. Or maybe there was an old PR that did it? I'm not sure.

As of Python 3.8, f-strings are feature-comparable with str.format(), and they have since surpassed the raw performance of the old-style %-substitution strings from the Python 2 days.
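For illustration, here is a minimal sketch of the kind of conversion this implies. The variable names and values are made up for the example and are not taken from term.py.

```python
# Hypothetical value, used only to compare the three formatting styles.
uri = "http://example.org/thing"

old_percent = "<%s>" % uri        # old-style %-substitution
old_format = "<{}>".format(uri)   # str.format()
new_fstring = f"<{uri}>"          # f-string

# Format specifiers carry over unchanged between the styles.
width_percent = "%08.3f" % 3.14159
width_fstring = f"{3.14159:08.3f}"

print(old_percent == old_format == new_fstring)
print(width_percent == width_fstring)
```

All three spellings should produce the same string, so the conversion is meant to be behaviour-preserving wherever an explicit format specifier is kept.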

This PR does the following:

Some micro-benchmarking to show the kind of differences:

import timeit

# Joining two string variables
t0 = timeit.timeit('a+b', setup='a,b = "<test>", "abcd13"*1000')
print(t0)
t1 = timeit.timeit('"%s%s" % (a, b)', setup='a,b = "<test>", "abcd13"*1000')
print(t1)
t2 = timeit.timeit('f"{a}{b}"', setup='a,b = "<test>", "abcd13"*1000')
print(t2)
# Same join, but with the prefix written inline in the f-string
t3 = timeit.timeit('f"<test>{b}"', setup='b = "abcd13"*1000')
print(t3)

# Joining three strings
t0 = timeit.timeit('a+b+c', setup='a,b,c = "<test>", "abcd13"*1000, "</test>"')
print(t0)
t1 = timeit.timeit('"%s%s%s" % (a, b, c)', setup='a,b,c = "<test>", "abcd13"*1000, "</test>"')
print(t1)
t2 = timeit.timeit('f"{a}{b}{c}"', setup='a,b,c = "<test>", "abcd13"*1000, "</test>"')
print(t2)
# Wrapping a variable with an inline prefix and suffix
t3 = timeit.timeit('f"<test>{b}</test>"', setup='b = "abcd13"*1000')
print(t3)

Results:

# two string variables
0.08020864403806627 # Concat with + is fastest for 2 (and only 2) values
0.08790699497330934
0.08184163994155824
0.08165007410570979
# joining three strings
0.15257811499759555
0.09559847600758076
0.08691176993306726 # f-string concat is fastest for unknown 3 variables.
0.08586340001784265 # Inline f-string substitution is fastest for wrapped variables.
ashleysommer commented 2 months ago

Additionally, one private internal function is removed from term.py: _serial_number_generator(), which the BNode constructor used as a fallback when the user doesn't pass their own generator. Now we simply call uuid4().hex instead of using this custom generator function.

Removing this speeds up creation of BNodes (especially when many BNodes need to be created) because it removes one layer of indirection and one additional Python function call.

The ability to pass your own generator function if you need a different sn_gen on BNode is still available.
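The idea can be sketched roughly as follows. This is not the actual BNode constructor: new_node_id, its parameters, and the "N" prefix are hypothetical names chosen for the example.

```python
from typing import Callable, Optional
from uuid import uuid4


def new_node_id(sn_gen: Optional[Callable[[], str]] = None, prefix: str = "N") -> str:
    """Hypothetical stand-in for the ID-building step of a blank node."""
    # With no user-supplied generator, call uuid4().hex directly instead of
    # going through an extra wrapper function.
    serial = uuid4().hex if sn_gen is None else sn_gen()
    return f"{prefix}{serial}"


print(new_node_id())               # prefix followed by 32 hex characters
print(new_node_id(lambda: "42"))   # a user-supplied generator still works
```

The default path makes a single call to uuid4(), while callers who need a different serial-number scheme can still pass their own callable.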

ashleysommer commented 2 months ago

Note that there is an odd behaviour in the tests. The previous version asserted that the Literal subtraction Literal(Decimal(1.2121214312312)) - Literal(Decimal(1.0)) equals Literal(Decimal(0.212121)). That seems wrong because it truncates the result to 6 decimal places. After these string formatting changes in term.py, the tests needed updating so that the result is Literal(Decimal(0.2121214312312)). This is presumably a difference in the default float-formatting precision between %-substitution and f-strings.
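A small sketch of how such a difference can arise, assuming the old code path formatted the value with "%f" (an assumption; the exact expression in term.py is not shown here). The Decimals are built from strings so that the arithmetic is exact.

```python
from decimal import Decimal

# Exact decimal subtraction, using string-constructed Decimals.
diff = Decimal("1.2121214312312") - Decimal("1.0")

# "%f" converts to float and defaults to six digits after the decimal point.
percent_style = "%f" % diff

# A bare f-string placeholder uses the value's own string form instead.
fstring_style = f"{diff}"

print(percent_style)
print(fstring_style)
```

If the original code did use "%f", the six-digit result in the old assertion would be an artifact of that default rather than of the subtraction itself.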

ashleysommer commented 2 months ago

One last change: fixed a very old bug where a generator function (a function that uses yield) didn't work as a BNode prefix generator, even though it is documented to work and there are tests for it. This was revealed by a newly failing test after the change to better string concatenation in the BNode constructor. The old code was concealing the bug by serializing the generator itself instead of the output of the generator.
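The failure mode can be reproduced in isolation with a sketch like the one below. The helper names (serials, buggy_id, fixed_id) and the "N" prefix are hypothetical, and the fix shown is only one plausible way to handle generators, not necessarily what the PR does.

```python
import inspect
from itertools import count


def serials():
    """Hypothetical generator function yielding zero-padded serial numbers."""
    for n in count(1):
        yield f"{n:04d}"


def buggy_id(sn_gen, prefix="N"):
    # Calling a generator function returns a generator object, and "%s"
    # happily formats that object's repr instead of a yielded value.
    serial = sn_gen()
    return "%s%s" % (prefix, serial)


def fixed_id(sn_gen, prefix="N"):
    # Pull the next value from a generator; call anything else as a plain function.
    serial = next(sn_gen) if inspect.isgenerator(sn_gen) else sn_gen()
    return f"{prefix}{serial}"


print(buggy_id(serials))   # the ID contains the repr of a generator object
gen = serials()
print(fixed_id(gen), fixed_id(gen))
```

Because "%s" accepts any object, the buggy variant never raises, which is consistent with the bug staying hidden until the string-building code changed.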

coveralls commented 2 months ago

Coverage Status

coverage: 90.627% (+0.001%) from 90.626% when pulling 56ccb7a7b731326a8c78283fc64f548be760b2c7 on fstrings_term into cb2c8d1e90b4b582b69ac5c210600a25f19a60ac on main.