Closed paulmillr closed 1 month ago
Let me do some benchmarking of our version vs noble. Definitely open to it if it proves out that yours is faster (since we do boatloads of hex <-> bytes conversions). I think the regex in ours is just to check for non-hex characters, so it could potentially be removed if it was the sole source of drag.
Victory - noble
I added a second test for bytes2Hex.
Victory - noble
Here's the benchmark script (using the vitest benchmark feature) if anybody else wants to run it for comparison.
import { bytesToHex as nobleB2H, hexToBytes as nobleH2B } from '@noble/curves/abstract/utils'
import { bench, describe } from 'vitest'
import { bytesToHex, randomBytes, unprefixedHexToBytes } from '../src/bytes.js'
describe('h2b benchmarks', () => {
  bench('noble', () => {
    nobleH2B('0123456789abcdef')
  })
  bench('ethjs', () => {
    unprefixedHexToBytes('0123456789abcdef')
  })
})

describe('b2h benchmarks', () => {
  const bytes = randomBytes(32)
  bench('noble', () => {
    nobleB2H(bytes)
  })
  bench('ethjs', () => {
    bytesToHex(bytes)
  })
})
I will say that we do intentionally duplicate the bytes conversion utilities inside our rlp package because we want that to be an entirely standalone package with no dependencies. Maybe we could borrow your variants for there so they're faster.
Beyond that, we already import ethereum-cryptography in util (where our other bytes2Hex and hex2Bytes functions live), so we've already implicitly got noble/curves there and it would be easy to add it as a direct dependency, I guess.
@holgerd77 What do you think? It looks like we'd get a 1.5-2x speed improvement using the noble versions of these functions, which could be hugely beneficial given the variety of places we do these bytes/hex conversions.
@paulmillr I looked a little further and it looks like ethereum-cryptography already re-exports the bytesToHex/hexToBytes functions from @noble/hashes. I'm assuming this is the same version of the utilities as in noble/curves? If so, I think we can probably safely just re-export them for use in our own libraries.
The exported util from jsec is not the same. It allows 0x, but it doesn't validate for 0x presence.
So a direct noble re-export will need to be added.
Just dropping this here: we added the regex check to be sure that the hex strings were properly formatted, see: https://github.com/ethereumjs/ethereumjs-monorepo/pull/3185
We can add this for optimization, but then only in areas where we are 100% sure that the string is correctly formatted.
Oh, interesting, it looks like the noble method actually does the hex check inside the method. In that case we should likely change!
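To make the difference concrete, here is a rough sketch of the two approaches being compared: validating with a regex up front and then parsing, versus validating each character while decoding it. Both functions are hypothetical illustrations written for this comparison (the names `hexToBytesRegex`, `hexToBytesInline`, and `nibble` are made up); neither is the actual ethereumjs or noble code.

```typescript
// Hypothetical regex-first variant: one pass to validate, a second pass to parse.
function hexToBytesRegex(hex: string): Uint8Array {
  if (hex.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(hex)) {
    throw new Error('invalid hex string');
  }
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

// Maps an ASCII char code to its 4-bit value, or -1 if it is not a hex digit.
function nibble(code: number): number {
  if (code >= 48 && code <= 57) return code - 48; // '0'-'9'
  if (code >= 65 && code <= 70) return code - 55; // 'A'-'F'
  if (code >= 97 && code <= 102) return code - 87; // 'a'-'f'
  return -1;
}

// Hypothetical single-pass variant: each character is validated as it is decoded,
// so no separate regex scan is needed.
function hexToBytesInline(hex: string): Uint8Array {
  if (hex.length % 2 !== 0) throw new Error('invalid hex string');
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    const hi = nibble(hex.charCodeAt(i * 2));
    const lo = nibble(hex.charCodeAt(i * 2 + 1));
    if (hi < 0 || lo < 0) throw new Error('invalid hex string');
    out[i] = hi * 16 + lo;
  }
  return out;
}
```

Both variants reject malformed input, so moving to the single-pass style would not by itself give up the format check.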
@acolytec3 I agree that rlp should duplicate h2b code if the goal is no-unnecessary-deps.
Maybe just copy-paste this code until the next release of eth-crypto (which would be in october):
// Needs `isBytes()` function
// We use optimized technique to convert hex string to byte array
const asciis = { _0: 48, _9: 57, _A: 65, _F: 70, _a: 97, _f: 102 } as const;
function asciiToBase16(char: number): number | undefined {
  if (char >= asciis._0 && char <= asciis._9) return char - asciis._0;
  if (char >= asciis._A && char <= asciis._F) return char - (asciis._A - 10);
  if (char >= asciis._a && char <= asciis._f) return char - (asciis._a - 10);
  return;
}

/**
 * @example hexToBytesUnprefixed('cafe0123') // Uint8Array.from([0xca, 0xfe, 0x01, 0x23])
 */
export function hexToBytesUnprefixed(hex: string): Uint8Array {
  if (typeof hex !== 'string') throw new Error('hex string expected, got ' + typeof hex);
  const hl = hex.length;
  const al = hl / 2;
  if (hl % 2) throw new Error('padded hex string expected, got unpadded hex of length ' + hl);
  const array = new Uint8Array(al);
  for (let ai = 0, hi = 0; ai < al; ai++, hi += 2) {
    const n1 = asciiToBase16(hex.charCodeAt(hi));
    const n2 = asciiToBase16(hex.charCodeAt(hi + 1));
    if (n1 === undefined || n2 === undefined) {
      const char = hex[hi] + hex[hi + 1];
      throw new Error('hex string expected, got non-hex character "' + char + '" at index ' + hi);
    }
    array[ai] = n1 * 16 + n2;
  }
  return array;
}

// Array where index 0xf0 (240) is mapped to string 'f0'
const hexes = /* @__PURE__ */ Array.from({ length: 256 }, (_, i) =>
  i.toString(16).padStart(2, '0')
);

/**
 * @example bytesToHex(Uint8Array.from([0xca, 0xfe, 0x01, 0x23])) // 'cafe0123'
 */
export function bytesToHex(bytes: Uint8Array): string {
  if (!isBytes(bytes)) throw new Error('bytes expected');
  // pre-caching improves the speed 6x
  let hex = '';
  for (let i = 0; i < bytes.length; i++) {
    hex += hexes[bytes[i]];
  }
  return hex;
}
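The snippet above notes that it needs an `isBytes()` function but does not include one. A minimal stand-in, assuming a plain `instanceof` check is enough for the copy-paste use case (this is an assumption, not necessarily how noble defines it), could be:

```typescript
// Hypothetical helper: the pasted code only says that some `isBytes()` is required.
// Assumes all byte arrays come from the same realm, so `instanceof` is sufficient.
function isBytes(a: unknown): a is Uint8Array {
  return a instanceof Uint8Array;
}
```

Written as a type guard, it also narrows `unknown` input to `Uint8Array` for callers that receive untyped values.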
Ok, yeah, these are impressive (benchmark) results! 🤩
I would be strongly in favor of using Paul's code here, but I would also really like to stay with our own always-prefixed-hex convention; we put so much effort into aligning on this.
So I would (also) suggest that we use the versions from ethereum-cryptography/@noble/curves in Util but then wrap them in our own methods and re-export, so that we can keep the names and eventually add additional hex prefix checks if necessary.
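A rough sketch of what such a wrapper could look like, to show the idea only. The two `unprefixed*` functions below are simple local stand-ins for whichever fast implementation ends up being imported, and the wrapper names and error message are assumptions, not the final Util API:

```typescript
// Stand-in for the imported fast unprefixed decoder (assumption, for self-containment).
function unprefixedHexToBytes(hex: string): Uint8Array {
  if (hex.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(hex)) {
    throw new Error('invalid hex string');
  }
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

// Stand-in for the imported fast unprefixed encoder (assumption, for self-containment).
function unprefixedBytesToHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

// Wrapper keeping the always-prefixed convention: input must start with '0x'.
function hexToBytes(hex: string): Uint8Array {
  if (!hex.startsWith('0x')) throw new Error('hex string must be 0x-prefixed');
  return unprefixedHexToBytes(hex.slice(2));
}

// Wrapper keeping the always-prefixed convention: output always starts with '0x'.
function bytesToHex(bytes: Uint8Array): string {
  return '0x' + unprefixedBytesToHex(bytes);
}
```

With this shape the prefix check lives in one place, and the inner functions can be swapped for faster ones without touching callers.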
Great that you're on this "benchmark trip". Can you directly add this to Util? (So at least the EthereumJS ones, but maybe also with Noble, depending on the dependency situation.) A bit more written-out description would also be nice. 🙂
It would be great if we generally establish a common location for new (vitest) benchmarks.
I would cautiously think we might even want to put this directly under test? This is so strongly related (one might also combine functionality here). Also, these higher-level benchmarks folders feel so very "off" (maybe it's also because they are so "far away" from the other folders due to alphabetical sorting?) and one tends to forget about them.
And the test folder is a natural working directory anyhow, so there are bigger chances to stumble upon the benchmarks every now and then! 😄
So I would suggest: test/bench
Does that make sense? Open for other suggestions though!
Benchmarks are tricky once we merge the PR because we no longer have the internal version of the code once we switch to using Noble stuff internally.
Just take only the version we have I would suggest (even if a bit trivial)
@acolytec3 is this closed by https://github.com/ethereumjs/ethereumjs-monorepo/pull/3698 ?
Yes
While browsing single-file-evm.js, I've noticed there are at least a bunch of duplicate hexToBytes and concatBytes methods. While duplication is not a big deal, it could be avoided.
But the issue is about optimization. noble exposes hexToBytes in its code. I see ethereumjs implemented a custom version using regular expressions, which is 2x slower by my benchmarks. On larger inputs, the difference is 3x.
Overall, noble utilities could be used in the following way:
Let me know if you want it to be exposed from ethereum-cryptography, which utils are already using.